Open HenrikBengtsson opened 6 years ago
I've posted this to R-devel thread 'ROBUSTNESS: a:b to give an error when length(a) > 1 or length(b) > 1 - not just a warning' on 2018-09-01.
Analogously to _R_CHECK_LENGTH_1_CONDITION_
and _R_CHECK_LENGTH_1_LOGIC2_
, this could temporarily be controlled by a new _R_CHECK_LENGTH_1_COLON_
environment variable, e.g. _R_CHECK_LENGTH_1_COLON_=true
.
It turns out there's one valid case for length(a) > 1
and length(b) > 1
, namely when a
and b
are factors and length(a) == length(b)
, e.g.
> x <- as.factor(letters[1:3])
> y <- as.factor(LETTERS[4:6])
> x:y
[1] a:D b:E c:F
Levels: a:D a:E a:F b:D b:E b:F c:D c:E c:F
This is covered in https://github.com/wch/r-source/blob/9a66e7cc091d68a8fefb99e3c3640ab8a6fb599e/src/main/seq.c#L21-L23.
The code that needs to be updated is:
SEXP attribute_hidden do_colon(SEXP call, SEXP op, SEXP args, SEXP rho)
{
SEXP s1, s2;
double n1, n2;
checkArity(op, args);
if (inherits(CAR(args), "factor") && inherits(CADR(args), "factor"))
return(cross_colon(call, CAR(args), CADR(args)));
s1 = CAR(args);
s2 = CADR(args);
n1 = length(s1);
n2 = length(s2);
if (n1 == 0 || n2 == 0)
errorcall(call, _("argument of length 0"));
if (n1 > 1)
warningcall(call,
ngettext("numerical expression has %d element: only the first used",
"numerical expression has %d elements: only the first used",
(int) n1), (int) n1);
if (n2 > 1)
warningcall(call,
ngettext("numerical expression has %d element: only the first used",
"numerical expression has %d elements: only the first used",
(int) n2), (int) n2);
n1 = asReal(s1);
n2 = asReal(s2);
if (ISNAN(n1) || ISNAN(n2))
errorcall(call, _("NA/NaN argument"));
return seq_colon(n1, n2, call);
}
BTW, what happens right now if the length is > 2^32? Should n1 = length(s1);
be n1 = xlength(s1);
?
BTW, what happens right now if the length is > 2^32? Should n1 = length(s1); be n1 = xlength(s1);?
It turns out it's already taken care of indirectly, e.g.
> a <- 1:.Machine[["integer.max"]]
> 1:a
[1] 1
Warning message:
In 1:a : numerical expression has 2147483647 elements: only the first used
> a <- 1:(.Machine[["integer.max"]]+1)
> 1:a
Error: long vectors not supported yet: ../../src/include/Rinlinedfuns.h:537
Below is an R-devel patch that gives:
> a <- 1:2
> 1:a
[1] 1
Warning message:
In 1:a : numerical expression has 1 elements: only the first used
> Sys.setenv("_R_CHECK_LENGTH_COLON_" = "true")
> a <- 1:2
> 1:a
Error in 1:a : numerical expression has length > 1
> a:1
Error in a:1 : numerical expression has length > 1
> Sys.setenv("_R_CHECK_LENGTH_COLON_" = "false")
> a <- 1:2
> 1:a
[1] 1
Warning message:
In 1:a : numerical expression has 1 elements: only the first used
$ svn diff
Index: src/main/seq.c
===================================================================
--- src/main/seq.c (revision 83118)
+++ src/main/seq.c (working copy)
@@ -144,25 +144,25 @@
double n1, n2;
checkArity(op, args);
- if (inherits(CAR(args), "factor") && inherits(CADR(args), "factor"))
- return(cross_colon(call, CAR(args), CADR(args)));
-
+
s1 = CAR(args);
s2 = CADR(args);
+ if (inherits(s1, "factor") && inherits(s2, "factor"))
+ return(cross_colon(call, s1, s2));
+
n1 = length(s1);
n2 = length(s2);
- if (n1 == 0 || n2 == 0)
- errorcall(call, _("argument of length 0"));
- if (n1 > 1)
- warningcall(call,
- ngettext("numerical expression has %d element: only the first used",
- "numerical expression has %d elements: only the first used",
- (int) n1), (int) n1);
- if (n2 > 1)
- warningcall(call,
- ngettext("numerical expression has %d element: only the first used",
- "numerical expression has %d elements: only the first used",
- (int) n2), (int) n2);
+ if (n1 != 1 || n2 != 1) {
+ if (n1 == 0 || n2 == 0)
+ errorcall(call, _("argument of length 0"));
+ char *check = getenv("_R_CHECK_LENGTH_COLON_");
+ if ((check != NULL) ? StringTrue(check) : FALSE) // warn by default
+ errorcall(call, _("numerical expression has length > 1"));
+ else
+ warningcall(call, _("numerical expression has %d elements: only the first used"),
+ (int) (n1 > 1) ? n1 : n2);
+ }
+
n1 = asReal(s1);
n2 = asReal(s2);
if (ISNAN(n1) || ISNAN(n2))
Posted PR#18419 'Robustification of a:b - produce an error if length(a) != 1 or length(b) != 1' with the above patch.
A minor fix to the patch (https://bugs.r-project.org/show_bug.cgi?id=18419) has been updated accordingly:
Index: src/main/seq.c
===================================================================
--- src/main/seq.c (revision 84006)
+++ src/main/seq.c (working copy)
@@ -144,25 +144,25 @@
double n1, n2;
checkArity(op, args);
- if (inherits(CAR(args), "factor") && inherits(CADR(args), "factor"))
- return(cross_colon(call, CAR(args), CADR(args)));
-
+
s1 = CAR(args);
s2 = CADR(args);
+ if (inherits(s1, "factor") && inherits(s2, "factor"))
+ return(cross_colon(call, s1, s2));
+
n1 = length(s1);
n2 = length(s2);
- if (n1 == 0 || n2 == 0)
- errorcall(call, _("argument of length 0"));
- if (n1 > 1)
- warningcall(call,
- ngettext("numerical expression has %d element: only the first used",
- "numerical expression has %d elements: only the first used",
- (int) n1), (int) n1);
- if (n2 > 1)
- warningcall(call,
- ngettext("numerical expression has %d element: only the first used",
- "numerical expression has %d elements: only the first used",
- (int) n2), (int) n2);
+ if (n1 != 1 || n2 != 1) {
+ if (n1 == 0 || n2 == 0)
+ errorcall(call, _("argument of length 0"));
+ char *check = getenv("_R_CHECK_LENGTH_COLON_");
+ if ((check != NULL) ? StringTrue(check) : FALSE) // warn by default
+ errorcall(call, _("numerical expression has length > 1"));
+ else
+ warningcall(call, _("numerical expression has %d elements: only the first used"),
+ (n1 > 1) ? (int) n1 : (int) n2);
+ }
+
n1 = asReal(s1);
n2 = asReal(s2);
if (ISNAN(n1) || ISNAN(n2))
UPDATE 2023-03-22: R-devel (to become R 4.3.0) has been updated to optional check for this, cf. https://github.com/wch/r-source/commit/af23d59d23fd022dc620bbd26aa6114bc53e81df.
UPDATE: R 4.3.0 released on 2023-04-21 produces an error on this if _R_CHECK_LENGTH_COLON_=true
.
I think the next steps toward making this the default is:
[ ] R CMD check --as-cran
check the current package with something like _R_CHECK_LENGTH_COLON_=_R_CHECK_PACKAGE_NAME_,abort,verbose
.
[ ] All CRAN packages fix this type of bug.
[ ] R makes it the default behavior.
For the construct
a:b
, we get an error if eithera
orb
is empty, e.g.However, for
length(a) > 1
orlength(b) > 1
, we only get a warning;Should the latter also be an error? For example,
Related to Issues #38 and #48