HenrikBengtsson / Wishlist-for-R

Features and tweaks to R that I and others would love to see - feel free to add yours!
https://github.com/HenrikBengtsson/Wishlist-for-R/issues
GNU Lesser General Public License v3.0
134 stars 4 forks source link

ROBUSTNESS: a:b when length(a) > 1 or length(b) > 1 #61

Open HenrikBengtsson opened 6 years ago

HenrikBengtsson commented 6 years ago

For the construct a:b, we get an error if either a or b is empty, e.g.

> x <- integer(0)
> x:3
Error in x:3 : argument of length 0
> 3:x
Error in 3:x : argument of length 0

However, for length(a) > 1 or length(b) > 1, we only get a warning;

> x <- 1:2
> x:3
[1] 1 2 3
Warning in x:3 :
In x:3 : numerical expression has 2 elements: only the first used
> 3:x
[1] 3 2 1
Warning message:
In 3:x : numerical expression has 2 elements: only the first used

Should the latter also be an error? For example,

> x <- 1:3
> x:3
Error in x:3 : argument is not of length one
> 3:x
Error in 3:x : argument is not of length one

Related to Issues #38 and #48

HenrikBengtsson commented 6 years ago

I've posted this to R-devel thread 'ROBUSTNESS: a:b to give an error when length(a) > 1 or length(b) > 1 - not just a warning' on 2018-09-01.

HenrikBengtsson commented 2 years ago

Analogously to _R_CHECK_LENGTH_1_CONDITION_ and _R_CHECK_LENGTH_1_LOGIC2_, this could temporarily be controlled by a new _R_CHECK_LENGTH_1_COLON_ environment variable, e.g. _R_CHECK_LENGTH_1_COLON_=true.

HenrikBengtsson commented 2 years ago

It turns out there's one valid case for length(a) > 1 and length(b) > 1, namely when a and b are factors and length(a) == length(b), e.g.

> x <- as.factor(letters[1:3])
> y <- as.factor(LETTERS[4:6])
> x:y
[1] a:D b:E c:F
Levels: a:D a:E a:F b:D b:E b:F c:D c:E c:F

This is covered in https://github.com/wch/r-source/blob/9a66e7cc091d68a8fefb99e3c3640ab8a6fb599e/src/main/seq.c#L21-L23.

HenrikBengtsson commented 2 years ago

The code that needs to be updated is:

SEXP attribute_hidden do_colon(SEXP call, SEXP op, SEXP args, SEXP rho)
{
    SEXP s1, s2;
    double n1, n2;

    checkArity(op, args);
    if (inherits(CAR(args), "factor") && inherits(CADR(args), "factor"))
    return(cross_colon(call, CAR(args), CADR(args)));

    s1 = CAR(args);
    s2 = CADR(args);
    n1 = length(s1);
    n2 = length(s2);
    if (n1 == 0 || n2 == 0)
    errorcall(call, _("argument of length 0"));
    if (n1 > 1)
    warningcall(call,
            ngettext("numerical expression has %d element: only the first used",
                 "numerical expression has %d elements: only the first used",
                 (int) n1), (int) n1);
    if (n2 > 1)
    warningcall(call,
            ngettext("numerical expression has %d element: only the first used",
                 "numerical expression has %d elements: only the first used",
                 (int) n2), (int) n2);
    n1 = asReal(s1);
    n2 = asReal(s2);
    if (ISNAN(n1) || ISNAN(n2))
    errorcall(call, _("NA/NaN argument"));
    return seq_colon(n1, n2, call);
}

BTW, what happens right now if the length is > 2^32? Should n1 = length(s1); be n1 = xlength(s1);?

HenrikBengtsson commented 2 years ago

BTW, what happens right now if the length is > 2^32? Should n1 = length(s1); be n1 = xlength(s1);?

It turns out it's already taken care of indirectly, e.g.

> a <- 1:.Machine[["integer.max"]]
> 1:a
[1] 1
Warning message:
In 1:a : numerical expression has 2147483647 elements: only the first used

> a <- 1:(.Machine[["integer.max"]]+1)
> 1:a
Error: long vectors not supported yet: ../../src/include/Rinlinedfuns.h:537
HenrikBengtsson commented 2 years ago

Below is an R-devel patch that gives:

> a <- 1:2
> 1:a
[1] 1
Warning message:
In 1:a : numerical expression has 1 elements: only the first used

> Sys.setenv("_R_CHECK_LENGTH_COLON_" = "true")
> a <- 1:2
> 1:a
Error in 1:a : numerical expression has length > 1
> a:1
Error in a:1 : numerical expression has length > 1

> Sys.setenv("_R_CHECK_LENGTH_COLON_" = "false")
> a <- 1:2
> 1:a
[1] 1
Warning message:
In 1:a : numerical expression has 1 elements: only the first used

Patch

$ svn diff
Index: src/main/seq.c
===================================================================
--- src/main/seq.c  (revision 83118)
+++ src/main/seq.c  (working copy)
@@ -144,25 +144,25 @@
     double n1, n2;

     checkArity(op, args);
-    if (inherits(CAR(args), "factor") && inherits(CADR(args), "factor"))
-   return(cross_colon(call, CAR(args), CADR(args)));
-
+    
     s1 = CAR(args);
     s2 = CADR(args);
+    if (inherits(s1, "factor") && inherits(s2, "factor"))
+   return(cross_colon(call, s1, s2));
+
     n1 = length(s1);
     n2 = length(s2);
-    if (n1 == 0 || n2 == 0)
-   errorcall(call, _("argument of length 0"));
-    if (n1 > 1)
-   warningcall(call,
-           ngettext("numerical expression has %d element: only the first used",
-                "numerical expression has %d elements: only the first used",
-                (int) n1), (int) n1);
-    if (n2 > 1)
-   warningcall(call,
-           ngettext("numerical expression has %d element: only the first used",
-                "numerical expression has %d elements: only the first used",
-                (int) n2), (int) n2);
+    if (n1 != 1 || n2 != 1) {
+   if (n1 == 0 || n2 == 0)
+       errorcall(call, _("argument of length 0"));
+   char *check = getenv("_R_CHECK_LENGTH_COLON_");
+   if ((check != NULL) ? StringTrue(check) : FALSE) // warn by default
+       errorcall(call, _("numerical expression has length > 1"));
+   else
+       warningcall(call, _("numerical expression has %d elements: only the first used"),
+           (int) (n1 > 1) ? n1 : n2);
+    }
+
     n1 = asReal(s1);
     n2 = asReal(s2);
     if (ISNAN(n1) || ISNAN(n2))
HenrikBengtsson commented 2 years ago

Posted PR#18419 'Robustification of a:b - produce an error if length(a) != 1 or length(b) != 1' with the above patch.

HenrikBengtsson commented 1 year ago

A minor fix to the patch (https://bugs.r-project.org/show_bug.cgi?id=18419) has been updated accordingly:

Index: src/main/seq.c
===================================================================
--- src/main/seq.c  (revision 84006)
+++ src/main/seq.c  (working copy)
@@ -144,25 +144,25 @@
     double n1, n2;

     checkArity(op, args);
-    if (inherits(CAR(args), "factor") && inherits(CADR(args), "factor"))
-   return(cross_colon(call, CAR(args), CADR(args)));
-
+    
     s1 = CAR(args);
     s2 = CADR(args);
+    if (inherits(s1, "factor") && inherits(s2, "factor"))
+   return(cross_colon(call, s1, s2));
+
     n1 = length(s1);
     n2 = length(s2);
-    if (n1 == 0 || n2 == 0)
-   errorcall(call, _("argument of length 0"));
-    if (n1 > 1)
-   warningcall(call,
-           ngettext("numerical expression has %d element: only the first used",
-                "numerical expression has %d elements: only the first used",
-                (int) n1), (int) n1);
-    if (n2 > 1)
-   warningcall(call,
-           ngettext("numerical expression has %d element: only the first used",
-                "numerical expression has %d elements: only the first used",
-                (int) n2), (int) n2);
+    if (n1 != 1 || n2 != 1) {
+   if (n1 == 0 || n2 == 0)
+       errorcall(call, _("argument of length 0"));
+   char *check = getenv("_R_CHECK_LENGTH_COLON_");
+   if ((check != NULL) ? StringTrue(check) : FALSE) // warn by default
+       errorcall(call, _("numerical expression has length > 1"));
+   else
+       warningcall(call, _("numerical expression has %d elements: only the first used"),
+           (n1 > 1) ? (int) n1 : (int) n2);
+    }
+
     n1 = asReal(s1);
     n2 = asReal(s2);
     if (ISNAN(n1) || ISNAN(n2))
HenrikBengtsson commented 1 year ago

UPDATE 2023-03-22: R-devel (to become R 4.3.0) has been updated to optional check for this, cf. https://github.com/wch/r-source/commit/af23d59d23fd022dc620bbd26aa6114bc53e81df.

HenrikBengtsson commented 1 year ago

UPDATE: R 4.3.0 released on 2023-04-21 produces an error on this if _R_CHECK_LENGTH_COLON_=true.

I think the next steps toward making this the default is: