[BUGZILLA #17652] Extend head/tail to support vector ns when x has non-null dimensions (w/ patch incl. docs)

MichaelChirico commented 4 years ago

Created attachment 2485 [details] patch for head/tail including documentation udpates

Patch against trunk for the code and documentation of head/tail functions to generalize them to support restricting/subsetting in multiple directions. As discussed in the thread starting here: https://stat.ethz.ch/pipermail/r-devel/2019-July/078143.html

Some things of note:

head.default and tail.default check for a non-null dim(x) and so behave in the new dimension-aware way. This means that in the head case, head.data.frame, head.array, head.matrix, and head.table are all simply aliases for the default which are defined because they have been exported.
We should be forward-proof against LONG_DIMS because we are supporting vector length and dimmed object dimension subsetting with the same code, so I did what was suggested in the previous comment in head.matrix/tail.matrix.
I have defined head.matrix, head.array, tail.matrix, and tail.array. as separate aliases.

Please let me know if you would like anything done differently or if it goes too far or misses something important.

METADATA

Bug author - Gabriel Becker
Creation time - 2019-11-12 03:23:54 UTC
Bugzilla link
Status - CLOSED FIXED
Alias - None
Component - Wishlist
Version - R-devel (trunk)
Hardware - Other Other
Importance - P5 enhancement
Assignee - Martin Maechler
URL -
Modification time - 2020-01-30 15:12 UTC

MichaelChirico commented 4 years ago

Thank you, Gabe. The patch applied cleanly .. and checks all pass; but this has to wait some days before I get to have a more thorough look.

METADATA

Comment author - Martin Maechler
Timestamp - 2019-11-16 22:16:51 UTC

MichaelChirico commented 4 years ago

Finally, a version of this (your patch, simplified; also not kept most unneeded S3 methods you had for back compatibility) has just now been committed as svn rev 77462 to R-devel.

Thank you once more, Gabe Becker !

METADATA

Comment author - Martin Maechler
Timestamp - 2019-11-26 17:36:31 UTC

MichaelChirico commented 4 years ago

Ok, quite a bit of breakage on CRAN (well, probably less than 1 in 1000), but still.

It seems not so hard to fix the first instances I've looked at, but still:

1) head() must NOT drop extraneous attributes.
It does with the change.

This breaks, e.g., the  tests of  `lme4`.

Solution: Write a  head.data.frame()  method which *still* works with
          length(n) == 2 ... i,e., cannot just re-instantiate the previous.
      Still, I've done this now (not yet committed !),
      also introducing a base interface to R's C internal  copyMostAttrib(),
      (and providing a "data frame"-useful version for that).

2) head() worked previously, but now fails.

This crucially kills e.g., package glmmTMB {which does many more things, defining even more head methods, e.g., a truly trivial head.name() which we could still "move" to base.

......... for now, please do NOT provide new code; I'm fixing this in my code version

METADATA

Comment author - Martin Maechler
Timestamp - 2019-11-28 10:48:08 UTC

MichaelChirico commented 4 years ago

3) the 'CHNOSZ' CRAN package fails its checks because it uses tail(a, 1)
for a 1D-array and expects it to behave as if 'a' was a simple vector.

I claim the new behavior of head() / tail() is better : it does preserve the 'dim' attribute (and even in that case a named dim attribute, something pretty rarely noticed).

There, maintainer("CHNOSZ") should change the respective regression tests in tests/testthat/test-revisit.R :

1. Failure: 0-D, 1-D, 2-D and 3-D calculations give identical results at the same conditions (@test-revisit.R#40)
2. Failure: referenced objectives give expected results (@test-revisit.R#65)

(currently visible at 
 https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-gcc/CHNOSZ-00check.html

where however the reader should be aware that the maintainer also needs to replace the really really ugly  

  if (class(try(solve(comp), silent = TRUE)) == "try-error")

METADATA

Comment author - Martin Maechler
Timestamp - 2019-11-28 14:53:21 UTC

MichaelChirico commented 4 years ago

4) This is really really cool ... notably the solution I found [sorry ..] :

Exhibited by CRAN package TraMineR, which now fails its CRAN checks with R-devel. This is a bit "special" but nevertheless an interesting challenge.

Here's my own repr.ex. I wrote, once I found where the problem was:

[.noCol <- function (x, i, j, drop = FALSE) { if (!missing(j)) stop(" [!] Column subscripts not allowed", call. = FALSE) NextMethod("[") } noC <- structure(datasets::trees, class = c("noCol", "data.frame")) try( noC[1,2] ) # fails indeed -- as designed by the "noCol" subset-method head(noC) # fails indeed --- in R-devel, but NOT in R

but it reall SHOULD work, right? Why should
head(noC) not be the same {when nrow(.) >= 6} as
noC[1:6, ] as it used to be?

Well, to solve this nicely, I needed to use what we've learned many years ago from Peter Dalgaard: Do work with "The Missing" object (aka the empty symbol), which can "nicely" / concisely be written as

alist(. = )$.

and now you can use this in Gabe's lapply(..) construct instead of using 1: ... aah that's really beautiful, I think.

I haved commites this (late yesterday my time) as svn r 77474, so people can critique the (quite substantially) changed code
[and we can see what will happen with the ca 30 packages ...]

METADATA

Comment author - Martin Maechler
Timestamp - 2019-11-29 08:28:48 UTC

MichaelChirico commented 4 years ago

And a 3rd commit, svn r77484: In r77474, I forgot to export head.array; furthermore, now regression tests and example show the non back compatible change for 1D arrays: They were treated as vectors previously and e.g. tail(a1, 1) "fell down to vectors". This is no longer the case and considered a "consistency bug" fix: The very latest ?head 'Examples:' now contain

(This is what still breaks CRAN package CHNOSZ tests, so, only the tests would need to be adapted; CRAN package 'surveillance' seems more subtle, but also self-inflicted pain)

METADATA

Comment author - Martin Maechler
Timestamp - 2019-11-29 16:43:34 UTC

MichaelChirico commented 4 years ago

substitute() (without argument) gives "The Missing" object, too.

Instead of do.call(*), how about using eval() ? It could be as follows for 'head'. cl <- quote(x[, drop = FALSE]) cl <- cl[c(1L:2L, rep.int(3L, length(ds)), if(has.d) 4L)] ii <- which(!is.na(n[seq_along(ds)])) cl[2L + ii] <- lapply(ii, function(i) substitute( seq_len(if ((ni <- n[i]) < 0L) max(ds[i] + ni, 0L) else min(ni, ds[i])) )) eval(cl)

METADATA

Comment author - Suharto Anggono
Timestamp - 2019-11-30 15:31:20 UTC

MichaelChirico commented 4 years ago

(In reply to Suharto Anggono from comment #7)

substitute() (without argument) gives "The Missing" object, too.

Instead of do.call(*), how about using eval(<call>) ? It could be as follows
for 'head'.
cl <- quote(x[, drop = FALSE])
cl <- cl[c(1L:2L, rep.int(3L, length(ds)), if(has.d) 4L)]
ii <- which(!is.na(n[seq_along(ds)]))
cl[2L + ii] <- lapply(ii, function(i) substitute(
seq_len(if ((ni <- n[i]) < 0L) max(ds[i] + ni, 0L) else min(ni,
ds[i]))
))
eval(cl)

This seems a lot more arcane to get the same effect as what Martin has in here now, so I would vote against doing it this way, myself. The do.call way has the benefit of explicitly and (with the exception of the missing symbol construction) simply/straightforwardly constructing the index argument for each dimension in a way that can easily be seen and reasoned about.

@Martin What about a new function called missingArg or emptySym or something, defined as function() substitute() This seems to work:

f = function() substitute()
mtcars[1:4, f()]

            mpg cyl disp  hp drat    wt  qsec vs am gear carb

Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1

I know that this has been on some people's wishlist for a while now, and would allow us to make your version of the code here clearer.

I'm happy to write a patch that introduces that and documentation for it and converts the logic here to using it, if that is something you'd consider.

METADATA

Comment author - Gabriel Becker
Timestamp - 2019-11-30 22:54:23 UTC

MichaelChirico commented 4 years ago

(In reply to Gabriel Becker from comment #8)

(In reply to Suharto Anggono from comment #7)
> substitute() (without argument) gives "The Missing" object, too.
> 
> Instead of do.call(*), how about using eval(<call>) ? It could be as
follows
> for 'head'.
>     cl <- quote(x[, drop = FALSE])
>     cl <- cl[c(1L:2L, rep.int(3L, length(ds)), if(has.d) 4L)]
>     ii <- which(!is.na(n[seq_along(ds)]))
>     cl[2L + ii] <- lapply(ii, function(i) substitute(
>         seq_len(if ((ni <- n[i]) < 0L) max(ds[i] + ni, 0L) else min(ni,
> ds[i]))
>     ))
>     eval(cl)

This seems a lot more arcane to get the same effect as what Martin has in
here now, so I would vote against doing it this way, myself. The do.call way
has the benefit of explicitly and (with the exception of the missing symbol
construction) simply/straightforwardly constructing the index argument for
each dimension in a way that can easily be seen and reasoned about.

Explicit index construction could also be used in my way above: cl[2L + ii] <- indvecs[ii]

Function 'expression' can also be used. e <- rep.int(expression([, x, , drop = FALSE), c(1L, 1L, length(ds), has.d)) ... eval(as.call(e))

With code using do.call([, c(list(x), )) , head() fails because the call is then evaluated. Using do.call([, c(list(quote(x)), )) would be OK. NB: I prefer the originally proposed do.call("[", *), because the symbol '[' is not expanded early.

METADATA

Comment author - Suharto Anggono
Timestamp - 2019-12-01 01:12:05 UTC

MichaelChirico commented 4 years ago

(In reply to Suharto Anggono from comment #9)

Function 'expression' can also be used.
e <- rep.int(expression(`[`, x, , drop = FALSE), c(1L, 1L, length(ds),
has.d))
...
eval(as.call(e))

Another possibility: args <- rep.int(as.list(expression(x, , drop = FALSE)), c(1L, length(ds), has.d)) ... do.call("[", args)

METADATA

Comment author - Suharto Anggono
Timestamp - 2019-12-01 02:55:47 UTC

MichaelChirico commented 4 years ago

(In reply to Suharto Anggono from comment #9)

With code using
do.call(`[`, c(list(x), *)) ,
head(<call>) fails because the call is then evaluated. Using
do.call(`[`, c(list(quote(x)), *))
would be OK.

Also OK: do.call([, c(list(x), *), quote = TRUE)

METADATA

Comment author - Suharto Anggono
Timestamp - 2019-12-01 04:51:54 UTC

MichaelChirico commented 4 years ago

(In reply to Suharto Anggono from comment #10)

(In reply to Suharto Anggono from comment #9)
Another possibility:
args <- rep.int(as.list(expression(x, , drop = FALSE)), c(1L,
length(ds), has.d))
...
do.call("[", args)

Ah, as.list(expression()) can be written as alist()

list(quote(x)) is the same as alist(x)

METADATA

Comment author - Suharto Anggono
Timestamp - 2019-12-01 09:20:44 UTC

MichaelChirico commented 4 years ago

So one thing that comes to mind for me here is that I've always felt that good programming is as simple as possible (though no simpler, of course).

Might it not be more straightforward to use dispatch to differentiate between x's that need to not be evalauted by do.call from the rest (which can almost completely be covered by the old default method). That is, after all, largely the type of flow control that dispatch is for in the first place.

My patch only collapsed (almost) everything into the default method because the method I wrote happened to seem to work for everything (subtle 'bugs'/behavior differences Martin has tracked down not withstanding).

METADATA

Comment author - Gabriel Becker
Timestamp - 2019-12-02 08:55:35 UTC

MichaelChirico commented 4 years ago

Thank you, Suharto, for your proposals and Gabe, too.

Some remarks -- where I'm stating my taste and current thoughts. Some of these may be not quite correct; if you find such, please tell us about it.

1) Clearly, substitute() is nicer (and more efficient) than alist(x=.)$x ; consequently, the current code is to improved.

2) The difference of do.call("[", ..) vs do.call([, ..) is "interesting", but

is not documented AFAIK, and hence may be a coincidence, that R core may want to change in the future
is no longer relevant I think as now both head.* and tail.default() are used for objects of class "call", and no longer use do.call(.) in those cases at all: a call cannot have a dim attribute

3) For now, I tend do agree with Suharto's last proposals of using smart rep.int(, ..) instead of the lapply() construction using "the Missing" explicitly to get the list of indices for the do.call(.)

4) I cannot find any case where the current code (R-devel svn rev 77484) is not doing "the right thing"; it does, AFAICS, e.g., work correctly for a "matrix of expressions" i.e., an expression with dim().

METADATA

Comment author - Martin Maechler
Timestamp - 2019-12-02 11:15:46 UTC

MichaelChirico commented 4 years ago

I have now (svn rev 77520) committed changes to head.R which make

the code considerably smaller,
head() and tail() methods more similar to each other, and
also no longer needs .tailindices()

This is mostly based on Suharto's proposals and analysis, which ended up in using

rep(alist(..), ...)

to create the arguments for the do.call(.).

No functionality should be changed.. both default methods still divert to the .array() method in case length(n) > 1 and non-empty dim(x). This should typically never be called, as that case should've triggered the .array (or .matrix which is the same) method instead of calling the .default(). but then people do define their own classes and still rely on the *.default() method, so it does make sense probably.

This could be closed now from my point of view.

METADATA

Comment author - Martin Maechler
Timestamp - 2019-12-05 10:53:08 UTC

MichaelChirico commented 4 years ago

By using old code for length(n) == 1L in default method, new functionality doesn't come through:

head(letters, NAinteger) doesn't work.
Suppose that 'head' method is not defined for class "a". With x <- structure(matrix(, 2, 2), class="a") , head(x, 1) is x[1], not x[1, , drop = FALSE] .

Issue on 'addrownums' for 'tail': With the possibility to take subset on dimension other than the first, why only row numbers that is added? For example, in printing of tail(matrix(1:4, 2, 2), c(1, 1)) , row label shows "2", but column label shows "1".

METADATA

Comment author - Suharto Anggono
Timestamp - 2019-12-05 16:15:16 UTC

MichaelChirico commented 4 years ago

(In reply to Suharto Anggono from comment #16)

By using old code for length(n) == 1L in default method, new functionality
doesn't come through:

Agreed, it needs to be checking for the dim attribute and not (or at least not just) length of n. Unless this is intentional for non-backwards compatability @martin?

IMHO, I think having

head(mat, 5) and head(mat, c(5, NA)) do radically different things is something we should avoid if at all possible.

- head(letters, NA_integer_)  doesn't work.

This is true, but honestly I think this should be an error anyway. We should require at least one element of n be specified (present and non-missing). The documentation would need to be updated but a simple sentence akin to the one preceding this one would suffice.

- Suppose that 'head' method is not defined for class "a". With
x <- structure(matrix(, 2, 2), class="a") ,
head(x, 1) is x[1], not x[1, , drop = FALSE] .

Issue on 'addrownums' for 'tail': With the possibility to take subset on
dimension other than the first, why only row numbers that is added? For
example, in printing of
tail(matrix(1:4, 2, 2), c(1, 1)) ,
row label shows "2", but column label shows "1".

This is an interesting idea which I didn't address at all in the original patch.

METADATA

Comment author - Gabriel Becker
Timestamp - 2019-12-06 00:04:11 UTC

MichaelChirico commented 4 years ago

(In reply to Gabriel Becker from comment #17)

(In reply to Suharto Anggono from comment #16)
> By using old code for length(n) == 1L in default method, new functionality
> doesn't come through:

Agreed, it needs to be checking for the dim attribute and not (or at least
not just) length of n. Unless this is intentional for non-backwards
compatability @martin?

Yes, I wrote it the way it is, to keep back compatibility.. note that Gabe's original patch badly failed, e.g. for calls.

Also note, that in fact there we cannot just use drop = FALSE because that does break some other [ method {e.g., formula}.

But I agree that this part should be tweaked in the two default methods.


IMHO, I think having

head(mat, 5) and head(mat, c(5, NA)) do radically different things is
something we should avoid if at all possible.

> - head(letters, NA_integer_)  doesn't work.

This is true, but honestly I think this should be an error anyway. We should
require at least one element of n be specified (present and non-missing).
The documentation would need to be updated but a simple sentence akin to the
one preceding this one would suffice.

> - Suppose that 'head' method is not defined for class "a". With
> x <- structure(matrix(, 2, 2), class="a") ,
> head(x, 1) is x[1], not x[1, , drop = FALSE] .

yes, I agree, that this is incorrect .. and is being addressed via my ongoing revision of the default method, see above.

> Issue on 'addrownums' for 'tail': With the possibility to take subset on
> dimension other than the first, why only row numbers that is added? For
> example, in printing of
> tail(matrix(1:4, 2, 2), c(1, 1)) ,
> row label shows "2", but column label shows "1".

This is an interesting idea which I didn't address at all in the original
patch.

"interesting" is not strong enough. Suharto is right, and this argument should rather be call 'addDimNums' or rather just 'addnums' AND its implementation really needs to be updated for all dimensions, not just the "rows". If you (Gabe) see how to generalize that part, I'd be happy to see your code proposal.

METADATA

Comment author - Martin Maechler
Timestamp - 2019-12-06 14:20:33 UTC

MichaelChirico commented 4 years ago

(In reply to Martin Maechler from comment #18)

(In reply to Gabriel Becker from comment #17)
> (In reply to Suharto Anggono from comment #16)
> > By using old code for length(n) == 1L in default method, new
functionality
> > doesn't come through:
> 
> Agreed, it needs to be checking for the dim attribute and not (or at least
> not just) length of n. Unless this is intentional for non-backwards
> compatability @martin?

Yes, I wrote it the way it is, to keep back compatibility.. note that Gabe's
original patch badly failed, e.g. for calls.

Also note, that in fact there we cannot just use  `drop = FALSE`
because that does break *some* other `[` method {e.g., formula}.

But I agree that this part should be tweaked in the two default methods.

So the core issue, I think, is that we need to decide whether the new behavior should happen for alls "arrays" (with scare-quotes), ie anything with a non-NULL dimension attribute, or only for array (which includes matrices now thanks to your changes, Martin) and data.frame objects.

If it is the former (which was the intent of my patch), then we can differentiate by null vs non-null dimensions in in the default method.

calls and formula objects shouldn't have a dimension attribute, should they?

Are there any cases you saw in the CRAN breakages where there is a non-null attribute AND the x[1:n] form seemed required? In particular calls and formulas should never have a non-null dimension, right? So if the check were if(is.null(dim(x))

then we'd avoid the breakages and retain the dimension-aware behavior in all cases where the dimension is present.


"interesting" is not strong enough.  Suharto is right, and this argument
should rather be call 'addDimNums' or rather just 'addnums'  *AND* its
implementation really needs to be updated for all dimensions, not just the
"rows".
If you (Gabe) see how to generalize that part, I'd be happy to see your code
proposal.

I will do this and get a patch for it back to you

METADATA

Comment author - Gabriel Becker
Timestamp - 2019-12-07 05:04:18 UTC

MichaelChirico commented 4 years ago

(In reply to Gabriel Becker from comment #19)

(In reply to Martin Maechler from comment #18)

[.............]

> But I agree that this part should be tweaked in the two default methods.

So the core issue, I think, is that we need to decide whether the new
behavior should happen for alls "arrays" (with scare-quotes), ie anything
with a non-NULL dimension attribute, or only for array (which includes
matrices now thanks to your changes, Martin) and data.frame objects. 

If it is the former (which was the intent of my patch), then we can
differentiate by null vs non-null dimensions in in the default method.

calls and formula objects shouldn't have a dimension attribute, should they?

formulas are special cases of calls; and no, they even cannot have a dim() contrary to expressions, btw.

Are there any cases you saw in the CRAN breakages where there is a non-null
attribute AND the x[1:n] form seemed required? In particular calls and
formulas should never have a non-null dimension, right? So if the check were
if(is.null(dim(x)) <do the old way>

then we'd avoid the breakages and retain the dimension-aware behavior in all
cases where the dimension is present.

This is exactly what I committed yesterday after saying here I would review the *.default() methods. Please look at R-devel svn rev >= 77535


Suharto is right, and this argument
> should rather be call 'addDimNums' or rather just 'addnums'  *AND* its
> implementation really needs to be updated for all dimensions, not just the
> "rows".
> If you (Gabe) see how to generalize that part, I'd be happy to see your
code
> proposal.

I will do this and get a patch for it back to you

very good! .. thank you, Gabe, in advance!

METADATA

Comment author - Martin Maechler
Timestamp - 2019-12-07 10:21:08 UTC

MichaelChirico commented 4 years ago

Created attachment 2509 [details] new patch extends addrownums (now keepnums) functionality, incl. tests and doc changes

Sorry for the delay on this. There were more details to get right than I had naively expected. Nonetheless, I've extended the 'addrownums' logic as discussed. I changed argument name to keepnums, though addrownums is kept as deprecated. It assigns dimnames exactly as they would have been printed within the full output in all dimensions. (Note that this means that in dimensions higher htan 2, this is just a character string of the original index that was included).

I made it an error to give an n with a length that is too large (>1 if dims are NULL, >length(dim(x)) otherwise) and to give an n that is all missing.

I added regression tests to ensure that the keepnums=TRUE logic works for both rows, columns, and higher dimensions (the test covers up to 4).

I also added a note to the documentation about care been needed regarding the difference between tail(arr, c(2, 3, 4))[,,3] and tail(arr, c(2, 3, 4))[,,"3"], since those can now both be valid but mean completely different things.

Let me know if you have any questions or you'd like something to work differently. I'm happy to continue to help.

METADATA

Comment author - Gabriel Becker
Timestamp - 2020-01-01 23:36:32 UTC

INCLUDED PATCH

ID - 9
Author - Gabriel Becker
Link to download patch - https://bugs.r-project.org/bugzilla/attachment.cgi?id=2509
Timestamp - 2020-01-01 23:36 UTC
Extra info - (13.44 KB, patch)

MichaelChirico commented 4 years ago

(In reply to Gabriel Becker from comment #21)

Thank you Gabe. I've not yet got the time look into the new patch in details, but just in case you have time now: There's no reason to deprecate an option that was never released, so we can safely replace (instead of deprecate) the 'addrownums'.

METADATA

Comment author - Martin Maechler
Timestamp - 2020-01-08 16:34:08 UTC

MichaelChirico commented 4 years ago

(In reply to Martin Maechler from comment #22)

(In reply to Gabriel Becker from comment #21)

Thank you Gabe. I've not yet got the time look into the new patch in
details, but just in case you have time now:  There's no reason to deprecate
an option that was never released, so we can safely replace (instead of
deprecate) the 'addrownums'.

I don't understand what you mean here. my installed copy of 3.5.1 (release) has addrownums in the formals of tail.matrix (though not the tail generic/tail.default). That was why I went the deprecation route.

That said I'm happy to change said deprecation to a replacement if you would still like me to do so. Let me know.

METADATA

Comment author - Gabriel Becker
Timestamp - 2020-01-08 20:04:40 UTC

MichaelChirico commented 4 years ago

(In reply to Gabriel Becker from comment #23)

> 
I don't understand what you mean here. my installed copy of 3.5.1 (release)
has addrownums in the formals of tail.matrix (though not the tail
generic/tail.default). That was why I went the deprecation route.

I'm sorry, entirely my fault; the deprecation route is necessary. Martin

METADATA

Comment author - Martin Maechler
Timestamp - 2020-01-08 21:13:25 UTC

MichaelChirico commented 4 years ago

In 'tail.array', for non-null dimnames(ans) , 'jj' is an index to 'ii', making it also an index to 'sel'. The corresponding 'jj' for dimnames(ans) that is NULL should be seq_along(ii) . The assignment should be dimnames(ans)[ii[jj]] <- lapply(jj, function(k) { dnum <- ii[k] ... })

Alternatively, keeping 'jj' as an index to dim(x) , like 'ii', 'jj' for non-null dimnames(ans) should be ii[sapply(adnms, is.null)] . The assignment could be dimnames(ans)[jj] <- lapply(jj, function(dnum) ... ) . Inside the function in 'lapply', args[[1L + dnum]] could be used in place of sel[[k]] (so, 'sel' is not used).

Using 'switch' is possible. if(dnum == 1L) expr1 else if(dnum == 2L) expr2 else expr3 could be switch(min(dnum, 3L), expr1, expr2, expr3)

Information about 'addrownums' in "Details" section of the documentation has not been updated.

Not documented:

dimnames is not added for 1-dimensional array
dimnames is added only for dimension where 'n' is specified (not NA)

If conversion of 'addrownums' to 'keepnums' is done in 'tail.matrix' only, 'tail.ftable' should not use 'keepnums' by itself, because 'addrownums' is not yet accounted for. Instead, 'tail.ftable' should just pass to 'tail.matrix', like the following. r <- tail.matrix(format(x), *) if(is.null(rownames(r))) rownames(r) <- rep.int("", nrow(r)) if(is.null(colnames(r))) colnames(r) <- rep.int("", ncol(r)) noquote(r)

I think, 'tail.array' (which is new) and 'tail.default' (which previously doesn't have 'addrownums') doesn't need to have 'addrownums'. Yes, if that is the case, 'tail.matrix' and 'tail.array' are not the same. Deprecation of 'addrownums', if desired, could be in 'tail.matrix' only.

But, how about not disturbing old use? Instead of deprecating 'addrownums', how about allowing 'addrownums' only in case when only 'n' corresponding to row is specified (not NA)?

It could also be no deprecation and no restriction. tail.matrix: keepnums = addrownums, addrownums = TRUE tail.ftable: keepnums = addrownums, addrownums = FALSE

Check of 'n' in 'head.ftable' and 'head.function' are too much. On the other hand, 'tail.array' doesn't check 'n'. Placing check in '.array' and '.default' is sufficient, because all eventually go to them. Moreover, in '.default', only check of is.na(n) is actually needed in else if(length(n) == 1L) . In '.array', non-null dim(x) is assumed.

Because other places use 'rep.int', I propose to use 'rep.int' instead of 'rep' for 'args': args <- rep.int(alist(x, ), c(1L, length(d))) args$drop <- FALSE

A code reader, like me initially, might wonder 'rep' versus 'rep.int'.

METADATA

Comment author - Suharto Anggono
Timestamp - 2020-01-11 18:33:40 UTC

MichaelChirico commented 4 years ago

(In reply to Suharto Anggono from comment #25)

In 'tail.array', for non-null dimnames(ans) , 'jj' is an index to 'ii',
making it also an index to 'sel'.
The corresponding 'jj' for dimnames(ans) that is NULL should be
seq_along(ii) .

This is the case as it exists now because jj is set to ii when dimnames(x) is NULL. so seq_along(jj) is equivalent to seq_along(ii) there.

You are however correct that there is a problem (not caught by any existing or new tests) when the dimnames for some but not all of sliced dimensions are non-NULL.

I will fix this and submit a new patch in the next couple days, probably monday. Obviously that updated patch will include a regression test that would have caught this case.

The assignment should be
dimnames(ans)[ii[jj]] <- lapply(jj, function(k) { dnum <- ii[k]  ... })

Alternatively, keeping 'jj' as an index to dim(x) , like 'ii', 'jj' for
non-null dimnames(ans) should be ii[sapply(adnms, is.null)] .
The assignment could be
dimnames(ans)[jj] <- lapply(jj, function(dnum) ... ) .
Inside the function in 'lapply', args[[1L + dnum]] could be used in place of
sel[[k]] (so, 'sel' is not used).

sel[[k]] is self-describing, args[[1L + dnum]] is not. In my mind that makes using sel here not just equivalently appropriate but actively better than what you suggest.


Using 'switch' is possible.
if(dnum == 1L) expr1 else if(dnum == 2L) expr2 else expr3
could be
switch(min(dnum, 3L), expr1, expr2, expr3)

No, you are incorrect here. The third argument in the above would be take as the case for the value 3, NOT as a default for all values other than 1 and 2. If you look closely at comments I added, I believe in the new tests I wrote, you'll see that initially tried this and caught it during testing.


Information about 'addrownums' in "Details" section of the documentation has
not been updated.

Not documented:
- dimnames is not added for 1-dimensional array
- dimnames is added only for dimension where 'n' is specified (not NA)

I will do another pass on the documentation for my updated patch.


If conversion of 'addrownums' to 'keepnums' is done in 'tail.matrix' only,
'tail.ftable' should not use 'keepnums' by itself, because 'addrownums' is
not yet accounted for. Instead, 'tail.ftable' should just pass to
'tail.matrix', like the following.
r <- tail.matrix(format(x), *)
if(is.null(rownames(r))) rownames(r) <- rep.int("", nrow(r))
if(is.null(colnames(r))) colnames(r) <- rep.int("", ncol(r))
noquote(r)

I think, 'tail.array' (which is new) and 'tail.default' (which previously
doesn't have 'addrownums') doesn't need to have 'addrownums'. Yes, if that
is the case, 'tail.matrix' and 'tail.array' are not the same. Deprecation of
'addrownums', if desired, could be in 'tail.matrix' only.

One of the major points of this is to make tail.array and tail.matrix the same, so I do not agree here at all. "Adding" (but not really) addrows to tail.array and perserving the fact that tail.array and tail.matrix are identical is strongly preferable to splitting them back into separate functions.


But, how about not disturbing old use?
Instead of deprecating 'addrownums', how about allowing 'addrownums' only in
case when only 'n' corresponding to row is specified (not NA)?

No I disagree here as well, there shouldn't two arguments that dictate such similar and overlapping behavior. After the initial pain period we'd be left with two extremely similar arguments which would needly confuse the users.


It could also be no deprecation and no restriction.
tail.matrix: keepnums = addrownums, addrownums = TRUE
tail.ftable: keepnums = addrownums, addrownums = FALSE

Check of 'n' in 'head.ftable' and 'head.function' are too much. On the other
hand, 'tail.array' doesn't check 'n'.

That is simple oversight that I will fix, thank you for catching it.

Placing check in '*.array' and
'*.default' is sufficient, because all eventually go to them. Moreover, in
'*.default', only check of is.na(n) is actually needed in
else if(length(n) == 1L) .
In '*.array', non-null dim(x) is assumed.

I felt it was valuable to put all the checks in the same place and the extraneous checks are trivially passed in the cases where they do not apply, so while technically correct I don't think this is worth writing separate code in each place.

The checks are essentially free (length(dim(x)) is never going to be even moderately big) and checkHTN documents all the requirements for n in one place. I view that as worth a tiny bit of trivial redundancy


Because other places use 'rep.int', I propose to use 'rep.int' instead of
'rep' for 'args':
args <- rep.int(alist(x, ), c(1L, length(d)))
args$drop <- FALSE

A code reader, like me initially, might wonder 'rep' versus 'rep.int'.

I don't really see the value of that change, in my opinion that is a perefectly reasonable use of rep(). That said, I didn't write that code, Martin did, so I can change it (or he can) if he desires.

METADATA

Comment author - Gabriel Becker
Timestamp - 2020-01-12 08:07:48 UTC

MichaelChirico commented 4 years ago

(In reply to Gabriel Becker from comment #26)

> 
> Using 'switch' is possible.
> if(dnum == 1L) expr1 else if(dnum == 2L) expr2 else expr3
> could be
> switch(min(dnum, 3L), expr1, expr2, expr3)

No, you are incorrect here. The third argument in the above would be take as
the case for the value 3, NOT as a default for all values other than 1 and
2. If you look closely at comments I added, I believe in the new tests I
wrote, you'll see that initially tried this and caught it during testing.

Apologies, I read that too quickly; this would, of course, work. That said, I don't really see any benefit: If, else if, else is a very straightforward, well understood control flow. The cleverness/mild redirection of using min there to force a default doesn't actually give any advantage that I can see.

I'm strongly against any code ever being cleverer than it needs to be. I will change the comments referring to switch slightly to reflect this though.

METADATA

Comment author - Gabriel Becker
Timestamp - 2020-01-12 08:49:22 UTC

MichaelChirico commented 4 years ago

Created attachment 2522 [details] updated keepnums patch

Sorry for the delay, this patch addresses issues raised by Suharto.

keepnums now works correctly with a mix of specified and non-specified dimnames on x, and there is a regression test to ensure this.

The documentation is updated to reflect keepnums in Details section. and massaged in a couple of other places.

I did not change the arguments accepted by the tail methods for the reasons i laid out in my previous comment, though I can do it if you (Martin) feel strongly about it. I also didn't switch the if/else if/else to a switch(max(.),)

Added missing n check for tail.array. Removed n check from head/tail.ftable. Kept check in head/tail.function for the reason specified in the comment in head.function. Namely the n check after it is changed into a parsed matrix is not correct (n of length 2 would be allowable, technically)

METADATA

Comment author - Gabriel Becker
Timestamp - 2020-01-16 01:55:42 UTC

INCLUDED PATCH

ID - 2
Author - Gabriel Becker
Link to download patch - https://bugs.r-project.org/bugzilla/attachment.cgi?id=2522
Timestamp - 2020-01-16 01:55 UTC
Extra info - (11.54 KB, patch)

MichaelChirico commented 4 years ago

Thank you, Gabe, for your work, but

can you please revise and modify your patch?

no trailing spaces in R scripts (at least one diff is only because you added trailing spaces)
no "=" for assignment
no changes to test-reg-1c.R read its first lines and compare with those of *-1d.R}

[I thought you would at least look at what I committed from your last patch: Already moved things from -1c.R to -1d.R , removed trailing spaces, changed "=" to "<-" .. ]

METADATA

Comment author - Martin Maechler
Timestamp - 2020-01-17 08:31:28 UTC

MichaelChirico commented 4 years ago

(In reply to Martin Maechler from comment #29)

Thank you, Gabe, for your work, but

can you please revise and modify your patch?

- no trailing spaces in R scripts (at least one diff is only because you
*added* trailing spaces)

- no "=" for assignment

- no changes to test-reg-1c.R read its first lines and compare with those of
*-1d.R}

[I thought you would at least look at what I committed from your last
patch: Already   moved things from *-1c.R to *-1d.R , removed trailing
spaces, changed "=" to "<-" .. ]

Ah, these are embarrassing, sorry about that. I am not sure how I missed all of that but it is my fault.

Of course I'll take care of these, with apologies for the unnecessary overhead for you, and I'll work hard to avoid similar issues in the future.

METADATA

Comment author - Gabriel Becker
Timestamp - 2020-01-18 01:53:56 UTC

MichaelChirico commented 4 years ago

Created attachment 2524 [details] Cleaned up version updated keepnums patch

With apologies.

METADATA

Comment author - Gabriel Becker
Timestamp - 2020-01-21 21:31:03 UTC

INCLUDED PATCH

ID - 4
Author - Gabriel Becker
Link to download patch - https://bugs.r-project.org/bugzilla/attachment.cgi?id=2524
Timestamp - 2020-01-21 21:31 UTC
Extra info - (9.01 KB, patch)

MichaelChirico commented 4 years ago

(In reply to Gabriel Becker from comment #31)

I've now committed this (pretty edited slightly.. can't resist, ;-) as svn rev 77748.

Thanks a lot, Gabe!

METADATA

Comment author - Martin Maechler
Timestamp - 2020-01-30 15:12:04 UTC

MichaelChirico / r-bugs