Rdatatable / data.table

Symbol .I consistency when not grouping #2598

Open mattdowle opened 6 years ago

mattdowle commented 6 years ago

This has come up before I'm sure but I can't find the issue or S.O. post. Anyone remember or have the links please? I seem to remember replying to someone something like ".I is intended for use in grouping as per the documentation, but it would be good to extend it to non-grouping too". The man page still contains the words "while grouping" for .I.

Current behaviour in both v1.10.4-3 and dev :

> X = data.table(c("a","a","b","c","c"), 10:14)
> setkey(X,V1)
>  X["b"]
   V1 V2
1:  b 12       # ok
> X["b", .I]
[1] 1          # expected x's row number 3  (*1)
> X["b", .I, by=.EACHI]
   V1 I
1:  b 3        # ok
> X["b", .(.I,V2)]
   I V2
1: 1 12      # expected x's row number 3 not 1  (*2)
> X["b", .(.I,V2), by=.EACHI]
   V1 I V2
1:  b 3 12     # ok
> 

Now, which=TRUE was intended for, and works for, the first case (*1):

> X["b", which=TRUE]
[1] 3

but including x's row numbers inside j (*2) isn't currently possible unless you add x's row numbers explicitly as a column first. It would be nice for .I to do what which=TRUE does in the simple case (*1), and maybe even slowly deprecate the which=TRUE argument, since my guess is people reach for .I first.
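
For illustration, the "add x's row numbers explicitly as a column first" workaround could look roughly like the sketch below. The column name `rn` is an assumption chosen for this example and is not part of data.table; the table is the same `X` as in the transcript above, with the default column names written out.

```r
library(data.table)

# Same data as in the transcript above (V1/V2 are the default names).
X = data.table(V1 = c("a", "a", "b", "c", "c"), V2 = 10:14)
setkey(X, V1)

# Without i or by, .I is simply 1..nrow(X), so it can be stored as a column.
# 'rn' is a hypothetical name used only for this sketch.
X[, rn := .I]

# x's row number is now an ordinary column and can be used inside j.
res = X["b", .(rn, V2)]

# For the simple case, which=TRUE returns the matching row number directly.
w = X["b", which = TRUE]
```

The cost of this approach is an extra column that has to be kept in sync if `X` is later reordered or subset.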

franknarf1 commented 6 years ago

Anyone remember or have the links please?

dracodoc commented 5 years ago

Could we use .I for the global row number, so that it also works without grouping, and .i for the local row number, which only makes sense inside grouping?

MichaelChirico commented 5 years ago

@dracodoc as mentioned by Frank this is #1206

dracodoc commented 5 years ago

Sorry, I didn't realize there is a whole thread on this idea! That means this is a good idea, right?

jangorecki commented 4 years ago

example from https://github.com/Rdatatable/data.table/issues/539

dt <- data.table(a=sample(letters, 100, TRUE), b=rnorm(100))
dt[ a=="c", list(.N, .I)]
   N .I
1: 4  1
2: 4  2
3: 4  3
4: 4  4

dt[a=="c", list(.N, .I), by=a]
   a N .I
1: c 4 54
2: c 4 67
3: c 4 71
4: c 4 86
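
Until .I behaves consistently with a filter in i, one possible way to get the original row numbers for a filtered subset is to evaluate the condition in j, where .I still spans the whole table, or to use which=TRUE. This is only a sketch on a small fixed table (the data and the names `rows`, `rows2` and `out` are made up for illustration), not a proposal for how the issue should be resolved.

```r
library(data.table)

# Small deterministic table instead of random data.
dt = data.table(a = c("x", "c", "y", "c", "c"), b = 1:5)

# With no i and no by, .I is 1..nrow(dt); indexing it by the condition
# keeps the original row numbers of the matching rows.
rows = dt[, .I[a == "c"]]

# which=TRUE gives the same row numbers for a condition in i.
rows2 = dt[a == "c", which = TRUE]

# The row numbers can then be carried into the result explicitly.
out = dt[rows, .(row = rows, b)]
```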