I find the which argument quite confusing and counterintuitive when a join should return row numbers of i, as in x[i, which=NA, ...].
A join and the which argument can interact in four different ways, as shown below:
x = data.table(a=1:3, x=c(NA, 10, NA))
i = data.table(a=2:5, y=c(20, 10, 20, 30))
x
a x
<int> <num>
1: 1 NA
2: 2 10
3: 3 NA
i
a y
<int> <num>
1: 2 20
2: 3 10
3: 4 20
4: 5 30
x[i, on="a", which=TRUE] # (a): ok
[1] 2 3 NA NA
x[!i, on="a", which=TRUE] # (b): ok
[1] 1
x[i, on="a", which=NA] # (c): counterintuitive
[1] 3 4
x[!i, on="a", which=NA] # (d): counterintuitive
[1] 1 2
(a): row numbers of x that each row of i matches to (NA where a row of i has no match).
(b): row numbers of x that no row of i matches to.
(c): row numbers of i that have no match in x. The fact that i is not prefixed with ! makes this counterintuitive.
(d): row numbers of i that have a match in x. The use of ! suggests that the rows without a match are of interest, while it is actually the opposite.
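For reference, the four results can be reproduced in base R with match() and %in% on the key column alone. This is only a sketch of the semantics described in (a) to (d), not of how data.table computes them; the variable names are made up for illustration, and it assumes the key values in x are unique, as they are in this example.

```r
# Key columns of the example tables
xa <- 1:3   # x$a
ia <- 2:5   # i$a

# For each row of i: the row of x it matches, or NA if there is none
m <- match(ia, xa)

res_a <- m                    # (a) x rows matched by each row of i
res_b <- which(!xa %in% ia)   # (b) x rows that no row of i matches
res_c <- which(is.na(m))      # (c) i rows without a match in x
res_d <- which(!is.na(m))     # (d) i rows with a match in x
```

Written this way, (c) and (d) are both statements about rows of i, which is what the current which=NA spelling hides.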
I propose allowing a character string in which, with four possible values (other suggestions are very welcome): c("xmatch", "xnomatch", "imatch", "inomatch"), corresponding to scenarios (a), (b), (d), and (c), respectively. These values would work as follows:
x[i, on="a", which="xmatch"] # row numbers of x that i matches to
x[i, on="a", which="xnomatch"] # row numbers of x that no i matches to
x[i, on="a", which="imatch"] # row numbers of i that have a match to x
x[i, on="a", which="inomatch"] # row numbers of i that have no match to x
So the character string would specify both the type of join (which currently depends on whether i is prefixed with !) and the data.table whose row numbers are returned.
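Until something like this exists, the mapping can be emulated with a small wrapper around the current API. The sketch below is only an illustration: which_join is a hypothetical helper, not part of data.table, and it relies on the which=TRUE / which=NA behaviour shown in (a) to (d) above.

```r
library(data.table)

# Hypothetical helper: translates the proposed string values into the
# existing combinations of join / not-join and which=TRUE / which=NA.
which_join <- function(x_dt, i_dt, on,
                       which = c("xmatch", "xnomatch", "imatch", "inomatch")) {
  which <- match.arg(which)
  switch(which,
    xmatch   = x_dt[i_dt,  on = on, which = TRUE],  # scenario (a)
    xnomatch = x_dt[!i_dt, on = on, which = TRUE],  # scenario (b)
    imatch   = x_dt[!i_dt, on = on, which = NA],    # scenario (d)
    inomatch = x_dt[i_dt,  on = on, which = NA]     # scenario (c)
  )
}

x <- data.table(a = 1:3, x = c(NA, 10, NA))
i <- data.table(a = 2:5, y = c(20, 10, 20, 30))

which_join(x, i, on = "a", which = "imatch")    # i rows that have a match in x
which_join(x, i, on = "a", which = "inomatch")  # i rows that have no match in x
```

The wrapper makes the oddity explicit: the call named "imatch" has to use the not-join form, and "inomatch" the plain join.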
With this feature, data.table:::[.data.table would behave as below:
This is a feature request.