Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

as.Date can result in different underlying types #6602

Open ben-schwen opened 4 days ago

ben-schwen commented 4 days ago

R CMD check is currently failing on windows-devel with typeof x.START_DATE (integer) != typeof i.DATE (double)

The responsible code explicitly casts with as.Date, so I guess this is a bug in R-devel?

Seems related to this recent change in R-devel: https://bugs.r-project.org/show_bug.cgi?id=18782

MichaelChirico commented 4 days ago

Could you link the data.table code since you found it?

I'd say yes, it's likely related to that Bugzilla report + associated fix. It seems R-core wants to move more towards greedily allowing the Date type to have underlying integer storage sometimes; I'm not sure what exactly we should do in light of that.

ben-schwen commented 4 days ago

It is test case 1848.1 https://github.com/Rdatatable/data.table/blob/6a15f8617de121a406cee97b22e83e0c2c4bb034/inst/tests/tests.Rraw#L12219-L12233

As you mentioned, with R-devel r87285 on Windows we have typeof(seq(as.Date('2015-01-01'), as.Date('2015-01-05'), by="day")) returning integer while the previous behavior was that it always returned double.

The strange part is that typeof(DT1$DATE) and typeof(DT2$START_DATE) both return "integer" on R-devel r87285, so bmerge should work.
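
For anyone wanting to check the storage type locally, here is a small base-R sketch. The first part repeats the seq() call quoted above; its typeof() result depends on the R build, so no output is claimed for it. The second part builds an integer-backed and a double-backed Date by hand (the names d_int and d_dbl are made up for illustration) to show that two Date vectors can compare equal while differing in underlying type.

```r
# Base R only; the typeof() of `d` depends on the R version in use.
d <- seq(as.Date("2015-01-01"), as.Date("2015-01-05"), by = "day")
class(d)    # "Date" regardless of storage
typeof(d)   # reported in this thread as "integer" on R-devel r87285, "double" before

# Hand-built Date vectors with explicit storage (illustrative names).
d_int <- structure(1:3, class = "Date")         # integer-backed Date
d_dbl <- structure(c(1, 2, 3), class = "Date")  # double-backed Date
typeof(d_int)         # "integer"
typeof(d_dbl)         # "double"
all(d_int == d_dbl)   # TRUE: same dates, different storage
```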

ben-schwen commented 4 days ago

Update: Turning on verbose output tells us the problem here. We do a join where a single column is used in two join conditions.

i.RANDOM_STRING has same type (character) as x.RANDOM_STRING. No coercion needed.
i.DATE has same type (integer) as x.START_DATE. No coercion needed.
Coercing integer column i.DATE to type double for join to match type of x.EXPIRY_DATE.

Minimal reprex:

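library(data.table)
# seq() on Dates returns an integer-backed Date on R-devel r87285
date_int = seq(as.Date('2015-01-01'), as.Date('2015-01-01'), by="day")
x = data.table(a=date_int, b=1)
# c reuses date_int, d comes straight from as.Date() on a string
y = data.table(c=date_int, d=as.Date('2015-01-01'))
# column a appears in both join conditions
y[x, on=.(c == a, d == a)]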
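
One possible way to sidestep the mixed-storage situation on the user side is to normalize Date columns to double storage before joining. This is only a sketch and not a fix proposed in this thread: as_double_date is a hypothetical helper, not part of data.table or base R, and it says nothing about how bmerge itself should handle the coercion.

```r
# Hypothetical helper: force double storage for a Date vector while
# keeping its class and values. Other attributes (e.g. names) are dropped.
as_double_date <- function(x) {
  stopifnot(inherits(x, "Date"))
  structure(as.numeric(unclass(x)), class = "Date")
}

date_raw <- seq(as.Date("2015-01-01"), as.Date("2015-01-01"), by = "day")
date_dbl <- as_double_date(date_raw)

typeof(date_dbl)                                # "double"
identical(format(date_raw), format(date_dbl))   # TRUE: the dates are unchanged
```

Building x and y from as_double_date(date_int) should leave a, c and d with the same storage type, assuming as.Date() on a string still returns a double-backed Date as the verbose output suggests; I have not verified this against R-devel.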