Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

type coercion failure despite explicit as.POSIXct #2818

Open MichaelChirico opened 6 years ago

MichaelChirico commented 6 years ago
library(data.table)

DT = data.table(d = c('Thu 01 Jan 1970, 07:30:00 AM',
                      'Thu 01 Jan 1970, 08:00:00 AM'),
                tz = c('Asia/Singapore', 'Asia/Ho_Chi_Minh'))
# parse each group's strings in that group's own time zone
DT[ , d := as.POSIXct(d, format = '%a %d %b %Y, %r', tz = .BY$tz), by = tz]

Maybe it's the interaction with `by` that's causing the failure? I'm not sure.
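One way to probe that guess, as a minimal sketch: run the same kind of conversion once without by= and once with it, then compare. The ISO-formatted strings and the fixed 'UTC' zone in the ungrouped call are simplifications that are not in the report above (they avoid the locale-dependent format). The grouped call is wrapped in tryCatch() so that whatever the installed version does is captured rather than assumed.

```r
library(data.table)

# Simplified stand-in for the data above (ISO strings, an assumption)
DT = data.table(d = c('1970-01-01 07:30:00', '1970-01-01 08:00:00'),
                tz = c('Asia/Singapore', 'Asia/Ho_Chi_Minh'))

# Without by=, := replaces the whole column, so the type change is accepted
ungrouped = copy(DT)[ , d := as.POSIXct(d, tz = 'UTC')]
class(ungrouped$d)

# With by=, keep either the resulting table or the error message
grouped = tryCatch(
  copy(DT)[ , d := as.POSIXct(d, tz = .BY$tz), by = tz],
  error = function(e) conditionMessage(e)
)
grouped
```

If `grouped` comes back as a message rather than a table, the by= path is what rejects the change of type.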

franknarf1 commented 5 years ago

Is the idea that j = `:=`(existing_col, as.anything(arbitrary_code)) should override protecting the existing column's type and attributes or that by= shouldn't trigger such protection?

One possible UI for disabling all protections so the user can freely plonk would be a shortcut in i:

DT2 = data.table(g = rep(1:2, each=2), d = "2019-01-01", x = 3:6)

DT2[.UNLOCK, d := as.IDate(d), by=g] # despite by= and coercion
DT2[.UNLOCK, x := 7:8 ]              # despite recycling

I don't think I would use it, and guess the recycling case would almost always be a user error.


A couple of workarounds:

DT[ , d := .SD[, as.POSIXct(d, format = '%a %d %b %Y, %r', tz = .BY$tz), by = tz]$V1 ]

The other workaround, which I use in a helper function for date parsing, is juggling columns: create tmp = as.POSIXct(...) by group, then assign d = tmp, then set tmp = NULL.
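The column juggling could look roughly like the sketch below. It rests on two assumptions: tmp is only an illustrative column name, and ISO-formatted strings replace the original format so the example does not depend on the session locale.

```r
library(data.table)

# Simplified stand-in for the data above (ISO strings, an assumption)
DT = data.table(d = c('1970-01-01 07:30:00', '1970-01-01 08:00:00'),
                tz = c('Asia/Singapore', 'Asia/Ho_Chi_Minh'))

# 1. Parse by group into a new column; there is no existing type to protect
DT[ , tmp := as.POSIXct(d, tz = .BY$tz), by = tz]

# 2. Overwrite d without by=, which replaces the column wholesale
DT[ , d := tmp]

# 3. Drop the temporary column
DT[ , tmp := NULL]

class(DT$d)
```

A POSIXct column carries a single tzone attribute, so the rows share one display zone even though each string was parsed in its own group's zone.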

MichaelChirico commented 5 years ago

> Is the idea that j = `:=`(existing_col, as.anything(arbitrary_code)) should override protecting the existing column's type and attributes or that by= shouldn't trigger such protection?

I'm not quite sure what the practical distinction is...

but it seems to me that as.type(), even by group, is guaranteed to return columns of that type, so plonking should be OK.

franknarf1 commented 5 years ago

@MichaelChirico I meant that it was unclear to me whether you were suggesting

I guess the exception makes sense (and ditto your exception for i=order(...) #2925), but it makes explaining the behavior to users more complicated. Also, it may invite further changes for other unambiguous cases like paste/sprintf/format (guaranteed to return character).