Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

fread could parse timestamps requested as nanotime with dec=',' if they can first parse as POSIXct #6500

Open MichaelChirico opened 2 months ago

MichaelChirico commented 2 months ago
fread(text="t\n2023-10-12T06:53:53.123Z", colClasses="nanotime") # works
fread(text="t\n2023-10-12T06:53:53,123Z", colClasses="nanotime", sep=";") # does not

Originally posted by @ben-schwen in https://github.com/Rdatatable/data.table/issues/6445#issuecomment-2352495080

https://github.com/Rdatatable/data.table/pull/6445#issuecomment-2352910879

MichaelChirico commented 2 months ago

This won't work as a general solution for {nanotime}, though, since we require first that the timestamp fits in POSIXct precision:

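library(nanotime)
# format= must be named: the second positional argument of as.POSIXct() is tz
xp=as.POSIXct("2024-01-01 01:02:03.45678987654", tz="UTC", format="%Y-%m-%d %H:%M:%OS")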
xn=as.nanotime("2024-01-01T01:02:03.45678987654Z")
all.equal(xn, as.nanotime(xp), tolerance=0)
# [1] "Mean relative difference: 5.516202e-17"

i.e., for a string in the file that carries more precision than POSIXct can hold, parsing first as POSIXct may discard precision:

fread("t\n2024-01-01T01:02:03.45678987654Z")
#                      t
#                 <POSc>
# 1: 2024-01-01 01:02:03
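
For a rough sense of where that precision goes: POSIXct stores seconds since the epoch as a double, so resolution is bounded by the spacing between adjacent doubles at that magnitude. The following is a minimal base-R sketch of that arithmetic, not anything fread itself does; the names secs and ulp are only illustrative.

```r
# Unix time of the example timestamp, in whole seconds (UTC)
secs <- as.numeric(as.POSIXct("2024-01-01 01:02:03", tz = "UTC"))

# Spacing between adjacent doubles at this magnitude, in seconds:
# a double has 52 explicit mantissa bits below its leading binary exponent
ulp <- 2^(floor(log2(secs)) - 52)

# The same spacing in nanoseconds: about 238, versus nanotime's resolution of 1
ulp * 1e9
```

So a timestamp round-tripped through POSIXct can be off by on the order of a hundred nanoseconds, which matches the scale of the difference reported by all.equal() above.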

With that in mind, I think we're best off not doing anything here -- better to require the user to change , to . in R and then parse as nanotime than to delete precision (which can't be recovered).
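
A minimal sketch of that user-side workaround, assuming the column is read as character and that {nanotime} parses the ISO 8601 string once the comma is replaced. The sample timestamp and the name DT are made up for illustration.

```r
library(data.table)
library(nanotime)

# Read the timestamp as character so that no precision is lost on input
DT <- fread(text = "t\n2023-10-12T06:53:53,123456789Z", sep = ";",
            colClasses = "character")

# Swap the decimal comma for a dot, then parse straight to nanotime,
# skipping the POSIXct step entirely
DT[, t := as.nanotime(sub(",", ".", t, fixed = TRUE))]
```

All nine fractional digits survive this way, since the value never passes through a double.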