JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.73k stars 5.49k forks source link

[Dates] Time parsing is limited to millisecond precision #37579

Open helgee opened 4 years ago

helgee commented 4 years ago

I originally added this as a comment to #29339 but it deserves its own issue.

The parser in Dates is limited to parsing milliseconds even though Time and data types in other packages, e.g. AstroTime.jl, support higher precision.

julia> t = Time(12, 12, 12, 123, 456, 789)
12:12:12.123456789

julia> time_str = string(t)
"12:12:12.123456789"

julia> Time(time_str)
ERROR: InexactError: convert(Dates.Decimal3, 123456789)
Stacktrace:
 [1] tryparsenext at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Dates/src/io.jl:153 [inlined]
 [2] tryparsenext at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Dates/src/io.jl:41 [inlined]
 [3] macro expansion at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Dates/src/parse.jl:64 [inlined]
 [4] tryparsenext_core(::String, ::Int64, ::Int64, ::DateFormat{Symbol("HH:MM:SS.s"),Tuple{Dates.DatePart{'H'},Dates.Delim{Char,1},Dates.DatePart{'M'},Dates.Delim{Char,1},Dates.DatePart{'S'},Dates.Delim{Char,1},Dates.DatePart{'s'}}}, ::Bool) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Dates/src/parse.jl:38
 [5] macro expansion at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Dates/src/parse.jl:150 [inlined]
 [6] tryparsenext_internal at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Dates/src/parse.jl:125 [inlined]
 [7] parse at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Dates/src/parse.jl:282 [inlined]
 [8] Time at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Dates/src/io.jl:524 [inlined] (repeats 2 times)
 [9] top-level scope at REPL[9]:1
JeffreySarnoff commented 4 years ago

seconded

helgee commented 4 years ago

In the current implementation this issue occurs by design

@inline function tryparsenext(d::DatePart{'s'}, str, i, len)
    val = tryparsenext_base10(str, i, len, min_width(d), max_width(d))
    val === nothing && return nothing
    ms0, ii = val
    len = ii - i
# Here the parser bails when there are more than three decimal places
    if len > 3
        ms, r = divrem(ms0, Int64(10) ^ (len - 3))
        r == 0 || throw(InexactError(:convert, Decimal3, ms0))
    else
        ms = ms0 * Int64(10) ^ (3 - len)
    end
    return ms, ii
end

IIUC the problem is that tryparsenext cannot return multiple date parts. Is that correct? If so, I see two possible ways to fix this:

  1. Introduce new character codes for micro- and nanoseconds, e.g. μ and n. This would fix the problem in Base but not for packages that want to support even higher precision, e.g. AstroTime.jl.
  2. Parse all digits and let the constructors handle them which is also quite awkward.

Any other ideas?

JeffreySarnoff commented 4 years ago

I have done some experimenting. The most straightforward solution is to use Int128s where we now use Int64s. The use of Int128s with arithmetic (including e.g. fldmod and muladd) seems to slow down the processing by 1.4x (*, muladd with fma) .. 2.5x (fldmod) .. 3.3x (divrem, rem, fld). I did not check generating the DateTime values.

Using a pair of Int64s does not gain much in throughput vs using Int128 and it adds a good deal of additional source code. Nonetheless, keeping DateTime as it is now for the more significant part and using another Int64 for sub-millisecond measure, we would support attosecond resolution.

And it would allow us the choice of either making DateTime and Time work with higher resolution units or keeping DateTime and Time as they are now while introducing, say, HiresDateTime and HiresTime. I prefer the former.

helgee commented 3 years ago

I have implemented a workaround for AstroTime.jl. See here: https://github.com/JuliaAstro/AstroTime.jl/pull/58/commits/7cf0ba45962e8c9921356d8149198fcb7e27eebe

It defines the following parser:

@inline function Dates.tryparsenext(d::Dates.DatePart{'f'}, str, i, len, locale)
    next = Dates.tryparsenext_base10(str, i, len)
    next === nothing && return nothing
    val, idx = next
    val /= 10^(idx - i)
    return val, idx
end
JeffreySarnoff commented 3 years ago

Would you post a few examples of it doing what it does? (assume using Dates, AstroTime has been done). Thank you.

helgee commented 3 years ago

Sure!

AstroTime with this patch

julia> using AstroTime; import Dates

julia> str = "2021-03-27T17:25:13.123456789"
"2021-03-27T17:25:13.123456789"

julia> df = Dates.dateformat"yyyy-mm-ddTHH:MM:SS.fffffffff"
dateformat"yyyy-mm-ddTHH:MM:SS.fffffffff"

julia> TAIEpoch(str, df)
2021-03-27T17:25:13.123 TAI

vs. without

julia> df1 = Dates.dateformat"yyyy-mm-ddTHH:MM:SS.sssssssss"
dateformat"yyyy-mm-ddTHH:MM:SS.sssssssss"

julia> TAIEpoch(str, df1)
ERROR: InexactError: convert(Dates.Decimal3, 123456789)
Stacktrace:
 [1] tryparsenext
   @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Dates/src/io.jl:153 [inlined]
 [2] tryparsenext
   @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Dates/src/io.jl:41 [inlined]
 [3] macro expansion
   @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Dates/src/parse.jl:64 [inlined]
 [4] tryparsenext_core(str::String, pos::Int64, len::Int64, df::Dates.DateFormat{Symbol("yyyy-mm-ddTHH:MM:SS.sssssssss"), Tuple{Dates.DatePart{'y'}, Dates.Delim{Char, 1}, Dates.DatePart{'m'}, Dates.Delim{Char, 1}, Dates.DatePart{'d'}, Dates.Delim{Char, 1}, Dates.DatePart{'H'}, Dates.Delim{Char, 1}, Dates.DatePart{'M'}, Dates.Delim{Char, 1}, Dates.DatePart{'S'}, Dates.Delim{Char, 1}, Dates.DatePart{'s'}}}, raise::Bool)
   @ Dates /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Dates/src/parse.jl:38
 [5] macro expansion
   @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Dates/src/parse.jl:150 [inlined]
 [6] tryparsenext_internal
   @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Dates/src/parse.jl:125 [inlined]
 [7] parse
   @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Dates/src/parse.jl:282 [inlined]
 [8] (TAIEpoch{T} where T)(str::String, format::Dates.DateFormat{Symbol("yyyy-mm-ddTHH:MM:SS.sssssssss"), Tuple{Dates.DatePart{'y'}, Dates.Delim{Char, 1}, Dates.DatePart{'m'}, Dates.Delim{Char, 1}, Dates.DatePart{'d'}, Dates.Delim{Char, 1}, Dates.DatePart{'H'}, Dates.Delim{Char, 1}, Dates.DatePart{'M'}, Dates.Delim{Char, 1}, Dates.DatePart{'S'}, Dates.Delim{Char, 1}, Dates.DatePart{'s'}}})
   @ AstroTime.Epochs ~/projects/julia/AstroTime/src/Epochs/dates.jl:68
 [9] top-level scope
   @ REPL[15]:1
JeffreySarnoff commented 3 years ago

step 1 nicely done do you want submillisecond resolution for TAI and other timebases?

helgee commented 3 years ago

do you want submillisecond resolution for TAI and other timebases?

Yes. Very important for deep space navigation: https://github.com/JuliaAstro/AstroTime.jl/issues/28#issue-326825392 And I do hope that somebody will use the library for that purpose eventually 😅

helgee commented 3 years ago

Epochs/DateTimes in AstroTime.jl are represented like this:

struct Epoch{S<:TimeScale, T} <: Dates.AbstractDateTime
    scale::S
    second::Int64
    fraction::T
    function Epoch{S}(second::Int64, fraction::T) where {S<:TimeScale, T}
        return new{S, T}(S(), second, fraction)
    end
end

ususally with T == Float64. Thus, I want to be able to parse all decimal places of the current second into fraction.

Roger-luo commented 3 years ago

I hit this too, my use case is when I wrap some AWS services, they provide microsecond precision in their date time, which I believe comes from Python because the default unit of Python datetime is microsecond so does IBM Quantum REST service, and working with them currently is not very pleasant since I need an extra step to remove the extra precision.

After some discussion with @ericphanson on slack, we think a fix to this could be parameterize the unit, and parse the string to highest available precision, e.g

"2021-01-28T00:21:11.849352123"

will be parsed to nanosecond precision

"2021-01-28T00:21:11.849352"

will be parse to microsecond precision,

"2021-01-28T00:21:11.849"

will be parsed to milisecond precision, in this way we could fix the problem without breaking existing code. But I think will need someone to take a second look on this, since I'm not faimilar with the implementation of Dates.

also, cc: @omus

omus commented 3 years ago

I think parameterizing DateTime with a limited set of precision types is reasonable. Adding a type parameter will definitely be a breaking change though.

ericphanson commented 2 years ago

To be clear, since Time already supports nanoseconds, I think there's no API issue or worry of breaking changes for supporting parsing times with nanosecond precision. I think the issue around that is for DateTimes (which only support microseconds), but that should probably be a separate issue, in my opinion.

JeffreySarnoff commented 2 years ago

Yes, that is a major issue as it is not just about parsing: one ought represent the result of parsing to precision parsed.

ufechner7 commented 1 year ago

What is the status of this issue?

JeffreySarnoff commented 1 year ago

This issue has not advanced. If suitable, consider NanoDates.jl. That has worked for others with similar needs.