JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.75k stars 5.49k forks source link

DateFormat can't handle microseconds? #31525

Open vtjnash opened 5 years ago

vtjnash commented 5 years ago

While trying to parse our AV logs (https://ci.appveyor.com/api/projects/JuliaLang/julia/history?recordsNumber=1&startBuildId=20355614), I realized that we apparently can't handle times more precise than milliseconds (even though we have Dates.Microsecond):

julia> DateTime("2018-11-16T10:26:14.2858074Z", dateformat"YYYY-MM-ddTHH:MM:SS.sZ")
ERROR: InexactError: convert(Dates.Decimal3, 2858074)
Stacktrace:
 [1] tryparsenext at /data/vtjnash/julia/usr/share/julia/stdlib/v1.2/Dates/src/io.jl:119 [inlined]
 [2] tryparsenext at /data/vtjnash/julia/usr/share/julia/stdlib/v1.2/Dates/src/io.jl:41 [inlined]
 [3] macro expansion at /data/vtjnash/julia/usr/share/julia/stdlib/v1.2/Dates/src/parse.jl:64 [inlined]
 [4] tryparsenext_core(::String, ::Int64, ::Int64, ::DateFormat{Symbol("YYYY-MM-ddTHH:MM:SS.sZ"),Tuple{Dates.DatePart{'Y'},Dates.Delim{Char,1},Dates.DatePart{'M'},Dates.Delim{Char,1},Date
s.DatePart{'d'},Dates.Delim{Char,1},Dates.DatePart{'H'},Dates.Delim{Char,1},Dates.DatePart{'M'},Dates.Delim{Char,1},Dates.DatePart{'S'},Dates.Delim{Char,1},Dates.DatePart{'s'},Dates.Delim{Char,1}}}, ::Bool) at /data/vtjnash/julia/usr/share/julia/stdlib/v1.2/Dates/src/parse.jl:40
 [5] macro expansion at /data/vtjnash/julia/usr/share/julia/stdlib/v1.2/Dates/src/parse.jl:150 [inlined]
 [6] tryparsenext_internal at /data/vtjnash/julia/usr/share/julia/stdlib/v1.2/Dates/src/parse.jl:127 [inlined]
 [7] parse(::Type{DateTime}, ::String, ::DateFormat{Symbol("YYYY-MM-ddTHH:MM:SS.sZ"),Tuple{Dates.DatePart{'Y'},Dates.Delim{Char,1},Dates.DatePart{'M'},Dates.Delim{Char,1},Dates.DatePart{'d'},Dates.Delim{Char,1},Dates.DatePart{'H'},Dates.Delim{Char,1},Dates.DatePart{'M'},Dates.Delim{Char,1},Dates.DatePart{'S'},Dates.Delim{Char,1},Dates.DatePart{'s'},Dates.Delim{Char,1}}}) at /data/vtjnash/julia/usr/share/julia/stdlib/v1.2/Dates/src/parse.jl:282
 [8] DateTime(::String, ::DateFormat{Symbol("YYYY-MM-ddTHH:MM:SS.sZ"),Tuple{Dates.DatePart{'Y'},Dates.Delim{Char,1},Dates.DatePart{'M'},Dates.Delim{Char,1},Dates.DatePart{'d'},Dates.Delim{Char,1},Dates.DatePart{'H'},Dates.Delim{Char,1},Dates.DatePart{'M'},Dates.Delim{Char,1},Dates.DatePart{'S'},Dates.Delim{Char,1},Dates.DatePart{'s'},Dates.Delim{Char,1}}}) at /data/vtjnash/julia/usr/share/julia/stdlib/v1.2/Dates/src/io.jl:432
 [9] top-level scope at REPL[87]:1
tlienart commented 5 years ago

I think this discussion on discourse is related.

eulerkochy commented 5 years ago

@vtjnash although we have a Dates.Microsecond, the DateTime constructor doesn't use that as per the present implementation. We can obviously add the microsecond argument in the constructor, point is do we have a use case for the implementation?

JeffreySarnoff commented 5 years ago

with Int64 values and a proleptic Gregorian calendar

Using units of milliseconds covers  292_277_024 years.
Using units of microseconds covers      292_277 years.
Using units of nanoseconds covers           292 years.
Quafadas commented 3 years ago

@vtjnash although we have a Dates.Microsecond, the DateTime constructor doesn't use that as per the present implementation. We can obviously add the microsecond argument in the constructor, point is do we have a use case for the implementation?

An upvote for this issue - I have a JSON API which I'd like Julia to talk to and parse into structs - I'm new to the language - and settled on JSON3, HTTP as the apparent libraries of choice.

All works startlingly well except with DateTime, which won't accept microseconds. As I have no control over the API, and all the parsing is handled in the libraries, I actually don't know what the solution to this problem is except to leave the dates as String in the struct. My google-fu has failed me on other solutions ... but this No DateTime solution is distasteful, and I fear the extra pain down the road - to the extent that I will consider reverting to scala for this use case.

For me, this use case for the constructor is for someone like me, trying Julia. I'm sure a reasonably competent person expert has a workaround to this "microsecond API JSON problem" - a stupid one I grant you as I really don't care about the 4-6th digits.

However, I can't find a workaround, and it is scary to me as a newcomer, that managing a fairly simple third party API appears to be a problem beyond my skills - this is normally a trivial thing in modern languages.

If I understand correctly, the addition of the extra constructor makes that whole scary problem never appear...

I'll also post an issue in JSON3, to see if there is a known workaround.

Restated - a microsecond constructors eases DateTime interop with other languages.

david-macmahon commented 3 years ago

DateTime is limited to millisecond precision by design, but Time has nanosecond precision. It would be nice to be able use DateFormats with multiple s codes to specify the desired sub-second resolution for Time objects, but no matter how many s codes are given it seems to always behave as if three were given. This is a problem for both formatting and parsing:

julia> t=Time(1,2,3,456,789)
01:02:03.456789

julia> Dates.format(t, dateformat"HH:MM:SS.ssssss")
"01:02:03.456000"

julia> Time("01:02:03.456789", dateformat"HH:MM:SS.ssssss")
ERROR: InexactError: convert(Dates.Decimal3, 456789)
AndrewGibb commented 3 years ago

I also have a use case for this. We use nanosecond-precision timestamps for research into networked AV media. The data structures we use for debugging output microsecond precision times. It would be useful to parse these for analysis & visualisation in Julia.

notinaboat commented 3 years ago

I just spent quite a bit of time* trying to find out why my log timestamps are being rounded to the nearest ms. It turns out that the answer is Dates.format. As @david-macmahon mentions above, it silently accepts "ssssss" but rounds the Time value to ms before formatting it!

So, I whip up a microsecond formatting function, how hard can it be?

julia> formatus(t::Time) = string(Dates.format(t, "HH:MM:SS"), ".", lpad(microsecond(t), 6, "0"))

julia> formatus(Time(Nanosecond(1_000_000_000)))
"00:00:01.000000"

julia> formatus(Time(Nanosecond(1_000_123_000)))
"00:00:01.000123"

... so far so good, but then:

julia> formatus(Time(Nanosecond(1_123_123_000)))
"00:00:01.000123"

The milliseconds part disappeared!

The way Dates.Time handles sub-second values turns out to be a bit surprising.

You can put ns in, and get ns back out again...

julia> Time(Nanosecond(5))
00:00:00.000000005

julia> nanosecond(Time(Nanosecond(5)))
5

Except that if you put in more than 999 ns you don't get all of them back...

julia> Time(Nanosecond(5005))
00:00:00.000005005

julia> nanosecond(Time(Nanosecond(5005)))
5

With μs you're not allowed to use values over 999.

julia> Time(Microsecond(5))
00:00:00.000005

julia> Time(Microsecond(5005))
ERROR: ArgumentError: Microsecond: 5005 out of range (0:999)

Is Dates.Time designed this way on purpose? nanosecond(Time(Nanosecond(5005))) == 5 seems like a bug, but this test case makes it look like that is intended:

    dt = Dates.DateTime(2000, 1, 1, 23, 59, 59, 50)
    t = Dates.Time(dt)
    @test Dates.hour(t) == 23
    @test Dates.minute(t) == 59
    @test Dates.second(t) == 59
    @test Dates.millisecond(t) == 50
    @test Dates.microsecond(t) == 0
    @test Dates.nanosecond(t) == 0

Is there a use-case where this is desirable? I get that DateTime only has ms resolution, but 50ms should be === 50_000 μs irrespective of the underlying resolution.

It would seem less surprising to have:

    @test Dates.microsecond(t) == 50_000
    @test Dates.nanosecond(t) == 50_000_000

*(My log messages originate on an 8-bit micro, measuring a physical system with ~ μs resolution. They are passed to another 8-bit micro over SPI, then through a serial port to a raspberry Pi running Julia, then through WiFi to Julia on a mac, then through an RPC link to a log monitoring GUI, I checked everywhere else before suspecting that Dates.format might be broken)

JeffreySarnoff commented 3 years ago

Dates is carefully written (it is not broken). For more specifics on periods, please see the docs for Dates. That said, it is well known that providing direct support for more complete subsecond handling would be good. Dates (including Times and DateTimes) operate with 64 bit underlying values. That precludes some of the widest, broadest lossless interconversions. And yes, in some ways Nanoseconds are special.

notinaboat commented 3 years ago

The documentation for microsecond and nanosecond says just:

The microsecond of a Time as an Int64. The nanosecond of a Time as an Int64.

There is no mention that the return values are modulo 1000 and modulo 1000000. Nor is there any discussion in the documentation about why it might be useful for microsecond(t) to return microseconds modulo 1000.

Time type is not mentioned at all in the main body of the documentation. It only appears in the "API Reference" section, where all it says is:

Time wraps a Nanosecond and represents a specific moment in a 24-hour day.

To me, given that documentation, this seems broken:

nanosecond(Time(Nanosecond(5005))) == 5

anowacki commented 3 years ago

@notinaboat I have previously had the same problem. The workaround in your case is:

julia> using Printf: @sprintf

julia> formatus(t::Time) = @sprintf("%s.%03d%03d", Dates.format(t, "HH:MM:SS"), millisecond(t), microsecond(t))
formatus (generic function with 1 method)

julia> formatus(Time(Nanosecond(1_234_567_890)))
"00:00:01.234567"

This should probably be in a new issue, but…

To solve the documentation issue of what millisecond et al. return without breaking the current API, I suggest the following wording for all the millisecond, microsecond and nanosecond docstrings, using the example of millisecond:

    millisecond(t::Time) -> Int64

The millisecond part of a `Time` as an `Int64`, lying in the range 0 to 999 ms.

# Example
```jldoctest
julia> t = Time(12, 34, 56, 123, 456, 789)
12:34:56.123456789

julia> millisecond(t)
123
```[ignore this—extra stuff to allow for inclusion of the backticks]

See [`Time`](@ref) for more on the parts making up `Time`.

For Time, I suggest removing the mention of it being backed by Nanoseconds and instead say:

    Time

`Time` represents a specific moment in a 24-hour day.  A `Time` type consists
of parts representing hours (`h`, up to 23), minutes (`mi`, up to 59)`,
seconds (`s`, up to 59), milliseconds (`ms`, up to 999), microseconds (`us`, up to 999)
and nanoseconds (`ns`, up to 999).

Or much of the above could instead go into the second docstring shown, which currently is:

    Time(h, [mi, s, ms, us, ns]) -> Time

Construct a `Time` by parts.  Arguments must be convertible to `Int64`.

Happy to make a PR if this sounds sensible.

None of the above actually addresses the issue at hand (that DateFormat doesn't cope with the resolution supported by Times).

anowacki commented 3 years ago

The implementation issue which is confusing is that Times are just wrappers around a Nanosecond, and are most basically constructed as you did with e.g. Time(Nanosecond(1_000_000_000)). But this means there is no check for the value being over 999 as for Millisecond and Microsecond. (Specifically, Dates.validargs(Time, ...) is never called.)

Perhaps Nanosecond <: TimePeriod should be distinct from a NanosecondInstant <: Instant used for Time?

[Edited to refer correctly to TimePeriod rather than Period which is for dates.]

JeffreySarnoff commented 3 years ago

@anowacki helpful workaround.

regarding _this issue: DateFormat doesn't handle the full resolution of Times -- The Period used to wrap Time onto an Int64 value is Nanosecond while the Period used to wrap DateTime onto an Int64 value is Millisecond; so some fine grained resolution becomes lossy when mixing the two.

just for fun

nbits(x) = iszero(x) ? 0 : ceil(Int, log2(1+abs(x)));
nbits(x::Dates.AbstractTime) = nbits(value(x));
value(x::Time) = x.instant.value;
value(x::DateTime) = x.instant.periods.value;
nanos_per_millisec= 1_000_000;
nbits(typemax(DateTime)), nbits(typemax(Time)),  nbits(nanos_per_millisec)
# 62, 47, 20
JeffreySarnoff commented 3 years ago

Nanosecond does not limit values to 999 because the finest resolved unit of time (the smallest Period) must be available as an Int64 valued destination to keep internal Time related functions very fast.

anowacki commented 3 years ago

Nanosecond does not limit values to 999 because the finest resolved unit of time (the smallest Period) must be available as an Int64 valued destination to keep internal Time related functions very fast.

Indeed—the issue is not with the constructors of any of the TimePeriods, all of which accept an arbitrary Int64 quite rightly.

Instead, the issue is because of the Time constructors, two of which are important here:

When you do Time(Millisecond(999)) and get a Time(0, 0, 0, 999), you are calling the second of the two above constructors (the API one). This in turn calls Dates.validargs to check that all of the individual TimePeriods supplied are in the correct range. (You could well argue that this is unnecessary—why not allow a Time adjusted from midnight by 1000 ms rather than only 999 ms? After all, you can do Time(Millisecond(999), Millisecond(1))!) Arguably multiple arguments of the same type should not be allowed…

However, when you do Time(Nanosecond(1000)), the first (internal) method is called, and Dates.validargs elided. This seems unintentional to me—presumably the second method above was intended to catch any combination of TimePeriods.

Perhaps a resolution to all of this is something like:

struct TimeInstant{P<:TimePeriod} <: Instant
    periods::P
end

struct Time <: TimeType
    instant::Nanosecond
    Time(instant::TimeInstant{Nanosecond}) = new(mod(instant.periods, 86400000000000))
end

and then call the inner constructor by wrapping Nanosecond in TimeInstant?

notinaboat commented 3 years ago

@anowacki thanks for posting the sprintf example.

@JeffreySarnoff I see that Date and Time are stored with different resolution, but none of my issues or concerns have anything to do with Date. My use case involves just Time. Time has ns resolution, which is enough (μs would also be fine in my case). The issue is with the interface.

Its seems there are two fundamental problems here.

First, implementation detail is leaking out and polluting the generic interface as @anowacki describes above.

Second, the current interface implies the unusual idea that ms, μs and ns should be treated as seperate "columns" (i.e. that each of these values is limited to three decimal digits). It is natural to think of hours, minutes and seconds as seperate "columns", but every other time interface that I'm familiar with treats fractional seconds simply as fractional seconds. ISO 8601 allows seconds.xxxx... with as much precision as you like. Unix has struct timeval with a tv_usec field. Go, rust and swift, all return nanoseconds when you ask for nanoseconds. Rust allows you to specify wether you want the whole time value in ns or just the sub-second fraction. This seems nice.

The decision to return nanoseconds modulo 1000 by default seems very unusual. I can't think of any application where it is more convenient to have fractional seconds divided into seperate 3-digit columns by default.

As it stands if I want to get the fractional seconds part of a Julia Time I have to type 1000000 * millisecond(t) + 1000 * microsecond(t) + nanosecond(t).

e.g.

go has Microseconds and Nanoseconds accessors that do the expected thing:

t := time.Duration(10_123_456_789)
fmt.Println(t.Microseconds())
10123456

fmt.Println(t.Nanoseconds())
10123456789

Rust lets you ask for as_micros or subsec_micros:

let t = Duration::from_nanos(10_123_456_789);
println!("{:?}", t);
10.123456789s

println!("{:?}", t.as_micros());
10123456

println!("{:?}", t.as_nanos());
10123456789

println!("{:?}", t.subsec_micros());
123456

println!("{:?}", t.subsec_nanos());
123456789

Swift has only one sub-second accessor: nanosecond.

var t = DateComponents.init(nanosecond:10_123_456_789)
print("\(t.nanosecond!)")
10123456789
JeffreySarnoff commented 3 years ago

@notinaboat Thank you for the clarification and specificity of your notes. Although I have not thought about them for a year or two, I am aware of these issues and the preference for uniformity.

It is possible to experiment with a modified Dates as an external package [say, AltDates.jl] that begins as a duplicate of Dates from https://github.com/JuliaLang/julia/tree/master/stdlib/Dates. For that, a deep dive into Dates is required -- many of the unobvious decisions are guided by the need for unflappable performance, others by designing in calculations that are free from error, and there are aspects of the design that facilitate higher level operations.

Experimenting with the least change that provides the most satisfaction is recommended, and benchmarking the use of Dates and AltDates with their tests and with e.g. DataFrames and TimeSeries matters.

jebej commented 3 years ago

@notinaboat I agree that the accessor function name is somewhat misleading. Maybe we could add functions to obtain the value of a Period as whatever unit is desired, but I don't know how to avoid ambiguities with the current names.

What you want is done with Dates.value. Note that the unit of the integer you obtain depends on the type of the Period. For a Time type, you get back the nanoseconds:

julia> Time(13,11,35,621,355,112) |> nanosecond # get out only the nanosecond part
112

julia> Time(13,11,35,621,355,112) |> Dates.value # get the raw integer, here in nanoseconds
47495621355112
notinaboat commented 3 years ago

That's interesting @jebej.

Dates.value is documented as Dates.value(x::Period) -> Int64, but Time <: Period is false.

In my own code, I've decided to use Float64 seconds everywhere instead of Time and only convert to Time just before Dates.format. The Dates interface as it stands clearly has both internal inconsistencies and surprising inconsistencies with other similar interfaces. That makes me think it is likely to change in Julia 2.0 (hopefully?), so for now I'm going to avoid relying on its quirks as much as possible.

@JeffreySarnoff, with regard to alternative time packages, I've found one that looks serious https://github.com/FugroRoames/Chrono.jl. However I don't have time available to work on time packages.

I am spending a little time posting here in the hope that whoever next works on Dates has some documentation of problems experienced by a real user.

FWIW, a few suggested changes focused on the principle of least surprise...

New method:

julia> Dates.Nanosecond(t::Time) = t.instant

julia> Nanosecond(Time(Nanosecond(1000))) 1000 nanoseconds

julia> Nanosecond(Time(Nanosecond(10_123_456_789))) 10123456789 nanoseconds



 - After deprecations have been in place for long enough, add:
   - `Time(x::TimePeriod) = Time(Nanosecond(tons(x)))`
   - `nanosecond(t::Time) = value(t) % Int(1e9)`
   - `Time(ns::Integer) = Time(Nanosecond(ns))`

 - Add `Time(seconds::AbstractFloat) = Time(Nanosecond(round(Int64, 1e9 * seconds)))` 
JeffreySarnoff commented 3 years ago

I was not suggesting that you rework Dates -- my notes are for others who will find this at some later date. For all things carefully timed (overkill for your use) https://github.com/JuliaAstro/AstroTime.jl [nonetheless if you need to check results, this is reliable]. It is well understood by the community of people who write time libraries that using Floats is not safe and may introduce subtle, and difficult to track down (or to recognize) errors. That said, nobody understands your uses better than you.

Here is a thread from the early days of Julia https://julia-dev.narkive.com/c21H1xHJ/the-path-to-a-time-type and another discussing higher resolution for times https://discourse.julialang.org/t/higher-resolution-datetime-timestamp/3069/28