JuliaLang / julia

[Dates] Extending the parsing/formatting machinery is awkward #29339

Open helgee opened 6 years ago

helgee commented 6 years ago

Extending the parsing machinery in Dates requires one to modify dictionaries such as CONVERSION_SPECIFIERS and to extend methods such as default_format.

Updating the dicts needs to happen within __init__, see e.g. https://github.com/JuliaTime/TimeZones.jl/issues/24

EDIT: Since default_format might need the updated dicts, it needs to be extended in __init__ as well. I first thought this required eval, which then leads to https://github.com/JuliaLang/julia/issues/29059 (can I just ignore this if it works?). It turns out @eval is not needed, but a precompiled DateFormat cannot be used.

All in all, the process is awkward and the end result not very pleasing, see here https://github.com/JuliaAstro/AstroTime.jl/blob/04a6ae917b9277e2abf77dfb199e487885db9595/src/AstroTime.jl#L22

Could this not be implemented through multiple dispatch alone? If not, what am I missing?

omus commented 6 years ago

When we did the parsing performance overhaul for Julia 0.6 we needed to use generated functions to address the performance issues. A side result of that is we needed to use dictionaries to still allow extensibility for packages like TimeZones. I'm not sure these restrictions are still the case with Julia 1.0.

helgee commented 6 years ago

Good to know! I plan to check whether it is still needed sometime this week.

JeffBezanson commented 5 years ago

Are there lots of potential extensions needed to date parsing, or is TimeZones the only example? If possible, it would be better for Dates to already know about all needed format characters, and handle them with 0-method functions. Then TimeZones.jl can add methods to that function when it's loaded.
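To make the 0-method idea concrete, here is a minimal sketch. Everything in it is hypothetical: parse_zone, parse_digits, and parse_with are illustrative names, not part of Dates or TimeZones.jl, and the toy format language only knows 'y' (four-digit year) and 'z' (zone).

```julia
# --- would live in Dates (hypothetical names throughout) ---

# Reserved hook with zero methods: the parser knows about the 'z' format
# character, but only an extension package knows how to handle it.
function parse_zone end

# Built-in directive: read `width` ASCII digits starting at `pos`.
function parse_digits(str, pos, width)
    stop = pos + width - 1
    return parse(Int, SubString(str, pos, stop)), stop + 1
end

# Toy parser: 'y' = four-digit year, 'z' = zone, anything else is a literal.
function parse_with(str::AbstractString, format::AbstractString)
    pos = firstindex(str)
    values = Any[]
    for c in format
        if c == 'y'
            value, pos = parse_digits(str, pos, 4)
        elseif c == 'z'
            value, pos = parse_zone(str, pos)   # MethodError until a package adds a method
        else
            str[pos] == c || throw(ArgumentError("expected '$c' at position $pos"))
            value, pos = nothing, nextind(str, pos)
        end
        value === nothing || push!(values, value)
    end
    return values
end

# --- would live in the extension package (e.g. TimeZones.jl) ---

# Adding a method is all that is needed: no dictionaries, no __init__.
# This toy version consumes the rest of the string as the zone name.
function parse_zone(str, pos)
    return String(SubString(str, pos)), lastindex(str) + 1
end

parse_with("2019 UTC", "y z")
```

With this layout the extension is an ordinary method definition, so it is precompiled with the package instead of being replayed at load time.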

omus commented 5 years ago

TimeZones is the only example I know of. It seems sensible to me to reserve the z and Z format characters.

helgee commented 5 years ago

The other example is AstroTime.jl. It uses D for the day-of-year format, e.g. AstroTime.format(now(), "yyyy-DDDTHH:MM:SS.sss") == "2019-45T08:19:53.529", and t for the time scale, e.g. AstroTime.format(now(), "yyyy-mm-dd HH:MM t") == "2019-02-14 08:21 UTC". The former should probably be upstreamed.

Just for my understanding: Would it not make sense to add a prefix to the character codes, e.g. strptime-style %d (apart from it being a breaking change)? This would make it easier to parse timestamps with additional text (see here) without preprocessing and the whole alphabet could be made available for future extension.
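A small sketch of why a prefix helps with surrounding text. The tokenizer below is purely illustrative (tokenize_format is a made-up name, not a Dates function): once directives are marked with %, every other character is unambiguously a literal, and "%%" stands for a literal percent sign.

```julia
# Split a strptime-style format into directive codes (Char) and literal text (String).
function tokenize_format(fmt::AbstractString)
    tokens = Union{Char,String}[]
    literal = Char[]              # literal characters collected so far
    chars = collect(fmt)
    i = 1
    while i <= length(chars)
        c = chars[i]
        if c == '%' && i < length(chars) && chars[i + 1] != '%'
            # A directive: flush pending literal text, then record the code.
            isempty(literal) || push!(tokens, join(literal))
            empty!(literal)
            push!(tokens, chars[i + 1])
            i += 2
        elseif c == '%' && i < length(chars)
            # "%%" is an escaped percent sign.
            push!(literal, '%')
            i += 2
        else
            push!(literal, c)
            i += 1
        end
    end
    isempty(literal) || push!(tokens, join(literal))
    return tokens
end

tokenize_format("Logged at %H:%M")
```

Here the words "Logged at" need no escaping even though they contain letters that a prefix-free format could read as directives.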

omus commented 5 years ago

I agree day-of-year should be upstreamed. I just found the issue for it: https://github.com/JuliaLang/julia/issues/21905.

Using strptime character codes sounds reasonable to me. I believe there have been some proposals for formatted string printing and we'll probably want to have the dates formatting syntax be consistent.

omus commented 5 years ago

I recently discovered that the Unicode Technical Standard #35 contains a specification for date formatting and parsing symbols which works similarly to Julia's DateFormat.

Some particular things to note about this specification:

Unfortunately, the specification has some incompatibilities with what is currently implemented in Dates. Time permitting, I'll try implementing the full Unicode specification as a separate package to try it out.

mikeingold commented 4 years ago

I have the same issue as in #21905. Ordinal (day of year) formatting is specified in ISO 8601 and commonly implemented in scientific/industrial datalogging equipment. I often need to parse data with dates specified in YYYY-DDD format (e.g. today would be 2020-079).

I don't have a use for any other functionality from AstroTime.jl, and there doesn't seem to be an elegant way to use its format parser to generate a regular Date that makes this job any simpler. My other options all seem sub-optimal and generally hacky, like implementing a generic function:

OrdinalDate(year, doy) = Date( firstdayofyear(Date(year)) + Day(doy-1) )

Being able to directly construct an ordinal date, e.g. Date(year::Int, doy::Int), would be great but I don't know how we could make that unambiguous from the existing Date(year::Int, month::Int) signature.

Given the context of extracting data from logs, adding a symbol to DateFormat that enables calls like Date("2020079", DateFormat("yyyyD")) would be pretty much ideal.
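Until such a symbol exists, a workaround that needs only the public Dates API might look like the sketch below. parse_ordinal_date is a hypothetical helper name, and the accepted layouts (YYYY-DDD and YYYYDDD) are an assumption based on the examples in this comment.

```julia
using Dates

# Parse an ISO 8601 ordinal date such as "2020-079" or "2020079" into a Date.
function parse_ordinal_date(s::AbstractString)
    m = match(r"^(\d{4})-?(\d{3})$", s)
    m === nothing && throw(ArgumentError("expected YYYY-DDD or YYYYDDD, got \"$s\""))
    year = parse(Int, m.captures[1])
    doy = parse(Int, m.captures[2])
    # Reject day 000 and days past the end of the year (366 only in leap years).
    1 <= doy <= daysinyear(Date(year)) ||
        throw(ArgumentError("day of year $doy is out of range for $year"))
    return Date(year) + Day(doy - 1)
end

parse_ordinal_date("2020-079")
```

This avoids the ambiguity with Date(year::Int, month::Int) by using a separate function rather than a new constructor method.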

helgee commented 4 years ago

@mikeingold For the time being, you could do this, which is not too inelegant IMHO:

using Dates
import AstroTime
Date(DateTime(AstroTime.UTCEpoch("2020079", DateFormat("yyyyD"))))

But I agree that having this in the stdlib would be better.

helgee commented 4 years ago

This just came up on Discourse: https://discourse.julialang.org/t/parsing-high-precision-timestamps/44061/1

TL;DR: All types using the built-in parser are limited to millisecond precision, even Time.
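Since Time itself stores nanoseconds, one way around the parser limit is to handle the fractional part by hand. This is only a sketch: parse_time_ns is a hypothetical helper, and it assumes input of the form HH:MM:SS with up to nine fractional digits.

```julia
using Dates

# Parse "HH:MM:SS[.fffffffff]" into a Time, keeping up to nanosecond precision.
function parse_time_ns(s::AbstractString)
    m = match(r"^(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,9}))?$", s)
    m === nothing && throw(ArgumentError("expected HH:MM:SS[.fffffffff], got \"$s\""))
    h = parse(Int, m.captures[1])
    mi = parse(Int, m.captures[2])
    sec = parse(Int, m.captures[3])
    frac = m.captures[4] === nothing ? "" : m.captures[4]
    # Right-pad the fraction to nine digits so it can be read as nanoseconds.
    ns = parse(Int, rpad(frac, 9, '0'))
    return Time(h, mi, sec) + Nanosecond(ns)
end

parse_time_ns("12:34:56.123456789")
```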

matthieugomez commented 4 years ago

Another example is MonthlyDates, which uses q to parse quarters (i.e. 2020-Q3), see https://github.com/matthieugomez/MonthlyDates.jl/pull/7

helgee commented 2 years ago

I discovered another problem with the current approach recently: It does not work with parametric types.

If I put the UnionAll into CONVERSION_SPECIFIERS, e.g., Epoch{BarycentricDynamicalTime, T} where T, then it will not work for concrete types, e.g., Epoch{BarycentricDynamicalTime, Float64}, and vice versa.
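The effect can be reproduced without Dates. The sketch below uses hypothetical stand-ins (TDB, Epoch, table, fields) rather than the real AstroTime types or the Dates dictionaries; it only shows that a type-keyed dictionary matches keys by identity, while dispatch matches by subtyping.

```julia
struct TDB end                        # hypothetical time scale
struct Epoch{S,T}; value::T; end      # hypothetical stand-in for a parametric epoch type

const TDBEpoch = Epoch{TDB,T} where T # the UnionAll

# A dictionary keyed by type looks keys up by identity, not by subtyping.
table = IdDict{Type,Any}()
table[TDBEpoch] = (:year, :day)

union_all_hit = haskey(table, TDBEpoch)             # the exact key is found
concrete_hit = haskey(table, Epoch{TDB,Float64})    # a concrete subtype is not

# A dispatch-based lookup covers the UnionAll and every concrete instance.
fields(::Type{<:Epoch{TDB}}) = (:year, :day)

(union_all_hit, concrete_hit, fields(Epoch{TDB,Float64}), fields(TDBEpoch))
```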

ViralBShah commented 2 years ago

@quinnj Can we close this?

helgee commented 2 years ago

I am not @quinnj 😅 but AFAICT none of the issues that I have raised in this thread have been addressed, and they remain problematic for downstream packages. So, no?

quinnj commented 2 years ago

I don't have much of an update, but I did take a stab at rewriting the Dates parsing code (not formatting yet) here. Notable changes include a DateFormat-like struct that doesn't specialize on specification characters (thus avoiding separate compilation for unique dateformat strings), and moving to a byte-buffer-based approach for parsing (which is what the entire Parsers.jl framework relies on).
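For readers unfamiliar with the two ideas, here is a rough illustration. It is not the code from the rewrite mentioned above: Token, PlainFormat, and parse_date are hypothetical names, the format only understands y, m, and d, fields are fixed-width, and literals are assumed to be ASCII.

```julia
using Dates

# One piece of a format: a directive code ('y', 'm', 'd') with its width,
# or a single literal character with width 1.
struct Token
    code::Char
    width::Int
end

# The format characters are stored as data, not as type parameters, so every
# format string yields the same concrete type and parse_date compiles once.
struct PlainFormat
    tokens::Vector{Token}
end

function PlainFormat(fmt::AbstractString)
    tokens = Token[]
    for c in fmt
        if !isempty(tokens) && c in "ymd" && tokens[end].code == c
            tokens[end] = Token(c, tokens[end].width + 1)   # extend a run like "yyyy"
        else
            push!(tokens, Token(c, 1))
        end
    end
    return PlainFormat(tokens)
end

# Parse directly from the byte buffer of the string.
function parse_date(str::AbstractString, fmt::PlainFormat)
    bytes = codeunits(str)
    pos = 1
    y, m, d = 1, 1, 1
    for tok in fmt.tokens
        if tok.code in "ymd"
            value = 0
            for _ in 1:tok.width
                pos <= length(bytes) || throw(ArgumentError("input too short"))
                b = bytes[pos]
                UInt8('0') <= b <= UInt8('9') ||
                    throw(ArgumentError("expected a digit at byte $pos"))
                value = 10 * value + Int(b - UInt8('0'))
                pos += 1
            end
            if tok.code == 'y'
                y = value
            elseif tok.code == 'm'
                m = value
            else
                d = value
            end
        else
            (pos <= length(bytes) && bytes[pos] == UInt8(tok.code)) ||
                throw(ArgumentError("expected '$(tok.code)' at byte $pos"))
            pos += 1
        end
    end
    return Date(y, m, d)
end

parse_date("2020-03-19", PlainFormat("yyyy-mm-dd"))
```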

It works well and passes all existing cases/tests that we have in Dates. The "extensions" part is still pretty clunky/awkward though. I admittedly didn't spend a ton of time trying to refine that API, since I was mainly interested in the performance and compilation gains and consistency with the rest of the Parsers.jl framework, but I did have the thought of revisiting it to try and iron out a more sensible extension system for custom TimeTypes. And subsequently see what it would look like to move that all into the Dates stdlib.

All that to say, yeah, I do think there is still awkwardness in extending Dates parsing/formatting and we should figure out a better system, but I haven't really done much about it yet, though I might in the future. And am happy to chat more with others who are also interested in figuring out a good extension system.