Open helgee opened 6 years ago
When we did the parsing performance overhaul for Julia 0.6 we needed to use generated functions to address the performance issues. A side result of that is we needed to use dictionaries to still allow extensibility for packages like TimeZones. I'm not sure these restrictions are still the case with Julia 1.0.
Good to know! I plan to check whether it is still needed sometime this week.
Are there lots of potential extensions needed to date parsing, or is TimeZones the only example? If possible, it would be better for Dates to already know about all needed format characters, and handle them with 0-method functions. Then TimeZones.jl can add methods to that function when it's loaded.
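To make the 0-method idea concrete, here is a minimal sketch. Everything in it is hypothetical: parse_zone and parse_field are illustration names, not functions that exist in Dates, and the "UTC" handling is only a placeholder for what a package like TimeZones.jl might do.

```julia
# --- What the stdlib side could look like (hypothetical names) ---

# A function with zero methods: the name is reserved, but nothing is implemented.
function parse_zone end

# The core parser knows which function each format character maps to.
function parse_field(c::Char, str::AbstractString)
    c == 'y' && return parse(Int, str)   # built-in: year
    c == 'z' && return parse_zone(str)   # reserved: needs a method from a package
    throw(ArgumentError("unknown format character: $c"))
end

# Before any package is loaded, the reserved function has no methods.
had_no_methods = isempty(methods(parse_zone))

# --- What the downstream package could add when it is loaded ---

# Placeholder logic: return an offset of 0 for "UTC", nothing otherwise.
parse_zone(str::AbstractString) = str == "UTC" ? 0 : nothing

year = parse_field('y', "2019")
offset = parse_field('z', "UTC")
```

With this shape, the downstream package adds an ordinary method at load time, so no dictionary has to be mutated in __init__. Calling parse_field('z', ...) before the package is loaded would raise a MethodError, which is a reasonably clear failure mode.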
TimeZones is the only example I know of. It seems sensible to me to reserve the z and Z format characters.
The other example is AstroTime.jl. It uses D for the day-of-year format, e.g. AstroTime.format(now(), "yyyy-DDDTHH:MM:SS.sss") == "2019-45T08:19:53.529", and t for the time scale, e.g. AstroTime.format(now(), "yyyy-mm-dd HH:MM t") == "2019-02-14 08:21 UTC". The former should probably be upstreamed.
Just for my understanding: Would it not make sense to add a prefix to the character codes, e.g. strptime-style %d (apart from it being a breaking change)? This would make it easier to parse timestamps with additional text (see here) without preprocessing, and the whole alphabet could be made available for future extension.
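A rough sketch of why a prefix helps with literal text: once every specifier is marked with %, anything else can be taken as a literal without quoting or escaping. The tokenizer below is hypothetical (tokenize_format is not an existing Dates function) and only splits the format string; it does not parse any dates.

```julia
# Hypothetical tokenizer: '%' followed by one character is a specifier,
# everything else is literal text.
function tokenize_format(fmt::AbstractString)
    tokens = Tuple{Symbol, String}[]
    literal = IOBuffer()
    chars = collect(fmt)
    i = 1
    while i <= length(chars)
        if chars[i] == '%' && i < length(chars)
            # Flush any pending literal text before recording the specifier.
            text = String(take!(literal))
            isempty(text) || push!(tokens, (:literal, text))
            push!(tokens, (:specifier, string(chars[i + 1])))
            i += 2
        else
            # A trailing lone '%' also ends up here and is kept as literal text.
            print(literal, chars[i])
            i += 1
        end
    end
    text = String(take!(literal))
    isempty(text) || push!(tokens, (:literal, text))
    return tokens
end

tokens = tokenize_format("SMAP_L4_SM_gph_%Y%m%dT%H%M%S")
```

Here the leading SMAP_L4_SM_gph_ and the T separator come out as literal tokens even though they contain letters that would otherwise be candidates for format characters.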
I agree day-of-year should be upstreamed. I just found the issue for it: https://github.com/JuliaLang/julia/issues/21905.
Using strptime character codes sounds reasonable to me. I believe there have been some proposals for formatted string printing and we'll probably want to have the dates formatting syntax be consistent.
I recently discovered that the Unicode Technical Standard #35 contains a specification for date formatting and parsing symbols which works similarly to Julia's DateFormat.
Some particular things to note about this specification:
dateformat"'SMAP_L4_SM_gph_'yyyymmdd'T'HHMMSS"
vs. dateformat"\S\MAP_L4_\S\M_gph_yyyymmddTHHMMSS"
D
S field pattern specifies fractional seconds and not milliseconds, which is helpful for showing additional precision
Unfortunately the specification has some incompatibilities with what is currently implemented in Dates. Time willing, I'll try implementing the full Unicode specification as a separate package to try it out.
I have the same issue as in #21905. Ordinal (day of year) formatting is specified in ISO 8601 and commonly implemented in scientific/industrial datalogging equipment. I often need to parse data with dates specified in YYYY-DDD format (e.g. today would be 2020-079).
I don't have a use for any other functionality from AstroTime.jl, and there doesn't seem to be an elegant way to use its format parser to generate a regular Date that makes this job any simpler. My other options all seem sub-optimal and generally hacky, like implementing a generic function:
OrdinalDate(year, doy) = Date( firstdayofyear(Date(year)) + Day(doy-1) )
Being able to directly construct an ordinal date, e.g. Date(year::Int, doy::Int), would be great but I don't know how we could make that unambiguous from the existing Date(year::Int, month::Int) signature.
Given the context of extracting data from logs, adding a symbol to DateFormat that enables calls like Date("2020079", DateFormat("yyyyD")) would be pretty much ideal.
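Until something like that exists, a stopgap using only the Dates stdlib might look like the sketch below. The helper names ordinal_date and parse_ordinal are made up for illustration, and the parser assumes the fixed-width, ASCII-only "yyyyDDD" layout from the example above.

```julia
using Dates

# Hypothetical helper, not part of Dates: build a Date from a year and a day of year.
function ordinal_date(year::Integer, doy::Integer)
    ndays = Dates.daysinyear(Date(year))
    1 <= doy <= ndays || throw(ArgumentError("day of year $doy is out of range for $year"))
    return Date(year) + Day(doy - 1)
end

# Hypothetical helper: parse a fixed-width "yyyyDDD" string such as "2020079".
function parse_ordinal(str::AbstractString)
    length(str) == 7 || throw(ArgumentError("expected 7 characters, got $(length(str))"))
    return ordinal_date(parse(Int, str[1:4]), parse(Int, str[5:7]))
end

d = parse_ordinal("2020079")
```

Using a separately named function sidesteps the ambiguity with Date(year::Int, month::Int), at the cost of not composing with DateFormat.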
@mikeingold For the time being, you could do this, which is not too inelegant IMHO:
using Dates
import AstroTime
Date(DateTime(AstroTime.UTCEpoch("2020079", DateFormat("yyyyD"))))
But I agree that having this in the stdlib would be better.
This just came up on Discourse: https://discourse.julialang.org/t/parsing-high-precision-timestamps/44061/1
TL;DR: All types using the built-in parser are limited to millisecond precision, even Time.
Another example is MonthlyDates, which uses q to parse quarters (i.e. 2020-Q3), see https://github.com/matthieugomez/MonthlyDates.jl/pull/7
I discovered another problem with the current approach recently: It does not work with parametric types.
If I put the UnionAll into CONVERSION_SPECIFIERS, e.g., Epoch{BarycentricDynamicalTime, T} where T, then it will not work for concrete types, e.g., Epoch{BarycentricDynamicalTime, Float64}, and vice versa.
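The mismatch can be reproduced without Dates or AstroTime. In the sketch below, MyEpoch is a hypothetical stand-in for Epoch, registry is a stand-in for a type-keyed dictionary like CONVERSION_SPECIFIERS, and default_fmt is an illustration name, not the real default_format.

```julia
# Hypothetical parametric type standing in for AstroTime's Epoch.
struct MyEpoch{S, T}
    value::T
end

# The UnionAll covering every value type for one time scale.
any_tdb = MyEpoch{:TDB, T} where T

# Stand-in for a dictionary keyed by type.
registry = Dict{Type, String}()
registry[any_tdb] = "yyyy-mm-dd"

# A dictionary lookup compares keys for equality, so the concrete type misses.
found_unionall = haskey(registry, any_tdb)
found_concrete = haskey(registry, MyEpoch{:TDB, Float64})

# Subtyping does relate the two types, and subtyping is what dispatch uses.
is_subtype = MyEpoch{:TDB, Float64} <: any_tdb

# A method on Type{<:...} therefore covers the concrete type as well.
default_fmt(::Type{<:MyEpoch{:TDB}}) = "yyyy-mm-dd"
fmt = default_fmt(MyEpoch{:TDB, Float64})
```

The dictionary only finds the exact key that was stored, while the dispatch-based default_fmt accepts any MyEpoch{:TDB, T}.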
@quinnj Can we close this?
I am not @quinnj 😅 but AFAICT none of the issues that I have raised in this thread have been addressed, and they remain problematic for downstream packages. So, no?
I don't have much of an update, but I did take a stab at rewriting the Dates parsing code (not formatting yet) here. Notable changes include a DateFormat-like struct that doesn't specialize on specification characters (thus avoiding separate compilation for unique dateformat strings), and moving to a byte-buffer-based approach for parsing (which is what the entire Parsers.jl framework relies on).
It works well and passes all existing cases/tests that we have in Dates. The "extensions" part is still pretty clunky/awkward though. I admittedly didn't spend a ton of time trying to refine that API, since I was mainly interested in the performance and compilation gains and consistency with the rest of the Parsers.jl framework, but I did have the thought of revisiting it to try and iron out a more sensible extension system for custom TimeTypes. And subsequently see what it would look like to move that all into the Dates stdlib.
All that to say, yeah, I do think there is still awkwardness in extending Dates parsing/formatting and we should figure out a better system, but I haven't really done much about it yet, though I might in the future. And am happy to chat more with others who are also interested in figuring out a good extension system.
Extending the parsing machinery in Dates requires one to modify dictionaries such as CONVERSION_SPECIFIERS and to extend methods such as default_format.
Updating the dicts needs to happen within __init__, see e.g. https://github.com/JuliaTime/TimeZones.jl/issues/24
EDIT: Since default_format might need the updated dicts, it needs to be extended in __init__ as well. Thus requiring eval which then leads to https://github.com/JuliaLang/julia/issues/29059 (can I just ignore this if it works?) @eval is not needed but a precompiled DateFormat cannot be used.
All in all, the process is awkward and the end result not very pleasing, see here https://github.com/JuliaAstro/AstroTime.jl/blob/04a6ae917b9277e2abf77dfb199e487885db9595/src/AstroTime.jl#L22
Could this not be implemented through multiple dispatch alone? If not, what am I missing?