ariebovenberg / whenever

⏰ Modern datetime library for Python
https://whenever.rtfd.io
MIT License

Support ISO8601 periods #55

Closed kinow closed 8 months ago

kinow commented 9 months ago

Hi,

Thanks for the nice article.

I work on a workflow manager used in weather & climate. Previously I worked on another workflow manager used for cyclic workflows. In that workflow manager, the workflow definition uses ISO8601 periods, like P1Y2M for one year and two months, and PT1M for one minute.

The library used to handle it there is isodatetime, maintained by the UK Met Office (I'd be interested to see how well that library performs in the datetime-pitfalls article). Here's how isodatetime handles ISO8601 time intervals/periods:

In [1]: import metomi.isodatetime.parsers as parse

In [2]: parse.DurationParser().parse('P1Y1M')
Out[2]: <metomi.isodatetime.data.Duration: P1Y1M>

In [3]: parse.DurationParser().parse('P1Y1M').get_days_and_seconds()
Out[3]: (395.0, 0.0)

Pendulum supports it too,

In [4]: import pendulum

In [5]: pendulum.parse('P1Y1M')
Out[5]: Duration(years=1, months=1)

In [6]: pendulum.parse('P1Y1M').total_days()
Out[6]: 395.0

However, while metomi-isodatetime handles the ISO8601 recurring intervals,

In [7]: date_time = parse.TimePointParser().parse('1984-01-01')

In [8]: parse.TimeRecurrenceParser().parse('R/1984/P1Y').get_next(date_time)
Out[8]: <metomi.isodatetime.data.TimePoint: 1985-01-01T00:00:00+01:00>

Pendulum doesn't support them:

In [9]: pendulum.parse('R/1984/P1Y')
---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
Cell In[9], line 1
----> 1 pendulum.parse('R/1984/P1Y')

File /tmp/venv/lib/python3.10/site-packages/pendulum/parser.py:30, in parse(text, **options)
     26 def parse(text: str, **options: t.Any) -> Date | Time | DateTime | Duration:
     27     # Use the mock now value if it exists
     28     options["now"] = options.get("now")
---> 30     return _parse(text, **options)

File /tmp/venv/lib/python3.10/site-packages/pendulum/parser.py:43, in _parse(text, **options)
     40 if text == "now":
     41     return pendulum.now()
---> 43 parsed = base_parse(text, **options)
     45 if isinstance(parsed, datetime.datetime):
     46     return pendulum.datetime(
     47         parsed.year,
     48         parsed.month,
   (...)
     54         tz=parsed.tzinfo or options.get("tz", UTC),
     55     )

File /tmp/venv/lib/python3.10/site-packages/pendulum/parsing/__init__.py:78, in parse(text, **options)
     75 _options: dict[str, Any] = copy.copy(DEFAULT_OPTIONS)
     76 _options.update(options)
---> 78 return _normalize(_parse(text, **_options), **_options)

File /tmp/venv/lib/python3.10/site-packages/pendulum/parsing/__init__.py:125, in _parse(text, **options)
    121 # We couldn't parse the string
    122 # so we fallback on the dateutil parser
    123 # If not strict
    124 if options.get("strict", True):
--> 125     raise ParserError(f"Unable to parse string [{text}]")
    127 try:
    128     dt = parser.parse(
    129         text, dayfirst=options["day_first"], yearfirst=options["year_first"]
    130     )

ParserError: Unable to parse string [R/1984/P1Y]

I installed whenever in my test venv, but reading the docs and API I couldn't find a way to parse these kinds of expressions. I guess it's not supported? Any plans to support that?

These expressions are very useful in cyclic workflows, and in climate & weather research, as a way to support solving problems like "iterate over some data weekly, taking into account timezones/dst", or "get the next date for a 6-hours daily forecast workflow task, skipping weekends", or "get the next first day of the second week of a month in a leap calendar".
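To make the recurrence syntax concrete, here is a minimal sketch of how an expression such as R/1984/P1Y could be expanded using only the Python standard library. The function name `yearly_recurrence` is hypothetical and is not part of whenever, pendulum, or isodatetime. It handles only whole-year steps and assumes that the optional number after `R` is the number of points to yield, which is one possible reading of the repetition count.

```python
import re
from datetime import date
from itertools import islice

# Hypothetical helper for illustration; supports only 'R[n]/YYYY/PnY'.
_RECURRENCE = re.compile(r"^R(?P<count>\d*)/(?P<year>\d{4})/P(?P<step>\d+)Y$")


def yearly_recurrence(expr):
    """Yield 1 January of each year described by an 'R[n]/YYYY/PnY' expression."""
    match = _RECURRENCE.match(expr)
    if match is None:
        raise ValueError(f"unsupported recurrence expression: {expr!r}")
    year = int(match["year"])
    step = int(match["step"])
    # An empty count after 'R' is treated as an unbounded series.
    remaining = int(match["count"]) if match["count"] else None
    while remaining is None or remaining > 0:
        yield date(year, 1, 1)
        year += step
        if remaining is not None:
            remaining -= 1


# The unbounded series is lazy, so take only the first three points.
first_three = list(islice(yearly_recurrence("R/1984/P1Y"), 3))
```

Because the generator is lazy, an unbounded expression is safe to create as long as the caller limits how many points it consumes.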

In cycling workflows, it's also extremely common to have a "calendar", e.g. gregorian, 360-days (12 months of 30 days), 365-days (no leaps years), 366-days (always a leap year), etc.. But this is more research-oriented, and I am not sure if there are other libraries that allow for that (even though it might be common in other fields outside earth-sciences). But FWIW, here's how it's done in Met Office's library (same example used at the top of this description, note the number of days):

In [1]: import metomi.isodatetime.parsers as parse
   ...: 

In [2]: from metomi.isodatetime import data

In [3]: data.CALENDAR.set_mode("360day")
   ...: 

In [4]: parse.DurationParser().parse('P1Y1M')
Out[4]: <metomi.isodatetime.data.Duration: P1Y1M>

In [5]: parse.DurationParser().parse('P1Y1M').get_days_and_seconds()
Out[5]: (390.0, 0.0)
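
The two results (395 days in the default calendar, 390 days in the 360-day calendar) are consistent with a nominal conversion where a month counts as 30 days and a year counts as 365 or 360 days. The sketch below reproduces that arithmetic. The table of nominal lengths and the `nominal_days` function are assumptions inferred from the outputs above, not code taken from isodatetime.

```python
import re

# Assumed nominal lengths: they reproduce the outputs shown above but are
# not taken from isodatetime's source.
NOMINAL_DAYS = {
    "gregorian": {"year": 365, "month": 30},
    "360day": {"year": 360, "month": 30},
}

# Only the year and month designators are handled in this sketch.
_YEARS_MONTHS = re.compile(r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?$")


def nominal_days(duration, calendar="gregorian"):
    """Convert a 'PnYnM' duration to days using fixed nominal lengths."""
    match = _YEARS_MONTHS.match(duration)
    if match is None or duration == "P":
        raise ValueError(f"unsupported duration: {duration!r}")
    lengths = NOMINAL_DAYS[calendar]
    years = int(match["years"] or 0)
    months = int(match["months"] or 0)
    return float(years * lengths["year"] + months * lengths["month"])


gregorian_days = nominal_days("P1Y1M")            # 365 + 30
model_days = nominal_days("P1Y1M", "360day")      # 360 + 30
```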

Cheers,

p.s. the reason for the 360-days calendar, for example, “is analytical convenience in creating seasonal, annual and multi-annual means which are an integral part of climate model development and evaluation.” 10.31223/X5M081 (author works in NIWA-NZ, where Cylc was created... Cylc uses metomi-isodatetime :+1:, but the same approach is common everywhere climate models are executed, Australia, NZ, Canada, Brazil, USA, UK, here in Spain where we use some custom datetime code, Japan, etc.)

ariebovenberg commented 9 months ago

Hi @kinow, thanks for taking the time for this well-researched feature request. To get to a well-rounded answer, I'll need some time to dig into the resources you mention.

What I can tell you is that there will be a basic ISO8601-like Period class (see also #27) in the next release. Its semantics will probably look a lot like NodaTime and Temporal, which both behave similarly to RFC5545 (iCal).

kinow commented 9 months ago

Great @ariebovenberg !

We might review our current datetime/custom date library and then we might have to choose a new date/time library. If there's anything similar to recurring periods, or maybe just a timezone-aware Period class, that could be sufficient for me to create an iterator that behaves like ISO8601's recurring time intervals.

Thanks!

ariebovenberg commented 9 months ago

One thing that would be helpful: can you check how your needs coincide with (or differ from) RFC 5545's semantics on recurring events? I must say I haven't fully read it myself yet, but I'll probably support its semantics by default.

When it comes to ISO8601 itself, it is unfortunately paywalled so I'll have to do some better digging.

kinow commented 9 months ago

Interesting, reading “3.2.13. Recurrence Identifier Range”, it looks like this matches the concepts we have in cyclic workflow managers. Like an Outlook/Thunderbird calendar, where you have a person's agenda task recurring every Monday at 8AM, for example, in a cyclic workflow you can have an automated workflow task that runs the weather workflow every Monday at 8AM too.

The parameter value can only be "THISANDFUTURE" to indicate a range defined by the recurrence identifier and all subsequent instances. The value "THISANDPRIOR" is deprecated by this revision of iCalendar and MUST NOT be generated by applications.

Not sure if this means the series can only go forward, and not backward. Ideally we would be able to walk from a certain date toward the future, or backward to the past, until it reaches a final date (or not, just keeps running ad infinitum).
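
The forward-or-backward walk described above can be sketched independently of any particular library, since it only needs a point type that supports `+`, `-`, and comparison with a step type. The `walk` function below is a hypothetical illustration using the standard library's `datetime` and `timedelta`. Omitting `stop` gives the unbounded series.

```python
from datetime import datetime, timedelta, timezone


def walk(start, step, stop=None, backward=False):
    """Yield start, then start +/- step, and so on, up to and including stop."""
    current = start
    # With no stop, the series is unbounded and the caller decides when to quit.
    while stop is None or (current >= stop if backward else current <= stop):
        yield current
        current = current - step if backward else current + step


start = datetime(2024, 1, 1, 8, tzinfo=timezone.utc)  # a Monday at 8AM UTC
week = timedelta(weeks=1)

# Walk toward the future until a final date.
forward = list(walk(start, week, stop=datetime(2024, 1, 22, 8, tzinfo=timezone.utc)))

# Walk toward the past until a final date.
past = list(
    walk(start, week, stop=datetime(2023, 12, 18, 8, tzinfo=timezone.utc), backward=True)
)
```

This sketch uses a fixed-length step. Calendar-based steps such as months would need a point type that defines what adding a month means.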

It looks like RFC 5545 and ISO 8601 have the common part of creating recurring time series, but RFC 5545 also handles scheduling, updating events, events that collide with each other, etc. Things useful for calendars and task scheduling, which could be interesting for workflow managers.

I found a useful online validator: https://icalendar.org/validator.html

After a quick read of the RFC, I tried to replicate my ISO8601 example, and I think I found the first road bump. You can copy and paste this into the validator above:

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//hacksw/handcal//NONSGML v1.0//EN
BEGIN:VEVENT
UID:19970610T172345Z-AF23B2@example.com
DTSTAMP:19840101T000000Z
DTSTART:19840101T000000Z
DURATION:P1Y
SUMMARY:Workflow task recurrence definition
END:VEVENT
END:VCALENDAR

I was trying to get a yearly recurring interval from 1984 (i.e. a series with 1984, 1985, 1986, ...., ∞). But it failed due to the DURATION:P1Y. The validator explains clearly what went wrong:

Invalid DURATION value (invalid 'Y' value) near line # 4. Reference: 3.8.2.5. Duration (http://icalendar.org/iCalendar-RFC-5545/3-8-2-5-duration.html), 3.3.6. Duration (http://icalendar.org/iCalendar-RFC-5545/3-3-6-duration.html)

It works if I replace the P1Y with the hours in a year, PT8760H. Reading the RFC 5545, it looks like it doesn't support year, month, day in the period definition.

Reading the RFC again, in “3.3.6. Duration” I see this note:

Value Name: DURATION (…) Note that unlike ISO.8601.2004, this value type doesn't support the "Y" and "M" designators to specify durations in terms of years and months.

So RFC-5545's Duration looks a bit restrictive for me: if we adopted it in a workflow manager, we would have to create an extra layer between users and the workflow engine to translate configuration values like P1Y into PT8760H.
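
A minimal sketch of such a translation layer follows, assuming a nominal year of 365 days. The function name `years_to_rfc5545_hours` is hypothetical. The example also shows why the translation is lossy: 1984 is a leap year, so 8760 hours after 1 January 1984 falls one day short of 1 January 1985.

```python
import re
from datetime import datetime, timedelta

HOURS_PER_NOMINAL_YEAR = 365 * 24  # 8760; leap years are ignored


def years_to_rfc5545_hours(duration):
    """Translate a whole-year duration such as 'P1Y' into hours ('PT8760H')."""
    match = re.fullmatch(r"P(\d+)Y", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return f"PT{int(match.group(1)) * HOURS_PER_NOMINAL_YEAR}H"


translated = years_to_rfc5545_hours("P1Y")

# A fixed number of hours is not the same as a calendar year:
# 1984 has 366 days, so this lands on 31 December 1984, not 1 January 1985.
end = datetime(1984, 1, 1) + timedelta(hours=HOURS_PER_NOMINAL_YEAR)
```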

kinow commented 9 months ago

When it comes to ISO8601 itself, it is unfortunately paywalled so I'll have to do some better digging.

That's true. I wish there was an open standard instead. Maybe this version from archive.org could be useful? https://archive.org/details/iso-tc154-wg5_n0038_iso_wd_8601-1_2016-02-16/page/2/mode/2up

There's a 2019 version with some extensions that were being discussed in isodatetime (cannot recall which parts of this updated version interested the maintainers now). But this one should cover date format, intervals/periods, and recurring intervals.

ariebovenberg commented 8 months ago

@kinow release 0.4 now supports durations, see here.

This feature is specifically about (what the ISO8601 Wikipedia article calls) intervals and repeated intervals. I'll adjust the title somewhat.

kinow commented 8 months ago

Hi,

@kinow release 0.4 now supports durations, see here.

I tested the addition example and it works correctly, I believe.

In [1]: from whenever import (
   ...:     UTCDateTime, OffsetDateTime, ZonedDateTime, LocalSystemDateTime, NaiveDateTime
   ...: )
   ...: 

In [2]: wf_task_start = UTCDateTime(2024, 1, 1, hour=0)

In [3]: from whenever import years, months, days, hours, minutes

In [4]: wf_task_start = wf_task_start + days(5)

In [5]: wf_task_start
Out[5]: UTCDateTime(2024-01-06 00:00:00Z)

In [6]: wf_task_start = wf_task_start + days(5)

In [7]: wf_task_start
Out[7]: UTCDateTime(2024-01-11 00:00:00Z)

In [8]: wf_task_start = wf_task_start - months(1)

In [9]: wf_task_start
Out[9]: UTCDateTime(2023-12-11 00:00:00Z)

I saw you also had a DateTimeDelta that matched my examples with that PNNN syntax from ISO8601! So I used help(DateTimeDelta) and found the from_canonical_format function.

I did some quick tests, and this is exactly what I've been missing in a Python date time library!

In [10]: from whenever import DateTimeDelta

In [11]: period = DateTimeDelta.from_canonical_format('P1D')

In [12]: period
Out[12]: DateTimeDelta(P1D)

In [13]: wf_task_start + period
Out[13]: UTCDateTime(2023-12-12 00:00:00Z)

In [14]: wf_task_start + period + period
Out[14]: UTCDateTime(2023-12-13 00:00:00Z)

IMHO, it'd be good to have an example with that from_canonical_format. You are providing all the underlying functions necessary for someone to create an iterator to be used in a workflow with periodical cycles (e.g. climate, weather). Thanks a lot @ariebovenberg !

I think this issue can be closed now :slightly_smiling_face:

ariebovenberg commented 8 months ago

Thanks for the quick reply. I'm keeping this issue open to remind me to improve the documentation.

Currently there is an (undocumented) gap between the canonical format and strict ISO8601:

For further work, I'll refer to these issues:

ariebovenberg commented 8 months ago

As of release 0.5.0, there are now more explicit common_iso8601() and from_common_iso8601() methods on all types. The caveats of deltas are explicitly documented.

For follow-up issues, see the earlier comment in this thread ☝️. This issue can be closed.

kinow commented 8 months ago

Thank you so much for implementing it so quickly @ariebovenberg !