ecmwf-lab / ecml-tools

Apache License 2.0

Use ISO8601 for dates, datetimes and durations #3

Open leifdenby opened 10 months ago

leifdenby commented 10 months ago

I would like to suggest that, rather than using a customised serialisation for date-related objects, ecml-tools adopts the ISO8601 standard. There is a really nice Python package, isodate, for parsing these, and using a standard makes it much easier to build other tools that can work with the same configuration files. It also spares users from having to learn a new format just to use the package.

b8raoult commented 10 months ago

As long as the behaviour is the same, no problem.

leifdenby commented 10 months ago

It would change how dates, datetimes and durations are provided, so it depends on what you mean by "behaviour".

From the README on specifying date ranges it currently says:

The following are equivalent ways of describing start or end:

    2020 and "2020"
    202306, "202306" and "2023-06"
    20200301, "20200301" and "2020-03-01"

I don't think it is a good idea to support so many variations on how to format dates. For example, if someone set start=201009, it would be impossible to tell from reading the input alone whether this refers to September 2010 or October 9th 2020. It is exactly this kind of ambiguity that ISO8601 was created to avoid.
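The ambiguity is easy to reproduce with Python's standard library: the same six digits parse without error under both a year-month pattern and a two-digit-year pattern. A minimal illustration (the two patterns are chosen only to show the clash; neither is claimed to be how ecml-tools parses dates):

```python
from datetime import datetime

# The same six digits parse without error under two different patterns.
as_year_month = datetime.strptime("201009", "%Y%m")  # read as YYYYMM
as_two_digit_year = datetime.strptime("201009", "%y%m%d")  # read as YYMMDD

print(as_year_month.date())      # 2010-09-01
print(as_two_digit_year.date())  # 2020-10-09

# The hyphenated ISO 8601 form spells out which part is the year.
print(datetime.strptime("2010-09", "%Y-%m").date())  # 2010-09-01
```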

Also, having so many variations means that any other application needs to implement a lot of logic just to parse the config file. I can, for example, imagine wanting to use these values in JavaScript (a frontend that visualises the date ranges used for a dataset). ISO8601 is supported in many different JavaScript libraries (for example momentjs).

Finally, I think the way `start` and `end` can currently be provided, for example by giving only "2020" (I assume this is the year), implies the start of the year when used for `start` but the end of the year when used for `end`. Maybe this isn't the current behaviour, but if that is how it's intended to work we should document it in the README.

So what I suggest is that we deprecate the current formats and only support strings formatted according to ISO8601. The README would then instead read:

The `start` and `end` arguments for specifying time-spans should be provided as ISO8601 formatted strings, e.g.

    YYYY e.g. "2020" (will imply start of year when used for `start` and end of year when used for `end`)
    YYYY-MM e.g. "2023-06" (will imply start of month when used for `start` and end of month when used for `end`)
    YYYY-MM-DD e.g. "2020-03-01" (will imply start of day when used for `start` and end of day when used for `end`)
    YYYY-MM-DDThh:mm:ssZ e.g. "2024-01-26T10:32:57Z" (a full UTC timestamp)

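The proposed start/end semantics can be sketched in Python with only the standard library. `period_bounds` is a hypothetical helper, not part of ecml-tools or isodate; it assumes all values are UTC and represents each value as a half-open interval [start, stop), so a value given as `start` contributes the first element and a value given as `end` contributes the exclusive upper bound.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

# Hypothetical table of supported ISO 8601 forms, coarsest to finest.
# All values are assumed to be UTC.
_FORMATS = [
    (re.compile(r"\d{4}"), "%Y", "year"),
    (re.compile(r"\d{4}-\d{2}"), "%Y-%m", "month"),
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d", "day"),
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"), "%Y-%m-%dT%H:%M:%SZ", "second"),
]


def period_bounds(text: str) -> tuple[datetime, datetime]:
    """Return the half-open interval [start, stop) named by an ISO 8601 string.

    "2020" covers all of 2020, so it maps to [2020-01-01, 2021-01-01).
    """
    for pattern, fmt, precision in _FORMATS:
        if pattern.fullmatch(text):
            # strptime also validates ranges, e.g. it rejects month 13.
            start = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
            break
    else:
        raise ValueError(f"not a supported ISO 8601 date or datetime: {text!r}")

    if precision == "year":
        stop = start.replace(year=start.year + 1)
    elif precision == "month":
        # December rolls over into January of the following year.
        if start.month == 12:
            stop = start.replace(year=start.year + 1, month=1)
        else:
            stop = start.replace(month=start.month + 1)
    elif precision == "day":
        stop = start + timedelta(days=1)
    else:
        # A full timestamp names a single second.
        stop = start + timedelta(seconds=1)
    return start, stop
```

A half-open interval avoids having to pick a "last instant" such as 23:59:59; whether `end` should ultimately be treated as inclusive or exclusive is a design decision this sketch does not settle.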
leifdenby commented 10 months ago

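This will also change how frequencies should be given. In the ISO 8601 standard durations are written with a P prefix, and a T separates the time components from the date components, e.g. PT10H is 10 hours, PT7M is 7 minutes and P7D is 7 days (whereas P7M would be 7 months). https://en.wikipedia.org/wiki/ISO_8601#Durations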
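For typical dataset frequencies, a small subset of the duration syntax maps directly onto `datetime.timedelta`. The sketch below is a hypothetical stdlib-only parser (`parse_time_duration` is an illustrative name) that handles only days, hours, minutes and seconds; in practice isodate's `parse_duration` covers the full grammar, including years and months, which cannot be represented as a fixed-length `timedelta`.

```python
import re
from datetime import timedelta

# Hypothetical helper: only the D, H, M (minutes) and S designators are handled.
# Years and months (e.g. "P1M") are rejected on purpose, because their length
# varies and so cannot be expressed as a fixed timedelta.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_time_duration(text: str) -> timedelta:
    """Parse a time-based ISO 8601 duration such as "PT10H" or "P1DT6H"."""
    match = _DURATION.fullmatch(text)
    # Reject "P", "PT" and forms with a dangling "T" such as "P1DT",
    # which contain no components after the designator.
    if match is None or text.endswith("T") or not any(match.groupdict().values()):
        raise ValueError(f"unsupported ISO 8601 duration: {text!r}")
    # Keep only the components that were present and pass them to timedelta.
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    return timedelta(**parts)
```

For example, `parse_time_duration("PT10H")` gives a ten-hour `timedelta`, while `parse_time_duration("P7M")` raises `ValueError` instead of silently treating months as minutes.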

b8raoult commented 10 months ago

You are welcome to implement the changes. Just make sure that they are backwards compatible for now, so that our current software stack still works. It is just a matter of accepting both types of input.
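Accepting both types of input could be done with a thin normalisation layer placed in front of the ISO 8601 parsing. A minimal sketch, assuming the legacy forms are exactly those listed in the current README (an integer or string year, `YYYYMM` and `YYYYMMDD`); `normalise_date` is a hypothetical name, and the regular expressions check only the shape of the input, not whether the month or day is in range.

```python
from __future__ import annotations

import re
import warnings

# Legacy compact forms from the current README: YYYYMM and YYYYMMDD.
_LEGACY_COMPACT = re.compile(r"(\d{4})(\d{2})(\d{2})?")
# ISO 8601 extended forms kept by the proposal (reduced precision allowed).
_ISO_EXTENDED = re.compile(r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}:\d{2}Z)?)?)?")


def normalise_date(value: int | str) -> str:
    """Return a legacy or ISO 8601 date as an ISO 8601 extended-format string."""
    text = str(value)
    compact = _LEGACY_COMPACT.fullmatch(text)
    if isinstance(value, int) or compact:
        # Keep old configs working, but signal the move to ISO 8601.
        warnings.warn(
            f"{value!r} uses a legacy date format; prefer ISO 8601",
            DeprecationWarning,
            stacklevel=2,
        )
    if compact:
        # Drop the optional day group when it did not match (YYYYMM input).
        text = "-".join(part for part in compact.groups() if part)
    if not _ISO_EXTENDED.fullmatch(text):
        raise ValueError(f"unrecognised date: {value!r}")
    return text
```

Emitting a `DeprecationWarning` rather than an error keeps the existing software stack working while signalling the change. Note that a six-digit value such as "201009" is read as `YYYYMM` here, matching the README's current convention.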