godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
88.81k stars 20.14k forks source link

ISO8601 datetime parsing in `Time.get_unix_time_from_datetime_string` #60691

Open miv391 opened 2 years ago

miv391 commented 2 years ago

Godot version

v4.0.alpha7.official [3e9ead05f]

System information

Windows 10

Issue description

There are still some inconsistencies in ISO8601 datetime parsing.

Steps to reproduce

Valid ISO 8601 strings that fail

func test_valid_iso8601_strings():
    var valids = [
        '2022-12-31T12:34', # seconds are optional
        '2022-12-31T12',    # minutes are optional too
        '20221231',         # short format date
        '20221231T123456',  # short format datetime
    ]   
    for valid in valids:
        print(valid, ' = ', Time.get_unix_time_from_datetime_string(valid))

Output:

2022-12-31T12:34 = -1
2022-12-31T12 = -1
20221231 = 0
20221231T123456 = -1

I think optional seconds and minutes should be supported (first two test cases). ISO 8601 contains also short format for date and time (last two test cases). Sure, it would be nice if short format were supported, but in my experience ISO 8601 pretty much always means the long format. If the short format is not supported, it should be said in the documentation.

Return value

Time.get_unix_time_from_datetime_string seems to return either 0 or -1 when parsing fails. That makes checking errors a little bit more difficult compared to a single error value. This is error prone, if the code just checks if the return value is 0. I would suggest using only one error value.

Invalid ISO 8601 strings that succeed

func test_invalid_iso8601_strings():
    var invalids = [
        '1999 2022-12-31T12:34:56',    # totally invalid format gets odd result
        '2022-12-31-09T12:34:56',      # garbage in date is silently ignored
        '  2022-12-31T12:34:56',       # leading spaces are allowed
        '2022-12-31 12:34:56',         # space is not an accepted separator in ISO 8601
    ]
    for invalid in invalids:
        var t = Time.get_unix_time_from_datetime_string(invalid)
        print(invalid, ' = ', t, ' = ', Time.get_datetime_string_from_unix_time(t))

Output:

1999 2022-12-31T12:34:56 = 946643696 = 1999-12-31T12:34:56
2022-12-31-09T12:34:56 = 1672490096 = 2022-12-31T12:34:56
  2022-12-31T12:34:56 = 1672490096 = 2022-12-31T12:34:56
2022-12-31 12:34:56 = 1672490096 = 2022-12-31T12:34:56

I think the first two cases should be fixed, especially the first one. The last two cases are less severe although I think that ISO 8601 parser should not do white space trimming as it might cause not noticing incorrect data.

Minimal reproduction project

No response

oninoshiko commented 9 months ago

This problem threw me for a loop as I was trying to enter 20231116T162136Z which is valid and I couldn't figure out what I was doing wrong. Until this is fixed, can we document what formats we don't understand (including the short formats)?

AThousandShips commented 9 months ago

@oninoshiko does it work without the Z?

Suffixes such as "Z" are not handled, you need to strip them away manually.

The documentation does state the expected format as YYYY-MM-DDTHH:MM:SS, but perhaps not explicitly enough that it expects that format

oninoshiko commented 9 months ago

@oninoshiko does it work without the Z?

Suffixes such as "Z" are not handled, you need to strip them away manually.

The documentation does state the expected format as YYYY-MM-DDTHH:MM:SS, but perhaps not explicitly enough that it expects that format

No, the Z doesn't matter one way or the other. I just double checked the documentation, and I'm not seeing it stating that as the expected format.

What I see is "ISO 8601 date and/or time string" and, while YYYY-MM-DDTHH:MM:SS is a valid ISO 8601, so is YYYYMMDDTHHMMSSZ.

Searching a bit more, It does say it for get_datetime_dict_from_datetime_string, but not for get_unix_time_from datetime_string. It needs to say it for both. Honestly, if we're not supporting all of 8601, and instead require one very specific format that happens to be permitted under it, it would be less confusing to remove all mention of it.