gbif / parsers

Various GBIF parsers for dates, countries, languages, taxon ranks, etc.
Apache License 2.0

24 eu us date #25

Closed: qifeng-bai closed this pull request 4 years ago

qifeng-bai commented 4 years ago

Main changes: created DateFormatHint.EU_DMYT and DateFormatHint.US_MDYT, each covering multiple datetime formats.

Created a method parse(String input, DateFormatHint[] prefResolvers): when the standard parsing process fails because of an ambiguous date such as 2/3/2000, it tries {@code prefResolvers} in turn to parse the date.
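A minimal sketch of that fallback, written against java.time rather than the library's own classes. The `AmbiguousDateFallback` class, its `Ordering` enum and the ambiguity rule (both the day-first and month-first readings are valid dates and they differ) are hypothetical illustrations, not the gbif/parsers API.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: class, enum and method names are illustrative only.
final class AmbiguousDateFallback {

    /** Hypothetical stand-in for the preference hints: day-first or month-first. */
    enum Ordering { DMY, MDY }

    private static final Pattern NUMERIC = Pattern.compile("(\\d{1,2})/(\\d{1,2})/(\\d{4})");

    /** Returns a date if the input is unambiguous, or if a preferred ordering resolves it. */
    static Optional<LocalDate> parse(String input, Ordering[] prefResolvers) {
        Matcher m = NUMERIC.matcher(input.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        int first = Integer.parseInt(m.group(1));
        int second = Integer.parseInt(m.group(2));
        int year = Integer.parseInt(m.group(3));

        Optional<LocalDate> dayFirst = build(year, second, first);   // first = day
        Optional<LocalDate> monthFirst = build(year, first, second); // first = month

        // "Standard" process: succeed when only one reading is valid, or both agree.
        if (dayFirst.isPresent() && !monthFirst.isPresent()) return dayFirst;
        if (monthFirst.isPresent() && !dayFirst.isPresent()) return monthFirst;
        if (dayFirst.equals(monthFirst)) return dayFirst; // e.g. 5/5/2000, or both invalid

        // Ambiguous (e.g. 2/3/2000): fall back to the caller's preferences, in order.
        for (Ordering hint : prefResolvers) {
            return hint == Ordering.DMY ? dayFirst : monthFirst;
        }
        return Optional.empty(); // ambiguous and no preference given
    }

    private static Optional<LocalDate> build(int year, int month, int day) {
        try {
            return Optional.of(LocalDate.of(year, month, day));
        } catch (DateTimeException e) {
            return Optional.empty(); // month or day out of range
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2/3/2000", new Ordering[0]));               // Optional.empty
        System.out.println(parse("2/3/2000", new Ordering[] {Ordering.DMY})); // Optional[2000-03-02]
        System.out.println(parse("14/08/2020", new Ordering[0]));             // Optional[2020-08-14]
    }
}
```

Because "14/08/2020" has only one valid reading, the preferences are never consulted for it; they only decide genuinely ambiguous inputs.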

MattBlissett commented 4 years ago

Hi,

I found some of the date parsing classes confusing, so I've made some additional changes on this branch to tidy things up.

> Created DateFormatHint.EU_DMYT and DateFormatHint.US_MDYT: multiple datetime formats

I changed those to DMYT and MDYT, as these orderings are also used outside Europe and the US, and the names are now consistent with YMDT and YMD.

There was also inconsistent behaviour when dates were provided with or without times in ISO vs other formats. It now does this:

parse("2018-02-01", YMD) → Success
parse("2018-02-01", YMDT) → Fail

parse("14/08/2020 14:38:30", DMYT) → Success
parse("14/08/2020 14:38:30", DMY) → Fail
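The strict behaviour above can be sketched with java.time: each ordering maps to exactly one pattern, a "T" ordering requires a time and the others forbid one, so any missing or leftover text is a failure. The `StrictOrderingParser` class, its `Ordering` enum and the chosen patterns are hypothetical; the real parser accepts more variants per ordering.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of strict single-ordering parsing; not the gbif/parsers code.
final class StrictOrderingParser {

    /** Hypothetical subset of component orderings, named as in the discussion. */
    enum Ordering { YMD, YMDT, DMY, DMYT }

    // One pattern per ordering; DateTimeFormatter.parse rejects unparsed trailing text.
    private static final Map<Ordering, DateTimeFormatter> FORMATS = new EnumMap<>(Ordering.class);
    static {
        FORMATS.put(Ordering.YMD, DateTimeFormatter.ofPattern("uuuu-MM-dd"));
        FORMATS.put(Ordering.YMDT, DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss"));
        FORMATS.put(Ordering.DMY, DateTimeFormatter.ofPattern("d/M/uuuu"));
        FORMATS.put(Ordering.DMYT, DateTimeFormatter.ofPattern("d/M/uuuu HH:mm:ss"));
    }

    /** Tries only the given ordering and reports failure instead of falling back. */
    static Optional<TemporalAccessor> parse(String input, Ordering ordering) {
        try {
            return Optional.of(FORMATS.get(ordering).parse(input));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2018-02-01", Ordering.YMD).isPresent());           // true
        System.out.println(parse("2018-02-01", Ordering.YMDT).isPresent());          // false
        System.out.println(parse("14/08/2020 14:38:30", Ordering.DMYT).isPresent()); // true
        System.out.println(parse("14/08/2020 14:38:30", Ordering.DMY).isPresent());  // false
    }
}
```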

This parsing method could be used if we're certain of the date format for a dataset (e.g. the DWCA field). I don't think it's used anywhere.

> Create a method: parse(String input, DateFormatHint[] prefResolvers)

I defined DMY_FORMATS to help with this:

parse("12/08/2020 14:38:30", DMY_FORMATS) → Success
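A preference list like DMY_FORMATS can be sketched as an ordered array that is tried element by element, with the first successful parse winning. The `OrderedFormatsParser` class, its enum and the two patterns are assumptions made for illustration, not the library's definitions.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.util.Optional;

// Hypothetical sketch of parsing with an ordered list of orderings.
final class OrderedFormatsParser {

    enum Ordering { DMYT, DMY }

    /** Hypothetical equivalent of DMY_FORMATS: day-first, with or without a time. */
    static final Ordering[] DMY_FORMATS = {Ordering.DMYT, Ordering.DMY};

    private static DateTimeFormatter formatterFor(Ordering o) {
        switch (o) {
            case DMYT: return DateTimeFormatter.ofPattern("d/M/uuuu HH:mm:ss");
            case DMY:  return DateTimeFormatter.ofPattern("d/M/uuuu");
            default:   throw new IllegalArgumentException("Unknown ordering: " + o);
        }
    }

    /** Tries each ordering in turn and returns the first successful parse. */
    static Optional<TemporalAccessor> parse(String input, Ordering[] orderings) {
        for (Ordering o : orderings) {
            try {
                return Optional.of(formatterFor(o).parse(input));
            } catch (DateTimeParseException e) {
                // This ordering does not fit the input; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("12/08/2020 14:38:30", DMY_FORMATS).isPresent()); // true
        System.out.println(parse("12/08/2020", DMY_FORMATS).isPresent());          // true
        System.out.println(parse("2020-08-12", DMY_FORMATS).isPresent());          // false
    }
}
```

Putting DMYT before DMY keeps the time when one is present; since each pattern must consume the whole input, the order only affects which attempt succeeds first, not the result.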

I've renamed DateFormatHint to DateComponentOrdering. It wasn't really a "hint" when it makes the parsing succeed or fail. I've also tidied up a lot of comments and the documentation, which were out of date.

Does this all seem reasonable?

qifeng-bai commented 4 years ago

@MattBlissett

When a hint is passed in, we ONLY try the given hint; if parsing fails, a failure is returned immediately (see: https://github.com/gbif/parsers/blob/master/src/main/java/org/gbif/common/parsers/date/ThreeTenNumericalDateParser.java#L248). That is why parse("2018-02-01", YMDT) → Fail.

I wonder whether it would be safer to try the given hint first and, if that fails, go through the normal parsing process: try the ISO standard parser and then the MultiParser.
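The proposed "hint first, then normal parsing" behaviour could be sketched as a fallback chain. Here the hint is a plain `DateTimeFormatter`, and `DEFAULT_CHAIN` (two ISO formatters followed by a day-first pattern) is a hypothetical stand-in for the ISO parser and MultiParser; none of these names come from gbif/parsers.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the proposed fallback; not the library's implementation.
final class HintThenFallbackParser {

    // Stand-ins for the normal process: ISO formats first, then a general day-first pattern.
    private static final List<DateTimeFormatter> DEFAULT_CHAIN = List.of(
            DateTimeFormatter.ISO_LOCAL_DATE,
            DateTimeFormatter.ISO_LOCAL_DATE_TIME,
            DateTimeFormatter.ofPattern("d/M/uuuu"));

    /** Parses with one formatter, mapping failure to Optional.empty(). */
    private static Optional<TemporalAccessor> tryParse(String input, DateTimeFormatter f) {
        try {
            return Optional.of(f.parse(input));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    /** Tries the caller's hint first; only if it fails, runs the default chain in order. */
    static Optional<TemporalAccessor> parse(String input, DateTimeFormatter hint) {
        Optional<TemporalAccessor> hinted = tryParse(input, hint);
        if (hinted.isPresent()) {
            return hinted;
        }
        for (DateTimeFormatter f : DEFAULT_CHAIN) {
            Optional<TemporalAccessor> result = tryParse(input, f);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        DateTimeFormatter ymdt = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss"); // "YMDT" hint
        // The strict behaviour rejects this input; here ISO_LOCAL_DATE recovers it.
        System.out.println(parse("2018-02-01", ymdt).isPresent()); // true
    }
}
```

With this design the result alone no longer tells the caller whether its hint matched, which is the trade-off against listing every accepted ordering explicitly in the arguments.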

MattBlissett commented 4 years ago

I think if supporting DMY and ISO is required, then that should be specified in the arguments:

These tests both pass:

    parseResult = TEXTDATE_PARSER.parse("2018-01-02 11:20:30+0100", new DateComponentOrdering[] {DMYT, DMY, YMDTZ, YMDT, YMD, YM, Y});
    assertZonedDateTimeResultEquals("2018-01-02T11:20:30+01:00", parseResult);
    parseResult = TEXTDATE_PARSER.parse("2/1/2018 11:20:30+0100", new DateComponentOrdering[] {DMYT, DMY, YMDTZ, YMDT, YMD, YM, Y});
    assertZonedDateTimeResultEquals("2018-01-02T11:20:30+01:00", parseResult);

And so do these:

    parseResult = TEXTDATE_PARSER.parse("2018-01-02 11:20:30+0100", new DateComponentOrdering[] {DMYT, DMY, ISO_ETC});
    assertZonedDateTimeResultEquals("2018-01-02T11:20:30+01:00", parseResult);
    parseResult = TEXTDATE_PARSER.parse("2/1/2018 11:20:30+0100", new DateComponentOrdering[] {DMYT, DMY, ISO_ETC});
    assertZonedDateTimeResultEquals("2018-01-02T11:20:30+01:00", parseResult);