JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Possible bug when interpreting date format from string `200241022` #56301

Open edward-bestx opened 3 days ago

edward-bestx commented 3 days ago

I think I may have found a potential bug when interpreting the following string as a date:

using Dates

arg = "200241022"
date::Date = Date(arg, "yyyymmdd")

Which produces the following error.

ERROR: LoadError: ArgumentError: Month: 41 out of range (1:12)

Apologies if this is a duplicate or non issue. I did have a look to see if I could find the reason why this is happening, but it looked like it might be a more difficult issue to track down than I had initially thought.

KristofferC commented 3 days ago

Did you mean to write "20241022"?

edward-bestx commented 3 days ago

> Did you mean to write "20241022"?

No, this is what I think is a bug.

The date 200241022 should be interpreted as year=20024 October 22nd. It isn't interpreted as this - the month is 41 instead of 10.

I realize that it is unlikely that someone would want to parse a date for the year 20024, however such a year should be a valid input.

I thought it might be possible that this indicates a bug somewhere else in the Dates library.

What behavior would I expect to see?

edward-bestx commented 3 days ago

By the way, if you disagree with my above suggestion please just close the issue as a non issue. I'm raising it because I am unsure if this is the behavior you would want to see or expect. Maybe it is?

giordano commented 3 days ago

> Since the datetime format was specified as "yyyymmdd" there should perhaps be a warning that the number of digits for the year is 5 instead of 4.

How can the program possibly know you meant the year to have 5 digits if you literally put 4 ys there without mindreading capabilities?

edward-bestx commented 3 days ago

> How can the program possibly know you meant the year to have 5 digits if you literally put 4 ys there without mindreading capabilities?

It doesn't need to have mind reading capabilities. It can see 9 digits have been supplied as input rather than the typical 8 which would be expected.

giordano commented 3 days ago

> It can see 9 digits have been supplied as input rather than the typical 8 which would be expected.

So you want it to throw an error for a mismatched input string/format? Which is what's already happening? For example:

julia> using Dates

julia> Date("200211111", "yyyymmdd")
ERROR: ArgumentError: Day: 111 out of range (1:30)

edward-bestx commented 3 days ago

As I already stated, it should not be printing an error which says ArgumentError: Day: 111 out of range (1:30).

That doesn't make any sense, because the user didn't pass 111 as the number of days.

If you think it's not significant enough of an issue, as I already said, please feel free to close this issue.

If we align both strings vertically, the issue becomes obvious:

"200211111"
 "yyyymmdd"

The lengths of the strings don't match. So that should be an error, or a warning, should it not?
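
A caller who wants the length mismatch reported directly can check it before parsing. The following is a minimal sketch, assuming a hypothetical helper named `parse_yyyymmdd_strict` that is not part of Dates; it only wraps the existing `Date(str, format)` call:

```julia
using Dates

# Hypothetical helper (not part of Dates): reject any input that is not
# exactly 8 ASCII digits before handing it to the fixed-width parser.
function parse_yyyymmdd_strict(s::AbstractString)
    if length(s) != 8 || !all(isdigit, s)
        throw(ArgumentError("expected exactly 8 digits for \"yyyymmdd\", got \"$s\""))
    end
    return Date(s, dateformat"yyyymmdd")
end

parse_yyyymmdd_strict("20241022")    # Date(2024, 10, 22)
# parse_yyyymmdd_strict("200241022") # throws ArgumentError about the length
```

With this wrapper the error message names the length mismatch instead of an out-of-range month or day.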

giordano commented 3 days ago

> If we align both strings vertically, the issue becomes obvious:
>
> "200211111"
>  "yyyymmdd"

That's an interesting way to align it, since parsing happens left-to-right, not right-to-left:

200211111
yyyymmdd

> The lengths of the strings don't match. So that should be an error, or a warning, should it not?

It's an error already... And a warning is easy to miss, so that doesn't help much.

edward-bestx commented 3 days ago

Well, either way you align it you have two inputs which should be of the same length which are not.

It could be aligned either way. For numerical things, which something like a date is, we read them right to left, not left to right.

For example the number 1234567. You know this is "about 1 million". The reason you know it is "about 1 million" is because you read it right to left. You count the digits, effectively, reading them as groups of 3.

1,234,567

You cannot read it left to right because you cannot know in advance where to put the , (thousands separators)

KristofferC commented 3 days ago

julia> arg = "20024-10-22"
"20024-10-22"

julia> date::Date = Date(arg, "yyyy-mm-dd")
20024-10-22

works which kind of suggests that the one without dashes should also work? What do other programming languages do for this?

giordano commented 3 days ago

> Well, either way you align it you have two inputs which should be of the same length which are not.

That's to allow parsing dates with less ambiguous formats (separating the parts with a hyphen lets the parser identify each part more reliably):

julia> Date("20024-10-22", "yyyy-mm-dd")
20024-10-22

giordano commented 3 days ago

> works which kind of suggests that the one without dashes should also work?

That's ambiguous though.

> What do other programming languages do for this?

As far as I can tell, Python's datetime only supports 4-digit years (or more precisely between datetime.MINYEAR and datetime.MAXYEAR). Third-party packages may support extended formats.

KristofferC commented 3 days ago

> That's ambiguous though.

Is it if you interpret it as ...yyyymmdd where ...yyyy is a vararg years?
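
The "vararg years" reading can be sketched by slicing from the right: the last two digits are the day, the two before them the month, and whatever remains is the year. This is only an illustration of the idea; `parse_vararg_year` is a hypothetical name and nothing like it exists in Dates:

```julia
using Dates

# Hypothetical helper: read "...yyyymmdd" from the right, so the year may
# have any number of digits (at least one).
function parse_vararg_year(s::AbstractString)
    (length(s) >= 5 && all(isdigit, s)) ||
        throw(ArgumentError("expected at least 5 digits, got \"$s\""))
    n = length(s)                      # all ASCII digits, so indices are safe
    year  = parse(Int, s[1:n-4])       # everything left of mmdd
    month = parse(Int, s[n-3:n-2])
    day   = parse(Int, s[n-1:n])
    return Date(year, month, day)
end

parse_vararg_year("200241022")  # year 20024, October 22nd
parse_vararg_year("20241022")   # year 2024, October 22nd
```

Under this reading a 7-digit input such as "9990101" would come out as year 999, January 1st, which differs from what the current left-to-right parsing returns.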

adienes commented 3 days ago

the one with dashes should error IMO

jmkuhn commented 2 days ago

The documentation does explicitly distinguish between formats with and without delimiters. The following is documented as allowed.

julia> Date("2024-10-22", "y-m-d")
2024-10-22

Undocumented is that, when delimiters are present, it ignores the specified number of digits in the last field before a delimiter and in the final field of the string.

julia> Date("2024-10-22", "yy-mmm-dddd")
2024-10-22

When mixing fixed-width fields and delimiters, some results don't look right to me. The specified number of digits in the field before a delimiter is ignored:

julia> Date("2024002-11", "yyyyymm-d")
20240-02-11

julia> Date("2024002-11", "yyyymm-d")
2024-02-11

I think the second should error.
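
For comparison, a strict reading of "yyyymm-d" can be written with a regular expression. This is a sketch of the stricter behavior suggested above, not how Dates works; `tryparse_yyyymm_d` is a hypothetical name:

```julia
using Dates

# Hypothetical strict reading of "yyyymm-d": exactly 4 year digits, exactly
# 2 month digits, a hyphen, then 1 or 2 day digits. Returns nothing when the
# input does not have that shape.
function tryparse_yyyymm_d(s::AbstractString)
    m = match(r"^([0-9]{4})([0-9]{2})-([0-9]{1,2})$", s)
    m === nothing && return nothing
    y, mo, d = parse.(Int, m.captures)
    return Date(y, mo, d)   # still throws if month or day is out of range
end

tryparse_yyyymm_d("202402-11")   # Date(2024, 2, 11)
tryparse_yyyymm_d("2024002-11")  # nothing: 7 digits before the hyphen
```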

jmkuhn commented 2 days ago

> How does it behave for this date: 9990101 (1st January year 999) when the datetime format is "yyyymmdd" ? There is a digit missing for the year.

julia> Date("9990101", "yyyymmdd")
9990-10-01

julia> Date("9990100000001", "yyyymmdd")
9990-10-01

Parses the first x fields with the specified number of digits, then ignores the number of digits when parsing the final field?
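
That hypothesis can be written down as a small model and compared against the outputs reported in this thread. The sketch below is a simplified model, not the actual Dates implementation; `split_fixed_width` is a hypothetical name, and it assumes ASCII digits and an input at least as long as the leading fields:

```julia
# Simplified model, NOT the Dates implementation: every field except the
# last takes exactly its width; the last field takes all remaining digits.
function split_fixed_width(s::AbstractString, widths::Vector{Int})
    fields = Int[]
    pos = 1
    for (i, w) in enumerate(widths)
        stop = i == length(widths) ? lastindex(s) : pos + w - 1
        push!(fields, parse(Int, s[pos:stop]))
        pos = stop + 1
    end
    return fields
end

split_fixed_width("9990101", [4, 2, 2])        # [9990, 10, 1]
split_fixed_width("9990100000001", [4, 2, 2])  # [9990, 10, 1]
split_fixed_width("200241022", [4, 2, 2])      # [2002, 41, 22]
split_fixed_width("200211111", [4, 2, 2])      # [2002, 11, 111]
```

The model reproduces the two dates above as well as the "Month: 41" and "Day: 111" values from the earlier error messages.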