arrow-py / arrow

🏹 Better dates & times for Python
https://arrow.readthedocs.io
Apache License 2.0
8.71k stars 673 forks source link

Tokens A/hh seem to work on invalid time stamps #958

Closed tlitetrasci closed 3 years ago

tlitetrasci commented 3 years ago

Issue Description

Arrow does not seem to perform validation on timestamps for unusual formats where the information conflicts.

The following code snippet runs just fine:

>>> arrow.get("2021-01-30 14:00:00 AM", "YYYY-MM-DD hh:mm:ss A")
<Arrow [2021-01-30T14:00:00+00:00]>

First of all, since hh is documented to go to a maximum value of 12, I expect arrow to raise an error because the value is 14. Secondly, I expect arrow to notice that 14:00:00 (2 PM) conflicts with the string "AM".

Is this behaviour intentional? Or is it a bug?

System Info

systemcatch commented 3 years ago

Looks like we're missing some validation checks here, thanks for reporting this. I expect we can roll this into a bugfix release.

tlitetrasci commented 3 years ago

Thanks for confirming that this is unexpected behavior. I'm looking forward to seeing the bugfix.

ALee008 commented 3 years ago

Hey @systemcatch , @jadchaar ,

I would like to help out with this one. Do you have any details on how you imagine the validation check?

jadchaar commented 3 years ago

You can probably carry out the check similar to how we validate errors in parser.py's parse() function:

https://github.com/arrow-py/arrow/blob/1310dbbc32ce41adac4840e4e399da0f77836041/arrow/parser.py#L300-L305

You can check the parsed values and raise an appropriate ParserError. If you think there is a more appropriate place to put the validations, feel free to make that choice :)!

ALee008 commented 3 years ago

Hey again, could you check my attempt for the first issue (if token == "hh" then hour must be between 00 and 12):

https://github.com/ALee008/arrow/blob/3c09bb87f253271900906e75d8927a3c8620ce16/arrow/parser.py#L333

For the second issue I am not sure about the location. I have also added a question: https://github.com/ALee008/arrow/blob/3c09bb87f253271900906e75d8927a3c8620ce16/arrow/parser.py#L576

I am not even sure if this attempt matches your expectations at all :-).

tlitetrasci commented 3 years ago

Hi, has there been any update on this issue? I see @ALee008 has made some changes in a fork.

anishnya commented 3 years ago

Hi @ALee008 and @tlitetrasci. Apologies on behalf of the entire Arrow team for not getting back to you on this sooner (some professional and personal events have popped up for all of us). I'll be taking a look at @ALee008's fork and reviewing it over the next 48 hours.

tlitetrasci commented 3 years ago

No worries, thanks for the update!

ALee008 commented 3 years ago

@anishnya No worries, I have been busy myself recently. I'll be awaiting your update.

anishnya commented 3 years ago

Hey @ALee008. I've taken a closer look at the issues you've outlined. The token check appears to be correct. For the location, I think we'd prefer it to be in parse_token rather than in parse. Even though parse_token is only called by parse, I think it makes more sense to have the validation functionality within parse_token. In terms of the error message generated, I wouldn't worry about including the fmt or datetime_string strings. A simple message such as "hour token value must be between 0 and 12 inclusive for this (insert token name here)" should be good. Let me know if you need any further guidance and thank you for your patience and contribution to Arrow :).

ALee008 commented 3 years ago

Hey @anishnya , thanks for your feedback. Haven't forgotten about this. But been too busy lately. Hope to update the code in the days to come.

Here the updated code. I hope I didn't misunderstand you.

here and here

The tests are still missing obviously...

anishnya commented 3 years ago

@ALee008, that looks good to me. Once you get the tests added, feel free to make the PR. Let us know if you need any further guidance, or have any additional questions.

ALee008 commented 3 years ago

Hey @anishnya,

I removed the first ParseError because it would raise a ParseError if hour is not between 0 and 12, which is not desired.

I also added a test called test_parse_am in test_parser.py.

I push the latest code and added my adjustments. When running tox I got the following error:

grafik

Any ideas?

anishnya commented 3 years ago

That does seem odd, especially since you aren't modifying the shift method. @jadchaar, I noticed the test is skipped if the dateutil version if below 2.7.1, that should be the only reason this fails right, especially if shift isn't being modified? @ALee008 do you mind checking what version of dateutil you have installed?

ALee008 commented 3 years ago

Hey @anishnya, I've got Version 2.8.1 installed

grafik

anishnya commented 3 years ago

@ALee08, you're working out of the master branch of your forked repo correct? I'll pull it down and see what's happening.

ALee008 commented 3 years ago

Hey @anishnya , yes, that's what I did first (using the explanation here because I haven't done it before) How to update a forked Git repo.

systemcatch commented 3 years ago

I pulled down the branch but can't reproduce that error, can you update your branch to be equal with arrow master to see if that helps?

Actually @ALee008 how about you submit a PR and we can see if it fails on our CI infrastructure.

ALee008 commented 3 years ago

Hey @systemcatch, I updated my fork and added my changes. Test is still failing. My origin should be up-to-date now. As recommended I will submit a PR.