arrow-py / arrow

🏹 Better dates & times for Python
https://arrow.readthedocs.io
Apache License 2.0
handling of leading spaces in month, day, and hours by parser #1066

Closed rtphokie closed 2 years ago

rtphokie commented 2 years ago

Issue Description

Currently, arrow can parse a one- or two-digit day of the month with or without a leading zero. It does not, however, handle a leading space. A leading space before a single-digit day of the month, month number, or hour is a common format (it lines up better, offers some sorting benefits, etc.) and should be handled better.

Workaround: replace double spaces with a single space before passing to arrow.get()

e.g.

datestring = 'Jan 1 2022'
dt = arrow.get(datestring.replace('  ', ' '), 'MMM D YYYY')
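
A slightly more general form of this workaround, as a minimal sketch: a single `str.replace` only handles runs of exactly two spaces, so the helper below collapses any run of whitespace (including tabs) and trims the ends. The name `collapse_whitespace` is hypothetical and not part of arrow.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse every run of whitespace to one space and trim both ends.

    Hypothetical helper for illustration; it is not an arrow function.
    """
    return re.sub(r"\s+", " ", text).strip()


# A space-padded day of the month, as described in the issue.
padded = "Jan  9 2022"
cleaned = collapse_whitespace(padded)
print(repr(cleaned))
```

The cleaned string can then be passed to `arrow.get()` with a format such as `'MMM D YYYY'`. Note that this rewrites all whitespace in the input, so it only suits formats whose fields are separated by single spaces.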

systemcatch commented 2 years ago

Hello @rtphokie, your example seems to work fine for me without replacement.

>>> datestring = 'Jan 9 2022'
>>> arrow.get(datestring, 'MMM D YYYY')
<Arrow [2022-01-09T00:00:00+00:00]>

For situations with extra whitespace use the normalize_whitespace flag.

>>> datestring_spaced = 'Jan  9 2022'
>>> arrow.get(datestring_spaced, 'MMM D YYYY', normalize_whitespace=True)
<Arrow [2022-01-09T00:00:00+00:00]>

https://arrow.readthedocs.io/en/latest/#redundant-whitespace

rtphokie commented 2 years ago

Missed that flag, thanks for pointing it out.

Shouldn't that be default behavior though?

systemcatch commented 2 years ago

We don't like to assume what the user wants or do too much hidden processing of the string. It's easier for them to invoke the flag explicitly.