Extend parse_iso to accept '2020-06-10 14:33:23.489 +0200'

arrow-py / arrow

🏹 Better dates & times for Python

https://arrow.readthedocs.io

Apache License 2.0

Extend parse_iso to accept '2020-06-10 14:33:23.489 +0200' #807

Closed impact27 closed 4 years ago

impact27 commented 4 years ago

Feature Request

I would like the following call to work:

arrow.get('2020-06-10 14:33:23.489 +0200')

For now it fails on the second space (the one before the UTC offset). It would be easy for me to work around this, but it looks like something that should work out of the box.

I understand that it might not be strictly ISO 8601 compliant, but the following comment suggests that the function is a bit more inclusive: https://github.com/crsmithdev/arrow/blob/0a37fa5643ed2a0fe7714854fc3d91644fc8fb4e/arrow/parser.py#L115

impact27 commented 4 years ago

For reference, this is the format used by micromanager:

https://github.com/micro-manager/micro-manager/blob/9e32d4c66b1fdf3bf038994a016dc5449f108cc1/mmstudio/src/main/java/org/micromanager/acquisition/internal/DefaultAcquisitionManager.java#L58

impact27 commented 4 years ago

PS: This is also the format used in the doc at https://arrow.readthedocs.io/en/latest/:

>>> local.format()
'2013-05-11 13:23:58 -07:00'

but on my computer I don't get the space before the offset, so it might be an error in the doc:

>>> local.format()
'2020-06-15 03:23:43-07:00'

systemcatch commented 4 years ago

Hello @impact27, we've thought about adding white space normalization before (#421) but I guess this is slightly different.

I'm not opposed to adding support for this as it's a reasonable format but we need to avoid breaking too much stuff in the parsing methods. The snippet from the docs is wrong and needs fixing.

systemcatch commented 4 years ago

Hey @impact27 just checking you know that custom formats can be passed to arrow?

(arrow) chris@ThinkPad:~/arrow$ python
Python 3.7.4 (default, Sep 19 2019, 11:01:37) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import arrow
>>> arrow.get('2020-06-10 14:33:23.489 +0200', "YYYY-MM-DD HH:mm:ss.SSS Z")
<Arrow [2020-06-10T14:33:23.489000+02:00]>

You can also use a regular expression inside the format string by enclosing it in square brackets [ ]:

>>> arrow.get('2020-06-10 14:33:23.489   +0200', r"YYYY-MM-DD HH:mm:ss.SSS[\s+]Z")
<Arrow [2020-06-10T14:33:23.489000+02:00]>

impact27 commented 4 years ago

Thanks for the tip about regex :) I think that if arrow accepts '2020-06-10 14:33:23.489+0200' in addition to the T notation of ISO 8601, it would make sense to accept a space before the time zone as well. I am using "YYYY-MM-DD HH:mm:ss.SSS Z" in my code for now.
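
Another way to get there without a format string is to remove the gap before the offset first, so the string takes the no-space form mentioned above. This is only a sketch using the standard library's re module; collapse_offset_gap is a hypothetical helper, not part of arrow, and it assumes the UTC offset is the last thing in the string.

```python
import re

# Hypothetical helper, not part of arrow.
# Matches whitespace that sits directly before a trailing UTC offset
# such as +0200 or -07:00 (assumes the offset ends the string).
_OFFSET_GAP = re.compile(r"\s+(?=[+-]\d{2}:?\d{2}$)")


def collapse_offset_gap(timestamp: str) -> str:
    """Return the timestamp with any whitespace before the offset removed."""
    return _OFFSET_GAP.sub("", timestamp.strip())


print(collapse_offset_gap("2020-06-10 14:33:23.489 +0200"))
# 2020-06-10 14:33:23.489+0200
```

The result could then be handed to arrow.get() with a single argument. Strings without a trailing offset pass through unchanged.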

jadchaar commented 4 years ago

Glad to hear you found a solution with a custom format string. Regarding this issue, I am going to close it since it is very similar to https://github.com/crsmithdev/arrow/issues/421.

Also, we are looking to keep the parse_iso function (the function that is called when arrow.get() is invoked with one argument and without a format string) primarily focused on ISO 8601 formats. When we say "support more than ISO 8601" in the TODO comment, we mean that we are expanding the token combinations and dividers we support, rather than particular spacing choices. Handling redundant spacing is better left to a new function argument, which we could add if there continues to be enough demand for the feature 😄.

Currently, the best solution is passing a custom format string or the regex Chris mentioned above.
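
For anyone who wants to wrap that in one place, the "strict parse first, explicit format second" idea can be sketched as below. To keep the snippet runnable without extra dependencies it uses only the standard library's datetime, not arrow; parse_timestamp and FALLBACK_FORMAT are hypothetical names chosen for illustration.

```python
from datetime import datetime

# Assumed fallback layout: date, time with fractional seconds,
# one space, then a UTC offset such as +0200.
FALLBACK_FORMAT = "%Y-%m-%d %H:%M:%S.%f %z"


def parse_timestamp(text: str) -> datetime:
    """Try the ISO-style parser first, then the explicit fallback format."""
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        # The ISO-style parser rejected the input, so use the custom format.
        return datetime.strptime(text, FALLBACK_FORMAT)


parsed = parse_timestamp("2020-06-10 14:33:23.489 +0200")
print(parsed.isoformat())
```

With arrow, the same shape would presumably use arrow.get(text) for the first attempt and arrow.get(text, "YYYY-MM-DD HH:mm:ss.SSS Z") for the fallback, catching whatever parser error the installed arrow version raises.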