arrow-py / arrow

🏹 Better dates & times for Python
https://arrow.readthedocs.io
Apache License 2.0
8.73k stars 682 forks source link

v0.15.0 Changes❗ #612

Closed jadchaar closed 5 years ago

jadchaar commented 5 years ago

In the upcoming version 0.15.0 of arrow, we will be making a lot of changes to the behavior of arrow.get() to address a number of reported parsing bugs. We have outlined the changes below:

Fixes

For example,

>>> arrow.get("garbage2017everywhere")
<Arrow [2017-01-01T00:00:00+00:00]>
>>> arrow.get('Jun-2019', ['MMM-YY', 'MMM-YYYY'])
<Arrow [2020-06-01T00:00:00+00:00]>

These will raise a ParserError in 0.15.0.

The following will still work as expected:

arrow.get("1565358758", "X")
arrow.get("1565358758.123413", "X")
arrow.get(1565358758)
arrow.get(1565358758.123413)

New Features

Issues addressed

Development progress

You can view the progress of these changes here: https://github.com/crsmithdev/arrow/tree/Version-0.15.0.

Disable Warnings

To get rid of the ArrowParseWarning messages in 0.14.3 onwards, do the following:

import warnings
from arrow.factory import ArrowParseWarning

warnings.simplefilter("ignore", ArrowParseWarning)
kornpow commented 5 years ago

I still get warnings when I pass a parser string to .get() Also here is a full example for disabling warnings if anyone needs: Using arrow 0.14.3

import arrow

import warnings

from arrow.factory import ArrowParseWarning

astring = "2019-07-29T13:58:44.460381Z"

# Show warning still?

arrow.get(astring,"YYYY-MM-DD[T]HH:mm:ss.S[Z]")

# Filter out the warnings

warnings.simplefilter("ignore", ArrowParseWarning)

# No warnings anymore

arrow.get(astring,"YYYY-MM-DD[T]HH:mm:ss.S[Z]")
jadchaar commented 5 years ago

Hey @sako0938, we made the conscious decision to include the warning on the arrow.get() method with and without a format string since significant changes are coming to both.

@systemcatch we could maybe better target the format string warnings to the timestamp token X?

systemcatch commented 5 years ago

@jadchaar @sako0938 that's going to be difficult since many of the changes apply with or without a format string, I think it's easier to just give a general warning despite it being more annoying.

For example in current arrow;

>>> arrow.get("garbage2017everywhere", "YYYY")
<Arrow [2017-01-01T00:00:00+00:00]>
>>> arrow.get("garbage2017everywhere")
<Arrow [2017-01-01T00:00:00+00:00]>

arrow 0.15.0 will raise errors for both these cases.

kornpow commented 5 years ago

Sounds great. I mostly wanted to show how to remove the warnings for others since the issue didn't provide all the context needed at the time.

davix commented 5 years ago

Hi, I'm a first-time user of arrow with latest version installed (0.14.4). Still can't understand from previous answers that why the example below still has warning. Is something still missing?

arrow.get(astring,"YYYY-MM-DD[T]HH:mm:ss.S[Z]")
 /Users/whuang/Envs/p2/lib/python2.7/site-packages/arrow/factory.py:249: ArrowParseWarning: The .get() parsing method with a format string will parse more strictly in version 0.15.0.See https://github.com/crsmithdev/arrow/issues/612 for more details.
systemcatch commented 5 years ago

Hi, I'm a first-time user of arrow with latest version installed (0.14.4). Still can't understand from previous answers that why the example below still has warning. Is something still missing?

arrow.get(astring,"YYYY-MM-DD[T]HH:mm:ss.S[Z]") /Users/whuang/Envs/p2/lib/python2.7/site-packages/arrow/factory.py:249: ArrowParseWarning: The .get() parsing method with a format string will parse more strictly in version 0.15.0.See #612 for more details.

Hey @davix, on version 0.14.3+ there will always be a warning generated when arrow.get() does str parsing (with and without a format passed). To disable the warning please use @sako0938 or our solutions.

davix commented 5 years ago

on version 0.14.3+ there will always be a warning generated when arrow.get() does str parsing (with and without a format passed)

From my understanding, a warning means that a user is doing it incorrectly or at least imperfectly. Why is there warning even if a format is provided, I'm curious? or is there a more proper command other that .get()?

systemcatch commented 5 years ago

From my understanding, a warning means that a user is doing it incorrectly or at least imperfectly. Why is there warning even if a format is provided, I'm curious? or is there a more proper command other that .get()?

Warnings can be also be used to notify users of upcoming changes to the package. The changes we've made to the .get() method can apply both with and without a format provided. In 0.15.0 these improvements will take effect and the warnings will disappear.

joinemm commented 5 years ago

The flexible .get() was in my opinion one of the best features of arrow, as it allowed for parsing of unknown or user generated time strings without any extra logic. I wonder if it's possible you could keep the old functionality of .get() in a new separate function? Maybe like .fuzzy_get() or something.

systemcatch commented 5 years ago

@joinemm that functionality will still be there, indeed in version 0.15.0 more formats will parse in .get() without any other logic being needed than do currently.

petermolnar commented 5 years ago

I started seeing the message today; I'm with @joinemm - .get is one of the main reasons for me to use arrow. I'm very surprised by this move.

EDIT can at least rfc3339 be included in the default string parsing list without showing that warning, please?

joinemm commented 5 years ago

@systemcatch really? But what about

arrow.get() will no longer parse natural language strings

Timestamp strings are no longer supported in the arrow.get() method

I feel like these two were very important

petermolnar commented 5 years ago

Timestamp strings are no longer supported in the arrow.get() method

That is the main function why I'm using arrow; parsing rfc3339 and ISO-8601 without needing to deal with TZ mockery in Python.

I'd settle with a .get_iso8601 method that doesn't need any formatting input, but deals with any variations of ISO-8601.

jadchaar commented 5 years ago

Timestamp strings are no longer supported in the arrow.get() method

Apologies, this statement needs to be tweaked.

We are dropping support for usage like: arrow.get("1565358758") because we are adding support for the basic format in ISO 8601. This would make timestamp strings and the basic format ambiguous.

You can still use get as follows: arrow.get(1565358758) and if you need to parse a timestamp string, arrow.get("1565358758", "X").

Let us know if you all still have concerns and feedback about this. I have tweaked the original message to reflect my comments above.

joinemm commented 5 years ago

@jadchaar Oh ok, looks like it was just badly worded then. So something like arrow.get("2002-10-02T10:00:00-05:00") will still work as it used to?

jadchaar commented 5 years ago

@joinemm indeed :). We are not trying to break existing workflows, we are just trying to address core parsing issues that have given arrow a bad name over the years.

arrow.get() will no longer parse natural language strings

As for this, the functionality will still remain intact if you provide a format string like this: arrow.get("Meet me at 2016-05-16T04:05:06.789120 at the stadium", "YYYY-MM-DDTHH:mm:ss.S" ).

This will no longer work though: arrow.get("Meet me at 2016-05-16T04:05:06.789120 at the stadium"). We made this decision because we wanted get() without a format string to simply be for parsing date and time strings by themselves so that we can prevent any unintentional parsing issues (e.g. accepting blah2016).

systemcatch commented 5 years ago

can at least rfc3339 be included in the default string parsing list without showing that warning, please?

@petermolnar the warning is there so people can see the full list of changes in this issue, thanks to that warning you and @joinemm showed us that the change log was poorly worded which should hopefully mean less confusion for others in the future.

The warning will be gone in 0.15.0 which should be released around the end of August.

KenKundert commented 5 years ago

The current version of the manual on readthedocs.io says:

Some ISO-8601 compliant strings are recognized and parsed without a format string:

arrow.get('2013-09-30T15:34:00.000-07:00') <Arrow [2013-09-30T15:34:00-07:00]>

However, that code now generates a warning. That forces an ugly choice. Do I put in the warning suppression code, which just adds clutter to my code and adds risk that some new future warning will be unintentionally suppressed, or do I specify the format string for iso8601, which clutters up my code, seems error prone, and also seems like domain specific knowledge that should be built in to Arrow.

I have instead decided to just forbid the use of versions 0.14. by adding the following to my setup.py file: `install_requires = ['arrow!=0.14.']` That eliminates the warning message without requiring me to add clutter to my code.

Hopefully things are better in 0.15.*.

merriam commented 5 years ago

This issue has come up a number of times in different date/time libraries; I last ran into it in MomentJS. The progression is surprisingly consistent:

  1. A native library exists, e.g., DateTime. It is found wanting, particularly in parsing dates.
  2. A new library is written, e.g., Arrow, that provides a liberal, intuitive parsing syntax.
  3. The library gains popularity and a large number of additional features. It is used in production.
  4. There are users that want to use the library in production and find a number of times where liberal parsing is too liberal, not raising errors. These users are now maintainers or supporters.
  5. A change is proposed to make strict parsing the default. This is introduced as a breaking change, as the proponents do not plan to ever use liberal parsing. It is, curiously, always introduced in a way that extra warnings to be generated.
  6. The change is implemented, causing wide spread incompatibilities among less vocal users. When they ask "why do it this way?" they are shouted down.

The proposed solutions are usually:

  1. Provide a global 'strict mode', e.g., Arrow.strict_mode()
  2. Provide an optional strict parameter, e.g., Arrow.get(..., strict=True)

Discussion of the solutions devolves into camps wanting default strict versus default liberals. As maintainers are in the default strict camp, neither solution is implemented, the library is too hard to use, and starts a declining adoption rate until maintainers are no longer supported by their companies to maintain the package.

Is there any way to derail this path?

systemcatch commented 5 years ago

The current version of the manual on readthedocs.io says:

Some ISO-8601 compliant strings are recognized and parsed without a format string:

arrow.get('2013-09-30T15:34:00.000-07:00') <Arrow [2013-09-30T15:34:00-07:00]>

However, that code now generates a warning. That forces an ugly choice. Do I put in the warning suppression code, which just adds clutter to my code and adds risk that some new future warning will be unintentionally suppressed, or do I specify the format string for iso8601, which clutters up my code, seems error prone, and also seems like domain specific knowledge that should be built in to Arrow.

Hello @KenKundert that string does not need a format to be passed now or in the future, the warning will occur with or without a format string. This is so users are aware of the changes coming in 0.15.0 which affect both cases. ArrowParseWarning is specific and won't be used after 0.15.0.

systemcatch commented 5 years ago

@merriam

4. There are users that want to use the library in production and find a number of times where liberal parsing is too liberal, not raising errors.  These users are now maintainers or supporters.

Indeed, take a look at the examples below. They're clearly wrong and introduce errors that are hard to find and debug. I assume you agree these should be fixed?

arrow.get('Jun-2019', ['MMM-YY', 'MMM-YYYY'])
<Arrow [2020-06-01T00:00:00+00:00]>
arrow.get('blabla102015').isoformat()
'1020-01-01T00:00:00+00:00'
arrow.get('13/4/2045')
<Arrow [2045-01-01T00:00:00+00:00]>
5. A change is proposed to make strict parsing the default.  This is introduced as a breaking change, as the proponents do not plan to ever use liberal parsing.  It is, curiously, always introduced in a way that extra warnings to be generated.

This change does not make strict parsing the default, it simply corrects obvious errors. Warnings are clearly needed to inform users about these changes, unfortunately due to the parser design it is difficult to provide specific warnings, hence the need for a more general one.

charlax commented 5 years ago
In [1]: import arrow

In [2]: arrow.get("2018", "YYYY")
/Users/ca/.local/share/virtualenvs/api-uUs12mY4/lib/python3.6/site-packages/arrow/factory.py:249: ArrowParseWarning: The .get() parsing method with a format string will parse more strictly in version 0.15.0.See https://github.com/crsmithdev/arrow/issues/612 for more details.
  ArrowParseWarning,
Out[2]: <Arrow [2018-01-01T00:00:00+00:00]>

I understand the impetus to warn users about the upcoming change, but this feels very heavy-handed. A plain, simple example will yield the warning. Removing the warning requires 3 lines of code... granted it's not a lot, but that's a manual step everyone will have to take.

What about issuing a warning only when the code will need to change - this will give a chance to developers to prepare for the breaking change. Not sure how difficult this is considering the parser design.

Also, it would be nice for https://arrow.readthedocs.io/en/latest/ to include a migration guide.

merriam commented 5 years ago

@merriam

They're clearly wrong and introduce errors that are hard to find and debug. I assume you agree these should be fixed?

No. Really. I don't.

You can ask the library:

Aside from the global switch or parameter route, maybe one could slowly deprecate arrow.get() with a DeprecationWarning, and replace it with arrow.quick_get(), arrow.strict_get(), and arrow.loose_get()? It has the advantages of breaking everyone's code at a known future time and making the choice of fix an explicit, rather than implicit one. The guide could mention a work around hack like arrow.get = arrow.strict_get for the lazy.

Taking away the messy parsing without replacement just causes chaos and makes code ugly. Keep code beautiful.

jadchaar commented 5 years ago

@charlax we do apologies for the heavy-handed warnings, but we decided to make the error very upfront due to the complexity of the parser design and to get a conversation going around the changes. We very much appreciate the feedback you all are providing, and we may have not gotten it if it had not been for the obvious warnings :). That said, we are looking to release 0.15.0 in the near future, so the warnings should be gone soon.

@merriam thanks for the feedback, but like Chris said above, we are simply trying to correct obvious errors that could confuse users. Just so that we are on the same page, what do you see as Arrow parsing being too strict? Can you give us a few examples? That will help us determine whether or not our changes are too strict (and whether or not a strict flag is warranted)

Also, something to note is that @systemcatch and I have been working hard to address the feedback that users have given on Arrow parsing over the years (issues that have gone unaddressed for a long time). We are not trying to change the things that make Arrow great, but rather we are trying to fix obvious bugs. We are volunteers and have absolutely no corporate- or company-backing, we are doing this for the community :).

merriam commented 5 years ago

@jadchaar Thank you for your polite note. It sounds like pulling together a good set of test cases should fall on me. Would you be adverse to a pull request that added optional parameters, e.g., get(..., guess=True), as the the opposite of strict?

jadchaar commented 5 years ago

Hey @merriam, thanks for the offer. I was actually thinking about an opposite flag to strict as well. We could call it either guess, loose, or fuzzy.

A contribution would be great, but keep in mind that a PR would have to be made against our Version 0.15.0 branch rather than master since the changes have not been merged in yet (awaiting the merge of https://github.com/crsmithdev/arrow/pull/639 into the 0.15.0 branch). If we decide to add a flag to loosen the parsing standards, I think it will simply need to prevent the word boundary from being added to the regex here: https://github.com/crsmithdev/arrow/blob/Version-0.15.0/arrow/parser.py#L277.

Before you make the contribution, I think it would be useful for you to post a few of the test cases you have in mind (to warrant the flag) here so we can discuss :).

Thanks again for the feedback!

gsemet commented 5 years ago

Hello. Can you avoid setting

warnings.simplefilter("always", ArrowParseWarning)

in your library, this clearly mess up warning set up. You can keep the Warning but let the users handle the filtering themself without having to import arrow.factory first.

Also, can you provide the example the format with optional parts (ex able to parse '2019-08-30T11:09:57+00:00' and '2019-08-30T10:57:25.931Z')

thanks

systemcatch commented 5 years ago

Hi @gsemet, on reflection we were probably over zealous with the warnings for these changes. But with python pre 3.7 not showing deprecation warnings by default we really wanted to make sure they were seen.

Anyway 0.15.0 will be released soon and the warnings will be gone.

salixrosa commented 5 years ago

A handful of days back, I made the mistake, it seems, of letting pip update Arrow. There were a handful of depreciation warnings we caught and remedied in testing, and then we rolled a major update to production. For some reason I'm not particularly interested in, we did not catch this one.

In the last hour, there have been 181,486 "arrow" warnings (this depreciation warning) logged. This is problematic.

I wanted a quick, temporary fix for this issue (pushing code to production is not quick) and tried setting the env variable to suppress this: PYTHONWARNINGS=ignore. It had no effect.

Has anybody here tried swallowing these warnings with just an env variable? Any success?

jadchaar commented 5 years ago

Hey @salixrosa, we apologize for the issues you are facing. Can you try export PYTHONWARNINGS="ignore"?

salixrosa commented 5 years ago

@jadchaar

We're working in a containerized environment where export doesn't really make sense, as I understand things. We're setting the env variable the same way we're setting others that are used as early as the command that starts the python server; a line of PYTHONWARNINGS=ignore should set the variable for all relevant processes. I also tried PYTHONWARNINGS=ignore::ArrowParseWarning, which similarly had no effect. I was hoping the answer would simply be some other string for this variable :-/

Thanks, though!

gsemet commented 5 years ago

maybe you can add a globla environment variable that would skip the warnings.simplefilter("always", ArrowParseWarning) line in your library, so that you won't mess with stdout.

I planning to switch to an earlier version of arrow or maybe switch to another livrary, some of my tools really need a clean stdout.

honzajavorek commented 5 years ago

Please release 0.15.0 so the warnings are actionable. Currently there's no path I can take to get rid of the warnings except of forcefully suppressing them. There's nothing I can fix or upgrade to and it's polluting my app's output.

thesimj commented 5 years ago

I don't know how long to wait until you release version 0.15, but this warning message create a huge problem in ElasticStack (Logs) and required much more space to operate. Please release version 0.15!

gsemet commented 5 years ago

Just revert to the previous version. There is the possibility to monkey patch arrow in order to ditch this ugly call to warning, but I would prefer if the kind author to release an intermediate version of 0.14 without this line :)

jadchaar commented 5 years ago

Hey all, thanks for the feedback. Chris and I have decided to release an intermediate version (0.14.7) like @gsemet suggested. Our goal was to facilitate a conversion about the 0.15.0 changes, and I think we have done just that. Thanks for the feedback and please keep it coming!

We are planning on releasing 0.15.0 in the next week or so.

case commented 5 years ago

Thanks for your work on this, @jadchaar!

thesimj commented 5 years ago

Thank you for great work, @jadchaar

jaapz commented 5 years ago

One the one hand it's great you are so explicit with warning users that the API is going to break next release, on the other hand next time it would be nice to have a direct clean action path for making the warnings go away (like releasing the broken-API 0.15.0 version at the same time as introducing the warning version 0.14.x).

Now I have to deal with the minor inconvenience of having to explicitly ignore these warnings in our test suite (all other warnings are seen as errors by our test runner), and then remove that again when we move to 0.15.0 in a few weeks. It's not the end of the world, but it's annoying.

gsemet commented 5 years ago

At least we know the API breaks in 0.15 :) Thanks for the intermediate releases :)

jadchaar commented 5 years ago

Hey all, v0.15.0 has been released! We have made a huge effort to keep things as compatible as possible with previous versions, but this release may potentially break existing code. Here is the final change log:

0.15.0

# will NOT work in v0.15.0
arrow.get("1565358758")
arrow.get("1565358758.123413")

# will work in v0.15.0
arrow.get("1565358758", "X")
arrow.get("1565358758.123413", "X")
arrow.get(1565358758)
arrow.get(1565358758.123413)

This release has been months in the making and @systemcatch and I have put in a ton of work into it. Please do not hesitate to reach out with any feedback or concerns. Thanks!

escapewindow commented 5 years ago

Hi,

First off, thank you for maintaining arrow, and thank you for addressing some long standing bugs.

Second, fallout from several recent arrow fixes have broken us downstream. This is understandable. However, I'm wondering if you would consider switching to semver. Namely:

We've found that major version bumps are a great way of informing downstream users that something has changed that may merit more investigation before deploying. This would allow us to, in this example, pin to arrow<2, and pick up fixes without automatically picking up breaking changes.

Thanks again!

jadchaar commented 5 years ago

Hi @escapewindow, @systemcatch and I actually discussed this recently. We decided that we would like to release 1.0.0 once we find that arrow has most basic features that you'd expect from a date and time library implemented. We are planning on switching to a full semantic versioning scheme once we add daylight savings time improvements (planned for 0.16.0). We wanted to get arrow into a solid feature state before transitioning to semantic versioning. It is definitely on our minds as well :).

jaapz commented 5 years ago

Great release, thanks!

jadchaar commented 5 years ago

Closing this issue because the v0.15.0 release seems to have gone well. Feel free to open a new issue if new problems arise. Thanks again everyone for contributing to the conversation and helping to make Arrow even better!

jmidyet commented 5 years ago

@jadchaar thousands of people/projects are already using the library. By not following standard versioning practices, you're causing headaches for other humans. Please do a major version bump for breaking changes.

andrewelkins commented 5 years ago

@jmidyet Agreed and we're working on it.

gsemet commented 5 years ago

While I agree with you, I must admit semver says something like « every version in 0.x can break or not, up to the lib ». Lot of packages are 0.x because their maintainer just does not want to deal with backward-forward compatibility. Yes human error happens, breaking change can sneak between two minor change, but semver has a purpose to minimize this. It is a pity some maintainers do not follow it, it makes updating very risky so we end up freezing lib to an old version, decreasing security.

Semver is not that hard. Use PBR to automatize version bump or any other similar lib. And a bit of self discipline.

By the way, pendulum is a very sexy lib, I am not saying it is better but it looks like it follows semver.

jadchaar commented 5 years ago

@gsemet @jmidyet we appreciate your input and apologize if the lack of semver has caused issues. I am a huge fan of semver myself, but Arrow has been in flux the past few years as maintainers have come and gone (I just join the project back in May), but we are heavily discussing a move to semver.

@systemcatch and I discussed the potential move recently, but we wanted to keep a 1.0.0 release until after we have complete DST support implemented. That is one of the big lacking features of Arrow that we want to tackle before officially saying this is a non-beta product. 0.15.0 was focused on refactoring and fixing the parsing behavior, and now we are moving focus to DST. If anyone would like to help us implement DST and get the ball rolling on a 1.0.0 release, we'd be happy to help and look over PRs!

Also, 1.0.0 should probably be Python 3+.

andrewelkins commented 5 years ago

@jadchaar probably worth keeping support for 2.7 for 1.0.0