arrow-py / arrow

🏹 Better dates & times for Python
https://arrow.readthedocs.io
Apache License 2.0
8.71k stars 673 forks source link

Refactor dehumanize and begin standardizing locales #970

Closed anishnya closed 3 years ago

anishnya commented 3 years ago

Pull Request Checklist

Thank you for taking the time to improve Arrow! Before submitting your pull request, please check all appropriate boxes:

If you have any questions about your code changes or any of the points above, please submit your questions along with the pull request and we will try our best to help!

Description of Changes

Overview

As a follow-up #961, I've added support for 5 more locales: Arabic, Croatian, Estonian, Hebrew, and Hungarian.

The template for additional locale support is similar to how Hebrew and Arabic have been implemented. Within our timeframe objects, for any special timeframes, we use a (string) key that represents an integer. That way, in dehumanize we can also convert this string to an int and parse it accordingly.

Rationale and Potential Alternatives

I prefer the method presented here. It cleans up locales dramatically and creates standardization across locales. Currently, we have a mishmash of different methods of how locales are implemented. For example, many of the Slavic locales index into a list, whereas others use some combination of a dictionary and list. I think having a timeframe object that has the value of either a string or dictionary (with only a string key and value) will make these locales much more maintainable in the long run.

For example, with this new approach, we were able to eliminate the need for the "2-" literals that were only used by Hebrew. With this approach, there is very little modification needed to tackle the outstanding locales that do not support dehumanize. On top of that, this approach creates a standard for locale implementation that I think we all desire.

Granted, having a key be the string representation of an int isn't necessarily pretty, but in my opinion, it is much more elegant than other possible solutions. Another possible solution was to create an additional object within a locale that would essentially translate a given special timeframe string into its associated information (see below).

special_timeframes = {"string for two months ago": ["months", 2, True], "{0} special plural unit": ["seconds", 1, False]}

I think this alternative approach is somewhat clunky and adds an object that would only be used for dehumanize to every locale. Also, this approach doesn't force us to refactor legacy locales, the way the first approach does. After spending time with a lot of these locales, it is clear we do need to refactor some legacy locales, so we might as well tackle it while we add support for dehumanize.

codecov[bot] commented 3 years ago

Codecov Report

Merging #970 (1c367cc) into master (06fe693) will not change coverage. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            master      #970   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           10        10           
  Lines         2036      2038    +2     
  Branches       328       330    +2     
=========================================
+ Hits          2036      2038    +2     
Impacted Files Coverage Δ
arrow/constants.py 100.00% <ø> (ø)
arrow/arrow.py 100.00% <100.00%> (ø)
arrow/locales.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 06fe693...1c367cc. Read the comment docs.