arrow-py / arrow

🏹 Better dates & times for Python
https://arrow.readthedocs.io
Apache License 2.0
8.63k stars 669 forks source link

Add Cython Support to Arrow #1143

Open 13MK3 opened 1 year ago

13MK3 commented 1 year ago

Pull Request Checklist

Thank you for taking the time to improve Arrow! Before submitting your pull request, please check all appropriate boxes:

If you have any questions about your code changes or any of the points above, please submit your questions along with the pull request and we will try our best to help!

Description of Changes

These changes were made in collaboration with @khanm3

This pull request adds Cython support to Arrow as requested by #617.

Further Explanation

In arrow/arrow.py, we had to change the variable "locale" to "locale_cls" because the variable was being reused for 2 different types and this functionality was incompatible with Cython. Cython also had difficulties recognizing the type of _datetime.isoclanedar, so we had to specifically cast it to a tuple. Without this, it would not compile.

Even with the issues mentioned above, we chose to use Cython 3.0.0a11 (which is a pre-release version) instead of Cython 0.29.32 (the most recent stable version), since from our testing, Cython 3 is better at parsing modern Python features. In general, Cython tries to subsume the functionality of python, but as you can see there are still many quirks.

We were able to allow Cython to use the existing method of calculating code coverage by adding linetracing as a compiler directive. However, due to the quirks of Cython, the resulting line coverage results are slightly lower than 100%. We found that the lines that were no longer being covered were one of 2 cases: 1) Inside an if statement that had 'pragma no cover' (so those also shouldn't count towards coverage) 2) Parameters to functions that had a default value (which also shouldn't count towards coverage, but for some reason Cython thinks it should)

This could be ignored by adding more 'pragma no cover', but we're not sure if there is a better way.

Performance Analysis

To compare performance of Arrow with and without Cython, we ran just the Arrow part of this benchmark. This benchmark runs 1 million executions per benchmark and picked the min of 5 repeats. This was run in Python 3.8.10.

arrow_benchmark

From these results, you can see that compiling Arrow with Cython does cause a measureable improvement to performance, especially with parsing. Additionally, compiling Cython without linetracing is even more performant, but we currently compile with it so that code coverage can be calculated.

Future Work

Adding static typing (and necessarily converting those files to .pyx files) would bring an even bigger boost to performance. Parser.py should receive special attention, as this is where the largest amount of time is being spent. This is where adding Cython to Arrow could prove to be especially valuable, since the speed increase with static typing should be much more than that gained by simply compiling with Cython. Looking into different ways of calculating code coverage so linetracing is no longer needed, or providing an option to switch the compiler directive would also be a good way to improve performance in the future.

krisfremen commented 1 year ago

Very nice, i had a wip branch of this, but this looks like good progress.

13MK3 commented 1 year ago

We realized the system tests are failing because the test systems aren't installing Cython before running setup.py. It seems that the continuous_integration.yml is using cached requirements and therefore not installing Cython even though we specified it in requirements.txt? We are not exactly sure and could use some help with this. Cython seems to install fine when we do "make build3x" locally.

krisfremen commented 1 year ago

tox handles setup.py dependencies a bit differently, which we install during the GH Actions.

We should probably extract those out of the workflow and in a separate requirements file.

codecov[bot] commented 1 year ago

Codecov Report

Base: 100.00% // Head: 99.59% // Decreases project coverage by -0.40% :warning:

Coverage data is based on head (3d416be) compared to base (74a759b). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #1143 +/- ## =========================================== - Coverage 100.00% 99.59% -0.41% =========================================== Files 9 9 Lines 2319 2684 +365 Branches 492 0 -492 =========================================== + Hits 2319 2673 +354 - Misses 0 11 +11 ``` | [Impacted Files](https://codecov.io/gh/arrow-py/arrow/pull/1143?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=arrow-py) | Coverage Δ | | |---|---|---| | [arrow/arrow.py](https://codecov.io/gh/arrow-py/arrow/pull/1143/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=arrow-py#diff-YXJyb3cvYXJyb3cucHk=) | `99.54% <100.00%> (-0.46%)` | :arrow_down: | | [arrow/constants.py](https://codecov.io/gh/arrow-py/arrow/pull/1143/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=arrow-py#diff-YXJyb3cvY29uc3RhbnRzLnB5) | `88.23% <0.00%> (-11.77%)` | :arrow_down: | | [arrow/util.py](https://codecov.io/gh/arrow-py/arrow/pull/1143/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=arrow-py#diff-YXJyb3cvdXRpbC5weQ==) | `97.95% <0.00%> (-2.05%)` | :arrow_down: | | [arrow/formatter.py](https://codecov.io/gh/arrow-py/arrow/pull/1143/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=arrow-py#diff-YXJyb3cvZm9ybWF0dGVyLnB5) | `99.03% <0.00%> (-0.97%)` | :arrow_down: | | [arrow/parser.py](https://codecov.io/gh/arrow-py/arrow/pull/1143/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=arrow-py#diff-YXJyb3cvcGFyc2VyLnB5) | `99.43% <0.00%> (-0.57%)` | :arrow_down: | | [arrow/locales.py](https://codecov.io/gh/arrow-py/arrow/pull/1143/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=arrow-py#diff-YXJyb3cvbG9jYWxlcy5weQ==) | `99.85% <0.00%> (-0.15%)` | :arrow_down: | | [arrow/api.py](https://codecov.io/gh/arrow-py/arrow/pull/1143/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=arrow-py#diff-YXJyb3cvYXBpLnB5) | `100.00% <0.00%> (ø)` | | | [arrow/factory.py](https://codecov.io/gh/arrow-py/arrow/pull/1143/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=arrow-py#diff-YXJyb3cvZmFjdG9yeS5weQ==) | `100.00% <0.00%> (ø)` | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=arrow-py). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=arrow-py)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

krisfremen commented 1 year ago

@13MK3 I see 3.8+ is passing, the rest are failing. Great work.

@anishnya @jadchaar @systemcatch how do we want to handle this going forward? do we want to go full cython or offer a pure python version as well?