ToucanToco / fastexcel

A Python wrapper around calamine
http://fastexcel.toucantoco.dev/
MIT License
102 stars 6 forks source link

feat: extend `fastexcel` support to python >= 3.8 #152

Closed alexander-beedie closed 8 months ago

alexander-beedie commented 8 months ago

Hoping that a patch to extend Python support is welcome :wink:

I was looking to integrate a calamine based Excel parser as one of our supported engines in the Polars read_excel method (I am one of the core devs for that project), but we need to support Python versions >= 3.8. While we could potentially have python_calamine handle earlier versions, I'm reluctant to do so as fastexcel is much better-suited for our use-case and benchmarks significantly faster (producing Arrow data without materialising anything in Python-space; nicely done!)

Took a look at the code to see if it could be extended to cover the earlier versions, and it seems like a straightforward low-impact update; I've validated that everything compiles/runs on the earlier Python versions (testing on x86 Ubuntu Linux for py3.8 and an Apple Silicon Mac for py3.9). Mostly just some minor typing updates and avoidance of the match expression in one of the tests. Put in an extra lint rule to catch the need for a from __future__ import annotations statement at the top of some files (to keep the modern typing syntax).

I can't validate the workflow changes, but with a little luck they'll run ok 🤞

Thanks for the great work, and let me know if you want any changes/modifications, etc?

alexander-beedie commented 8 months ago

Also, have you encountered any difficulties/questions when you looked at the code?

Actually I was pleasantly surprised how easy it was to go from "clone repo" to "everything is compiled and tests are running/passing" 👌 On that front I only have one comment, which is that I didn't immediately expect make dev-install and make prod-install to actually build, heh. Perhaps getting the word "build" in there would make it so even people who aren't fully paying attention (in this case: me!) wouldn't have to look twice - but honestly that's about the only thing I can think of, and it only took a few seconds to read through the Makefile with a bit more care 🤣

Otherwise it was smooth sailing; was up & running almost immediately. As for the source itself, I have only given it a superficial scan so far - mostly just looking to see what features are exposed (eg: is there a "has_headers" flag in case the sheet has no column names, an option to set them if there aren't, etc).

lukapeschke commented 8 months ago

I didn't immediately expect make dev-install and make prod-install to actually build

Good point, I'll update the make target names :)

Regarding the features you'd like to add, feel free to open issues and @ me on them! We've only exposed what we need internally for now, but things such as sheet headers handling should be pretty easy to add

lukapeschke commented 8 months ago

@alexander-beedie fastexcel 0.7.0 has been released and wheels for python 3.8 -> 3.12 have been successfully built and published to PyPI :partying_face: https://pypi.org/project/fastexcel/0.7.0/#files

alexander-beedie commented 8 months ago

feel free to open issues and @ me on them

Fantastic; thanks for the extremely quick turnaround on this! I've already integrated it as a new option for Polars in a local build and it's looking great. Will finish up some unit tests and look to push it out in the next day or two ✌️