dask / fastparquet

python implementation of the parquet columnar file format.
Apache License 2.0
786 stars 178 forks source link

Linting in fastparquet? #939

Open yohplala opened 2 weeks ago

yohplala commented 2 weeks ago

Hello, Could we have a linting tool setup (maybe through pre-commit) in fastparquet? We tackled this topic in issue #720 . At this time, @martindurant , you mentionned:

It would be reasonable to require some standard linting (flake8, black, isort) via pre-commit or otherwise. Obviously, implementing this would change most of the function blocks and make all current PRs very hard to merge - so the timing of formatting introduction is very sensitive.

I raise this topic as I have in mind that you are working on "significant" changes in fastparquet through #931 . Can this milestone be identified as a "fastparquet 2.0"? (there has been been several 2.0" release "recently": numpy, pandas mostly) If yes, do you think, it could be an appropriate timing to introduce linting?

yohplala commented 2 weeks ago

Side comment:

Maybe it could be relevant to have a "dev" branch, to which contributors can issue pull requests, without linting (this decreases the constraint for punctual contributors to contribute)

Then the linting step could be setup only when releasing "dev" branch into "main" branch?

martindurant commented 2 weeks ago

On the prospect of fastparquet 2: I have indeed put significant work into the branch. However, I am now hoping that https://kylebarron.dev/arro3/ will solve the use case for us ( https://github.com/kylebarron/arro3/issues/195 - discussion). If this can be done in rust with solid upstream implementations for all the parquet peculiarities, then maybe the cython in this project becomes unnecessary and can be dropped. I could, instead, work at making a nicer API and convenience functions in arro3.

About linting: sure, it would be fine. There are not too many things in motion, so it could be done any time (the fastparquet-numpy branch, if it progresses, can apply it later). Ruff seems to have overtaken the other linters in the meantime.

yohplala commented 2 weeks ago

I didn't know about arro3, thanks for the info! I will be watching it, let's see how this evolves.