Move to parquet and platformdirs

InseeFrLab / pynsee

The pynsee package contains tools to easily search and download French data from the INSEE and IGN APIs

https://pynsee.readthedocs.io/en/latest/

MIT License


Move to parquet and platformdirs #192

Closed tfardet closed 7 months ago

tfardet commented 7 months ago

Use parquet instead of pickle to store datasets (around 100 times faster). Parquet has been supported since pandas 0.21, and we require 0.24.

Move to platformdirs for the cache folder, etc., as appdirs is unmaintained.

tfardet commented 7 months ago

Apparently I need to add pyarrow as a dependency for parquet to work (I thought it was now installed by default with pandas, but apparently not); let's see if it runs fine this time. @hadrilec I hope the additional dependency is OK with you. I really think parquet is the way forward for storing data (both to save space and for fast IO).

tfardet commented 7 months ago

The failing tests do not seem related to the PR. @hadrilec do you know why the SIRENE tests may be failing?

EDIT: I confirm that tests fail locally on master too

tfardet commented 7 months ago

@hadrilec you can have a look, tests are passing now that SIRENE is fixed

tfardet commented 7 months ago

@hadrilec let me know if you need further info to validate that PR

hadrilec commented 7 months ago

@tfardet thanks a lot. Do you think it would be possible to move pyarrow to the optional packages, meaning to the extras_require list in the setup.py file? That way, users could use pyarrow if they want, and it would not add a "hard" dependency on pyarrow. If this is too much trouble, let me know.
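For reference, an optional dependency of this kind might be declared roughly as follows. This is a hypothetical fragment, not pynsee's actual setup.py: the extra name `parquet` is an assumption, and only the dependency-related arguments are shown.

```python
# Hypothetical fragment of a setup.py: only the dependency-related arguments.
install_requires = [
    "pandas>=0.24",  # minimum pandas version mentioned in this thread
]

extras_require = {
    # "parquet" is an assumed name for the extra; pyarrow stays optional.
    "parquet": ["pyarrow"],
}

# These would then be passed on to setuptools, for example:
# setup(..., install_requires=install_requires, extras_require=extras_require)
```

With such a declaration, users who want parquet support would install the extra explicitly, e.g. `pip install "pynsee[parquet]"`, while a plain install would not pull in pyarrow.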

tfardet commented 7 months ago

do you think it would be possible to move pyarrow to the optional packages?

This is theoretically possible but I would strongly advise against it: it would require two different code paths to handle files for people with and without pyarrow, and making pyarrow non-default would mean that most people would not get the benefits of the new parquet format...

Is there a specific issue with pyarrow that you are aware of? As far as I know they provide almost every wheel so installation should never be an issue...
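A rough sketch of what the two code paths mentioned above could look like if pyarrow were optional: parquet when pyarrow is importable, pickle otherwise. The function names and the suffix-based dispatch are assumptions made for illustration, not code from pynsee.

```python
import importlib.util
from pathlib import Path

import pandas as pd

# True when pyarrow can be imported in the current environment.
HAS_PYARROW = importlib.util.find_spec("pyarrow") is not None


def save_cached(df: pd.DataFrame, stem: Path) -> Path:
    """Save df next to `stem` (a path without extension), choosing the format."""
    if HAS_PYARROW:
        path = stem.with_suffix(".parquet")
        df.to_parquet(path, engine="pyarrow")
    else:
        # Fallback path for installations without pyarrow.
        path = stem.with_suffix(".pkl")
        df.to_pickle(path)
    return path


def load_cached(path: Path) -> pd.DataFrame:
    """Load a cached DataFrame, dispatching on the file suffix."""
    if path.suffix == ".parquet":
        if not HAS_PYARROW:
            raise RuntimeError("pyarrow is required to read parquet caches")
        return pd.read_parquet(path, engine="pyarrow")
    return pd.read_pickle(path)
```

Even in this small form, every read and write needs a branch, and a cache written with pyarrow becomes unreadable in an environment without it, which is the maintenance cost the comment above warns about.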

hadrilec commented 7 months ago

ok let's go