tfardet closed 7 months ago
Apparently I need to add pyarrow as a dependency for parquet to work (I thought it was installed by default with pandas now, but it is not). Let's see if it runs fine this time. @hadrilec I hope the additional dependency is OK with you. I really think parquet is the way forward for storing data, both to save space and for fast IO.
The failing tests do not seem related to the PR. @hadrilec do you know why the SIRENE tests may be failing?
EDIT: I confirm that tests fail locally on master too
@hadrilec you can have a look, tests are passing now that SIRENE is fixed
@hadrilec let me know if you need further info to validate that PR
@tfardet thanks a lot. Do you think it would be possible to move pyarrow to the optional packages, meaning the extras_require list in the setup.py file? That way, users could use pyarrow if they want, and it would not become a "hard" dependency. If this is too much trouble, let me know.
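For reference, a minimal sketch of what an optional dependency declared through extras_require could look like. The package name and the extra's name ("parquet") are hypothetical placeholders, not taken from this repository; only the pandas 0.24 floor comes from the PR description.

```python
from setuptools import setup

setup(
    name="example-package",  # hypothetical name, for illustration only
    # Hard dependencies: always installed.
    install_requires=["pandas>=0.24"],
    # Optional dependencies: installed only when the extra is requested.
    extras_require={"parquet": ["pyarrow"]},  # "parquet" is an assumed extra name
)
```

With such a layout, users would opt in with `pip install "example-package[parquet]"`, while a plain `pip install example-package` would leave pyarrow out.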
> do you think it would be possible to move pyarrow to the optional packages?
This is theoretically possible, but I would strongly advise against it: it would require two separate code paths to handle files for people with and without pyarrow, and making pyarrow non-default would mean that most people would not get the benefits of the new parquet format.
Is there a specific issue with pyarrow that you are aware of? As far as I know, they provide wheels for almost every platform, so installation should never be an issue.
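To make the "two code paths" concern concrete, here is a rough sketch of what an optional pyarrow would force on the storage layer. It is not code from this PR: `HAS_PYARROW`, `cache_file`, `save_dataset` and `load_dataset` are hypothetical names, and the pickle fallback is only an assumption about how the non-pyarrow case might be handled.

```python
import importlib.util
from pathlib import Path

# True when pyarrow can be imported; checked once at import time.
HAS_PYARROW = importlib.util.find_spec("pyarrow") is not None


def cache_file(stem: str) -> Path:
    """Return the file name used for `stem` (hypothetical helper).

    The extension depends on which storage format is available.
    """
    suffix = ".parquet" if HAS_PYARROW else ".pkl"
    return Path(f"{stem}{suffix}")


def save_dataset(df, stem: str) -> Path:
    """Write a pandas DataFrame with whichever format is available."""
    path = cache_file(stem)
    if HAS_PYARROW:
        df.to_parquet(path)  # code path 1: parquet
    else:
        df.to_pickle(path)   # code path 2: pickle fallback
    return path


def load_dataset(stem: str):
    """Read back a dataset written by `save_dataset`."""
    # pandas is imported lazily so the sketch can be imported on its own.
    import pandas as pd

    path = cache_file(stem)
    if HAS_PYARROW:
        return pd.read_parquet(path)
    return pd.read_pickle(path)
```

Every function has to branch on the same flag, and files written by one setup are not found by the other, which is the maintenance cost described above.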
ok let's go
Use parquet instead of pickle to store datasets (around 100 times faster). Parquet should be supported since pandas 0.21, and we require 0.24.
Move to platformdirs for the cache folder, etc., as appdirs is unmaintained.
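As an illustration of the platformdirs change, a minimal sketch of resolving a cache location. `user_cache_dir` is the platformdirs function (appdirs exposes one with the same name, which keeps the migration small); the application name `"myapp"`, the `cache_path` helper and the home-directory fallback are assumptions added here so the snippet runs even without platformdirs installed.

```python
from pathlib import Path

try:
    from platformdirs import user_cache_dir
except ImportError:
    # Fallback for illustration only (an assumption, not part of the PR).
    def user_cache_dir(appname: str) -> str:
        return str(Path.home() / ".cache" / appname)

APP_NAME = "myapp"  # hypothetical application name


def cache_path(filename: str) -> Path:
    """Return the full path of `filename` inside the per-user cache folder.

    The folder itself is not created here; callers can do so with
    `path.parent.mkdir(parents=True, exist_ok=True)` before writing.
    """
    return Path(user_cache_dir(APP_NAME)) / filename
```

For example, `cache_path("dataset.parquet")` gives a platform-appropriate location for a cached dataset.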