AnotherSamWilson / miceforest

Multiple Imputation with LightGBM in Python
MIT License
353 stars, 31 forks

Updated dependency blosc to blosc2 #86

Closed: franzhuettinger closed this 6 months ago

franzhuettinger commented 7 months ago

I have only updated the blosc dependency, replacing blosc with blosc2. The reason is that blosc is no longer compatible with the latest libraries, and some NumPy-related parts were also out of date. Please feel free to merge these changes into the project.

AnotherSamWilson commented 7 months ago

My man, it looks like this also allows us to save objects over 2^32 bytes. Huge improvement. Thanks.

AnotherSamWilson commented 7 months ago

@franzhuettinger the tests failed; it looks like blosc wasn't updated in utils.

franzhuettinger commented 7 months ago

I forgot to update the GitHub workflow file. Please try again.

AnotherSamWilson commented 6 months ago

@franzhuettinger it looks like it still isn't updated in utils.

franzhuettinger commented 6 months ago

I don't see the problem at the moment, but it's late here. I'll check it tomorrow.

franzhuettinger commented 6 months ago

blosc was still being used instead of blosc2 in a few places. I have found and fixed them, and I hope that covers all of them now.

AnotherSamWilson commented 6 months ago

@franzhuettinger hmm, it looks like the blosc2 API has changed somewhat; there is no NOSHUFFLE option anymore. BTW, to save ourselves from having to test on GitHub, you can test locally by installing pytest and running pytest in your terminal.

franzhuettinger commented 6 months ago

I had some trouble setting up pytest on my local machine, but in the meantime I found the problem. The next commit will be tested properly.

franzhuettinger commented 6 months ago

blosc2 does not seem to be 100% compatible with blosc. I had to add a compatibility layer because the codec is no longer specified as a string, as it was in blosc.compress(), but as a blosc2.Codec value. I also had to remove the typesize argument from blosc2.compress(), because the length of the output of dill.dumps() is not necessarily a multiple of typesize. Another necessary change was replacing the shuffle parameter with filter.
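A minimal sketch of what such a compatibility layer could look like, assuming blosc-1 style arguments (a `cname` string and a `shuffle` constant, where NOSHUFFLE=0, SHUFFLE=1, BITSHUFFLE=2) need to be mapped onto blosc2's `Codec` and `Filter` enums. The helper names (`translate_blosc1_args`, `compress_object`, `decompress_object`) and the lookup tables are hypothetical, not the code from this PR, and `pickle` stands in for `dill`, which exposes the same `dumps`/`loads` interface. Rather than omitting typesize, this sketch passes `typesize=1` (an assumption), since every byte length is a multiple of 1. blosc2 is imported only inside the functions that actually call it.

```python
import pickle

# Hypothetical mapping: blosc-1 "cname" strings -> blosc2.Codec member names.
CODEC_NAMES = {
    "blosclz": "BLOSCLZ",
    "lz4": "LZ4",
    "lz4hc": "LZ4HC",
    "zlib": "ZLIB",
    "zstd": "ZSTD",
}

# Hypothetical mapping: blosc-1 shuffle constants -> blosc2.Filter member names.
# blosc2 has no NOSHUFFLE; the closest equivalent is NOFILTER.
FILTER_NAMES = {0: "NOFILTER", 1: "SHUFFLE", 2: "BITSHUFFLE"}


def translate_blosc1_args(cname="blosclz", shuffle=1, clevel=9):
    """Translate blosc-1 style compress() arguments into blosc2-style names."""
    try:
        codec = CODEC_NAMES[cname.lower()]
    except KeyError:
        raise ValueError(f"unsupported codec name: {cname!r}") from None
    try:
        filt = FILTER_NAMES[shuffle]
    except KeyError:
        raise ValueError(f"unsupported shuffle value: {shuffle!r}") from None
    return {"codec": codec, "filter": filt, "clevel": clevel}


def compress_object(obj, cname="zstd", shuffle=1, clevel=9):
    """Serialize `obj` and compress it with blosc2 (requires blosc2 installed)."""
    import blosc2  # lazy import: the translation above works without blosc2

    args = translate_blosc1_args(cname, shuffle, clevel)
    data = pickle.dumps(obj)  # stand-in for dill.dumps(obj)
    # typesize=1 (assumption): a serialized payload has an arbitrary length,
    # and any length is a multiple of 1.
    return blosc2.compress(
        data,
        typesize=1,
        clevel=args["clevel"],
        filter=getattr(blosc2.Filter, args["filter"]),
        codec=getattr(blosc2.Codec, args["codec"]),
    )


def decompress_object(blob):
    """Inverse of compress_object: decompress, then deserialize."""
    import blosc2

    return pickle.loads(blosc2.decompress(blob))
```

With `typesize=1` the shuffle filters have no byte groups to reorder, so in practice the codec does all the work; a caller who knows the payload's element size could pass that instead.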

franzhuettinger commented 6 months ago

There still seems to be a problem with Python 3.7 and 3.8. I haven't seen this problem on my machine; I'll try to track it down tomorrow.

franzhuettinger commented 6 months ago

On my system, I was able to run the tests successfully with Python 3.12, 3.9 and 3.8, but not with Python 3.7. Blosc2 requires at least Python 3.10, but it still seems to work without any problems down to Python 3.8.


If you want to keep supporting Python 3.7, we probably won't be able to solve the problem easily. I'd recommend raising the minimum requirement to Python 3.9.
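The usual way to raise the floor is in the package metadata (`python_requires` in setup.py or `requires-python` in pyproject.toml), which makes pip skip the release on older interpreters. A runtime guard is an optional complement; the sketch below is hypothetical (the `check_python_version` name and the 3.9 minimum are taken from the recommendation, not from the project's code) and takes the version tuple as a parameter so it can be exercised directly.

```python
import sys

# Hypothetical minimum supported version, following the 3.9 recommendation.
MIN_PYTHON = (3, 9)


def check_python_version(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Raise RuntimeError if `version_info` is older than `minimum`."""
    # Compare only (major, minor); micro releases don't affect support.
    if tuple(version_info[:2]) < tuple(minimum):
        raise RuntimeError(
            f"miceforest requires Python {minimum[0]}.{minimum[1]}+, "
            f"found {version_info[0]}.{version_info[1]}"
        )
```

Calling `check_python_version()` at import time with no arguments checks the running interpreter.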

AnotherSamWilson commented 6 months ago

That's all fine with me. I've noticed that many packages are dropping support for 3.7 in their new versions.