Update load_data.py to work with sklearn.preprocessing.OneHotEncoder v1.2 updates

braindynamicslab / dyneusr-notebooks

DyNeuSR notebooks and tutorials

https://braindynamicslab.github.io/dyneusr/

4 stars 4 forks source link

Update load_data.py to work with sklearn.preprocessing.OneHotEncoder v1.2 updates #5

Closed hokifung closed 7 months ago

hokifung commented 7 months ago

New in scikit-learn version >= 1.2 : OneHotEncoder parameter sparse was renamed to sparse_output. Updated line encoder = OneHotEncoder(handle_unknown='ignore', sparse=False) in load_data.py.

calebgeniesse commented 7 months ago

Hi @hokifung, thanks for fixing this. I wonder if we can do this in a way that's compatible with older versions of scikit-learn?

A simple way to do this might be to wrap it with try and except, e.g.,

try: 
    # scikit-learn version >= 1.2
    encoder = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
except:
    # scikit-learn version < 1.2
    encoder = OneHotEncoder(handle_unknown='ignore', sparse=False)

Let me know what you think or if you have any better ideas?

hokifung commented 7 months ago

Hi @calebgeniesse, Yes for sure! It was just a quick fix to bring the issue to your attention - it would work for most new users as the version of scikit-learn was not specified in the requirement files, but definitely a great idea to make the update more compatible! You could also consider specifying the scikit-learn version of you used in requirement.txt and keeping the original script, but I think your idea of using try and except is good.

calebgeniesse commented 7 months ago

Yeah good point about just specifying the version. My only hesitation is that I need to make sure dyneusr works with later versions of scikit-learn, too. It should, but I just don't have the time to fully test this out right now. I'll make the change we discussed (using try and except) for now, and then maybe revisit this if other issues come up (i.e., if there are other places where dyneusr assumes an older version of scikit-learn).

Anyways, thanks for helping fix this and for the discussion! I'll make the change and merge the PR.

calebgeniesse commented 7 months ago

Note, I changed the except line to only catch TypeError, which should be what's thrown for "an unexpected keyword argument", but let me know if you were getting a different error. Interestingly, I tested this out with sklearn==1.3.0 and it seems they now accept both sparse and sparse_outputs, but I'm glad we added this extra check to ensure backwards compatibility, for now.

Thanks again, @hokifung! I'll merge this now.

hokifung commented 7 months ago

@calebgeniesse Sounds good and totally understandable! Sorry I also don't have time to test this out further (I was in Dr. Saggar's workshop the other day and was trying to get the notebook to work during that hour, hence the quick fix). Very cool to know sklearn==1.3.0 now accept both sparse and sparse_outputs! Just for your reference, I was using sklearn==1.4.1.post1. Thanks!

calebgeniesse commented 7 months ago

ok maybe I am wrong about sklearn==1.3.0 then, but it worked on my local machine 😅

and no worries at all, I really appreciate you sharing the fix!