captainnova / dmri_segmenter

Skull stripper for diffusion MRI

[BUG] [dmri_segmenter] RFC_classifier.pickle is incompatible with recent versions of scikit-learn. #1

Closed: captainnova closed this 6 months ago

captainnova commented 3 years ago

With Python 3.8, I currently see:

Treating this as a non-FLAIR scan as instructed.
Traceback (most recent call last):
  File "git/dmri_segmenter/brine.py", line 35, in debrine
    obj = pickle.load(f, encoding=encoding)  # Python 3
ModuleNotFoundError: No module named 'sklearn.ensemble.forest'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "git/bin/skullstrip_dmri", line 94, in <module>
    brain, tiv = get_dmri_brain_and_tiv(ecnii.get_fdata(), ecnii, brfn=args.get('--brfn'),
  File "git/dmri_segmenter/dmri_brain_extractor.py", line 224, in get_dmri_brain_and_tiv
    mask, csfmask, other, submsg = feature_vector_classify(data, aff, bvals, clf=svc)
  File "git/dmri_segmenter/dmri_brain_extractor.py", line 1112, in feature_vector_classify
    clf = brine.debrine(clffn)
  File "git/dmri_segmenter/brine.py", line 37, in debrine
    obj = pickle.load(f)   # Python 2
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc6 in position 2: ordinal not in range(128)

RFC_classifier.pickle was made with Python 2 and, obviously, an older version of sklearn. In late 2019 sklearn made a number of modules private, including sklearn.ensemble.forest (now sklearn.ensemble._forest), in https://github.com/scikit-learn/scikit-learn/issues/9250, despite knowing that this would break existing pickles (https://github.com/scikit-learn/scikit-learn/issues/12927).
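
A common workaround for renamed modules is to subclass pickle.Unpickler and override find_class so that old module paths are redirected before they are imported. The sketch below assumes that approach; RENAMED_MODULES, RenamingUnpickler and load_renamed are hypothetical names, not part of dmri_segmenter. Only the sklearn.ensemble.forest to sklearn.ensemble._forest rename comes from this issue; other modules the pickle references may have moved too. Even when every import resolves, a newer sklearn may reject the pickled internal state, so this is at best a stopgap.

```python
import pickle

# Hypothetical table of old -> new module paths. Only the first entry is
# taken from this issue; any others would need to be checked by hand.
RENAMED_MODULES = {
    "sklearn.ensemble.forest": "sklearn.ensemble._forest",
}


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects renamed module paths before importing them."""

    def __init__(self, file, renames=None, **kwargs):
        super().__init__(file, **kwargs)
        self.renames = RENAMED_MODULES if renames is None else renames

    def find_class(self, module, name):
        # Swap in the new module path, then defer to the normal lookup.
        module = self.renames.get(module, module)
        return super().find_class(module, name)


def load_renamed(path, renames=None, encoding="latin1"):
    # latin1 is the encoding the pickle docs require for NumPy arrays
    # pickled by Python 2.
    with open(path, "rb") as f:
        return RenamingUnpickler(f, renames=renames, encoding=encoding).load()
```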

brine then catches the ModuleNotFoundError, misinterprets it as a Python 3 vs. 2 incompatibility, and retries with the Python 2 style pickle.load(f). The resulting UnicodeDecodeError is therefore misleading: it comes from assuming that any unpickling error is a py3 vs. 2 problem. "Assuming" might be too harsh; brine is trying to recover from a bad situation, and sometimes that works.
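
A narrower fallback would avoid the misleading message: retry with a different encoding only when the failure really is a UnicodeDecodeError, and let every other exception propagate. debrine_strict below is a hypothetical sketch, not brine's actual code. Retrying with latin1 is an assumption, chosen because the pickle documentation says encoding='latin1' is required for NumPy arrays pickled by Python 2. For an old sklearn pickle, the retry would then surface the real ModuleNotFoundError.

```python
import pickle


def debrine_strict(path, encoding="latin1"):
    """Hypothetical stricter variant of brine.debrine.

    Only a UnicodeDecodeError is treated as a Python 2 vs. 3 problem;
    any other unpickling error (e.g. ModuleNotFoundError from a renamed
    sklearn module) is re-raised unchanged so the real cause is visible.
    """
    with open(path, "rb") as f:
        try:
            # Default Python 3 load: Python 2 str objects are decoded as ASCII.
            return pickle.load(f)
        except UnicodeDecodeError:
            # Probably a Python 2 pickle with non-ASCII byte strings;
            # rewind and retry with a byte-preserving encoding.
            f.seek(0)
            return pickle.load(f, encoding=encoding)
```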

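Ideally the classifier's parameters would be stored and loaded as the plain data they are, in an inert format such as HDF5, rather than as a pickle. That would avoid these problems, since loading data does not require importing whatever modules the classifier happened to be pickled from.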
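
The following is a minimal sketch of that idea: one decision tree is stored as plain arrays and walked at prediction time. It substitutes numpy's .npz format (loaded with allow_pickle=False) for HDF5 so that only numpy is needed; with h5py the same arrays could go into HDF5 datasets. The array names mirror the attributes of sklearn's fitted tree_ object, but save_tree, load_tree and predict_one are hypothetical helpers and the toy tree in the usage is hand-built.

```python
import numpy as np


def save_tree(path, children_left, children_right, feature, threshold, value):
    """Store one decision tree as plain numeric arrays (no pickled objects)."""
    np.savez(path, children_left=children_left, children_right=children_right,
             feature=feature, threshold=threshold, value=value)


def load_tree(path):
    # allow_pickle=False refuses object arrays, so loading cannot execute
    # code or depend on sklearn's module layout.
    with np.load(path, allow_pickle=False) as z:
        return {k: z[k] for k in z.files}


def predict_one(tree, x):
    """Walk from the root to a leaf (leaves have children_left == -1)."""
    node = 0
    while tree["children_left"][node] != -1:
        # Same convention as sklearn: go left when x[feature] <= threshold.
        if x[tree["feature"][node]] <= tree["threshold"][node]:
            node = tree["children_left"][node]
        else:
            node = tree["children_right"][node]
    return int(np.argmax(tree["value"][node]))
```

A forest would store one such set of arrays per tree and average the per-tree class probabilities, which is what RandomForestClassifier's predict_proba does.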

captainnova commented 3 years ago

Using ONNX instead of pickle to save and load the classifier should help. See the onnx branch, which should support either pickles or ONNX files but adds onnx as a dependency.
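
A loader that accepts either format might simply dispatch on the file extension. This is a hypothetical sketch, not the onnx branch's actual code: it assumes onnxruntime is the ONNX runtime in use and imports it lazily, so the pickle path still works when onnxruntime is not installed.

```python
import os
import pickle


def load_classifier(path):
    """Hypothetical loader accepting either an ONNX model or a legacy pickle."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".onnx":
        # Assumption: onnxruntime is the runtime; imported only when needed.
        import onnxruntime as ort
        return ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    # Anything else is treated as a (possibly legacy) pickle.
    with open(path, "rb") as f:
        return pickle.load(f)
```

The two return types have different interfaces. An onnxruntime.InferenceSession is queried with sess.run(None, {sess.get_inputs()[0].name: X.astype(np.float32)}), while an unpickled sklearn estimator uses predict, so callers would need a thin adapter. Converting an existing sklearn model to ONNX is typically done with skl2onnx (for example its to_onnx function), though whether this project uses it is an assumption.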

captainnova commented 6 months ago

The onnx branch solves the problem and has been incorporated into main.