Plastic-Scanner / PSplot

A lightweight tool for obtaining and visualising the discrete near-infrared (NIR) data using the Plastic Scanner
GNU General Public License v3.0

Fix pipenv installation #36

Closed gillens closed 10 months ago

gillens commented 10 months ago

Adds scikit-learn as a dependency and requires Python 3.10.

Scikit-learn

Scikit-learn is used by the file resources/model.joblib, so I got this error when I first ran the project after running pipenv install and then pipenv shell:

scikit-learn include error

```
$ python psplot.py
App is running on QT version 5.15.2
Traceback (most recent call last):
  File "/home/sean/cs/PSplot/psplot.py", line 663, in <module>
    main()
  File "/home/sean/cs/PSplot/psplot.py", line 657, in main
    window = PsPlot()
  File "/home/sean/cs/PSplot/psplot.py", line 59, in __init__
    self._setup_variables()
  File "/home/sean/cs/PSplot/psplot.py", line 78, in _setup_variables
    self.clf = joblib.load("./resources/model.joblib")
  File "/home/sean/.local/share/virtualenvs/PSplot-ZEt-0E32/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/sean/.local/share/virtualenvs/PSplot-ZEt-0E32/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/usr/lib/python3.10/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/usr/lib/python3.10/pickle.py", line 1538, in load_stack_global
    self.append(self.find_class(module, name))
  File "/usr/lib/python3.10/pickle.py", line 1580, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn'
```

I tried installing the latest scikit-learn, but then got a warning about the RandomForestClassifier in the joblib file using version 1.0.2 and then psplot crashed again:

scikit-learn version error

```
$ python psplot.py
App is running on QT version 5.15.2
/home/sean/.local/share/virtualenvs/PSplot-jf6YzhwB/lib/python3.10/site-packages/sklearn/base.py:348: InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.0.2 when using version 1.3.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
Traceback (most recent call last):
  File "/home/sean/PSplot/psplot.py", line 663, in <module>
    main()
  File "/home/sean/PSplot/psplot.py", line 657, in main
    window = PsPlot()
  File "/home/sean/PSplot/psplot.py", line 59, in __init__
    self._setup_variables()
  File "/home/sean/PSplot/psplot.py", line 78, in _setup_variables
    self.clf = joblib.load("./resources/model.joblib")
  File "/home/sean/.local/share/virtualenvs/PSplot-jf6YzhwB/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/sean/.local/share/virtualenvs/PSplot-jf6YzhwB/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/usr/lib/python3.10/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/home/sean/.local/share/virtualenvs/PSplot-jf6YzhwB/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 402, in load_build
    Unpickler.load_build(self)
  File "/usr/lib/python3.10/pickle.py", line 1718, in load_build
    setstate(state)
  File "sklearn/tree/_tree.pyx", line 728, in sklearn.tree._tree.Tree.__setstate__
  File "sklearn/tree/_tree.pyx", line 1434, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:
- expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['
```

I pinned scikit-learn to 1.0.2 in the Pipfile, and it works. This old scikit-learn release uses a deprecated feature, though, so it generates warnings. At some point I can regenerate the model file with a recent version, but I'm not yet sure how to reproduce the classifier.
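As a rough illustration of the version coupling described above, the sketch below checks an installed package version against the one a saved model expects, so a mismatch could be reported before `joblib.load` is attempted. The helper name `check_pinned` and its `get_version` parameter are hypothetical and not part of PSplot; only `importlib.metadata` from the standard library is used.

```python
from importlib import metadata
from typing import Callable, Optional


def check_pinned(
    package: str,
    expected: str,
    get_version: Callable[[str], str] = metadata.version,
) -> Optional[str]:
    """Return a description of the problem, or None if `package` matches `expected`.

    `get_version` is a hypothetical hook so the lookup can be replaced in tests;
    by default it queries the packages installed in the current environment.
    """
    try:
        installed = get_version(package)
    except metadata.PackageNotFoundError:
        # Corresponds to the first failure: sklearn is not importable at all.
        return f"{package} is not installed (expected {expected})"
    if installed != expected:
        # Corresponds to the second failure: a newer sklearn cannot unpickle the model.
        return f"{package} {installed} is installed, but the model was saved with {expected}"
    return None
```

Calling `check_pinned("scikit-learn", "1.0.2")` before loading `resources/model.joblib` would, under these assumptions, turn both tracebacks above into a one-line message.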

Use Python 3.10, not 3.8

Since 3.8 is specified in the Pipfile, pipenv install pauses for a while looking for a 3.8 installation on disk. Even if 3.8 is installed, or the user installs it, the project does not run, because it merges dictionaries with the | operator, which was introduced in 3.9. This PR therefore sets the version to 3.10, matching the other instructions.
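For context, here is a small sketch of the dictionary-merge difference. The dictionaries are made-up examples, not taken from psplot.py. On 3.8 the `|` expression raises a `TypeError` at runtime, whereas unpacking with `**` works on both versions.

```python
import sys

# Hypothetical settings, for illustration only.
defaults = {"baudrate": 9600, "timeout": 1}
overrides = {"timeout": 5}

# Portable form: unpack both dicts into a new one; keys from the right win.
merged_compat = {**defaults, **overrides}

if sys.version_info >= (3, 9):
    # Dict union operator, available from Python 3.9 (PEP 584).
    merged = defaults | overrides
else:
    # On 3.8 the expression above would raise TypeError, so fall back.
    merged = merged_compat
```

This PR raises the required version instead of rewriting such expressions, which keeps the code unchanged.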

Pipfile.lock

It is probably worth committing the Pipfile.lock file for more deterministic installation; it should also make installation a bit quicker. The lock file records the exact package versions, so users know they are running PSplot with the same dependencies as the developers. The pipenv docs recommend this:

> Keep both Pipfile and Pipfile.lock in version control.

I can add that change here if others agree. I know the desire is to package PSplot so users don't have to use pipenv at all (#13), but this should help move toward that.
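To make concrete what the lock file pins, here is a minimal sketch that reads the exact versions out of a Pipfile.lock, which is a JSON document. The sample content is hypothetical and heavily trimmed (a real lock file also carries hashes, and the joblib version shown is invented); `pinned_versions` is an illustrative helper, not part of pipenv or PSplot.

```python
import json

# Hypothetical, trimmed Pipfile.lock content for illustration only.
SAMPLE_LOCK = """
{
  "_meta": {"requires": {"python_version": "3.10"}},
  "default": {
    "joblib": {"version": "==1.3.2"},
    "scikit-learn": {"version": "==1.0.2"}
  },
  "develop": {}
}
"""


def pinned_versions(lock_text: str, section: str = "default") -> dict:
    """Map each package in `section` to its locked version, without the '==' prefix."""
    lock = json.loads(lock_text)
    return {
        name: entry["version"].lstrip("=")
        for name, entry in lock.get(section, {}).items()
        if "version" in entry  # skip entries without a version, e.g. VCS installs
    }
```

With the lock file committed, two checkouts of the same commit would resolve to the same mapping.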