Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License

SGDClassifier / InvalidParameterError: The 'loss' parameter of SGDClassifier must be a str #75

Closed bbimber closed 11 months ago

bbimber commented 1 year ago

Hello,

We have a software test that uses celltypist's training, and it began to fail, which I assume is due to an update in celltypist or one of its associated Python packages. Have you seen an error like the one below? I assume there might be incompatible versions, and the problem is probably this line:

https://github.com/Teichlab/celltypist/blob/164904d20a465d047ab94f86b8a38ca9d296f1e5/celltypist/train.py#L130

The code we run (a very basic celltypist.train command) and the error/stack trace are below. I have also listed the versions of what I assume are the relevant packages. All of these were installed from pip.

Have you seen this error before?

# The code:
new_model = celltypist.train('/tmp/RtmpFGQEtl/filebce07f0788e4-seurat-annData.h5ad', labels = '/tmp/RtmpFGQEtl/filebce07f0788e4.seurat.labels.txt', use_SGD = False, solver = 'saga', feature_selection = True, top_genes = 300);
new_model.write('/home/runner/work/RIRA/RIRA/check/RIRA.Rcheck/tests/testthat/myModel.pkl');

# Warnings:
2023-07-10T18:51:33.1865691Z /home/runner/.local/lib/python3.8/site-packages/umap/distances.py:1063: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
2023-07-10T18:51:33.1866686Z   @numba.jit()
2023-07-10T18:51:33.1868780Z /home/runner/.local/lib/python3.8/site-packages/umap/distances.py:1071: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
2023-07-10T18:51:33.1869877Z   @numba.jit()
2023-07-10T18:51:33.1871784Z /home/runner/.local/lib/python3.8/site-packages/umap/distances.py:1086: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
2023-07-10T18:51:33.1872745Z   @numba.jit()
2023-07-10T18:51:33.1874575Z /home/runner/.local/lib/python3.8/site-packages/umap/umap_.py:660: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
2023-07-10T18:51:33.1875536Z   @numba.jit()

# Output:
2023-07-10T18:51:33.1875988Z 🍳 Preparing data before training
2023-07-10T18:51:33.1876510Z ✂️ 1147 non-expressed genes are filtered out
2023-07-10T18:51:33.1877019Z 🔬 Input data has 458 cells and 12567 genes
2023-07-10T18:51:33.1877443Z ⚖️ Scaling input data
2023-07-10T18:51:33.1877919Z 🏋️ Training data using SGD logistic regression
2023-07-10T18:51:33.1878341Z Traceback (most recent call last):
2023-07-10T18:51:33.1878867Z   File "/tmp/RtmpFGQEtl/filebce07f0788e4.seurat.train.py", line 2, in <module>
2023-07-10T18:51:33.1880153Z     new_model = celltypist.train('/tmp/RtmpFGQEtl/filebce07f0788e4-seurat-annData.h5ad', labels = '/tmp/RtmpFGQEtl/filebce07f0788e4.seurat.labels.txt', use_SGD = False, solver = 'saga', feature_selection = True, top_genes = 300);
2023-07-10T18:51:33.1881145Z   File "/home/runner/.local/lib/python3.8/site-packages/celltypist/train.py", line 329, in train
2023-07-10T18:51:33.1882216Z     classifier = _SGDClassifier(indata = indata, labels = labels, alpha = alpha, max_iter = max_iter, n_jobs = n_jobs, mini_batch = mini_batch, batch_number = batch_number, batch_size = batch_size, epochs = epochs, balance_cell_type = balance_cell_type, **kwargs)
2023-07-10T18:51:33.1883180Z   File "/home/runner/.local/lib/python3.8/site-packages/celltypist/train.py", line 135, in _SGDClassifier
2023-07-10T18:51:33.1883671Z     classifier.fit(indata, labels)
2023-07-10T18:51:33.1884350Z   File "/home/runner/.local/lib/python3.8/site-packages/sklearn/base.py", line 1144, in wrapper
2023-07-10T18:51:33.1884826Z     estimator._validate_params()
2023-07-10T18:51:33.1885505Z   File "/home/runner/.local/lib/python3.8/site-packages/sklearn/base.py", line 637, in _validate_params
2023-07-10T18:51:33.1886013Z     validate_parameter_constraints(
2023-07-10T18:51:33.1886849Z   File "/home/runner/.local/lib/python3.8/site-packages/sklearn/utils/_param_validation.py", line 95, in validate_parameter_constraints
2023-07-10T18:51:33.1887410Z     raise InvalidParameterError(
2023-07-10T18:51:33.1888742Z sklearn.utils._param_validation.InvalidParameterError: The 'loss' parameter of SGDClassifier must be a str among {'modified_huber', 'huber', 'squared_epsilon_insensitive', 'log_loss', 'hinge', 'squared_hinge', 'perceptron', 'squared_error', 'epsilon_insensitive'}. Got 'log' instead.

# Versions:
celltypist-1.5.3
scikit_learn-1.3.0
scanpy-1.9.3
numba-0.57.1
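
A version list like the one above can also be collected programmatically, which makes it easier to paste into a report. The following is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_versions` is hypothetical, and the package names are the pip distribution names assumed from the list above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as published on PyPI (assumed from the list above).
    names = ["celltypist", "scikit-learn", "scanpy", "numba"]
    for name, version in installed_versions(names).items():
        print(f"{name}-{version}")
```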

prete commented 1 year ago

Deprecated since version 1.1: The loss ‘log’ was deprecated in v1.1 and will be removed in version 1.3. Use loss='log_loss' which is equivalent

Looks like the latest version of scikit-learn (1.3.0) doesn't support loss 'log'. Until this has been patched in celltypist, you'll have to use the previous version of scikit-learn:

pip install -U scikit-learn==1.2.2
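
For code that calls SGDClassifier directly and has to run on both older and newer scikit-learn releases, another option is to pick the loss name from the installed version. This is a minimal sketch based on the deprecation note quoted above, not the fix that celltypist itself applied; the helper name `logistic_loss_name` is hypothetical.

```python
import re


def logistic_loss_name(sklearn_version):
    """Return the SGDClassifier loss name for logistic regression.

    'log_loss' is accepted from scikit-learn 1.1 onward; 'log' was
    deprecated in 1.1 and removed in 1.3.
    """
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {sklearn_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return "log_loss" if (major, minor) >= (1, 1) else "log"


# Intended use (requires scikit-learn to be installed):
#   import sklearn
#   from sklearn.linear_model import SGDClassifier
#   classifier = SGDClassifier(loss=logistic_loss_name(sklearn.__version__))
```

On scikit-learn 1.2.2 the helper returns 'log_loss', which that release also accepts, so this approach is compatible with the pinned version above.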

ChuanXu1 commented 11 months ago

This should be solved in the new version (1.6.0).