Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License

0 features used for prediction #15

Closed: bbimber closed this issue 2 years ago

bbimber commented 2 years ago

We are running celltypist and ran into an edge case. I think the root cause is somewhere upstream in our data preparation, but running celltypist produces this output:

⏳ Loading data
🔬 Input data has 2700 cells and 13714 genes
🔗 Matching reference genes in the model
🧬 0 features used for prediction
⚖️ Scaling input data
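To illustrate how the "Matching reference genes" step can end with "0 features used for prediction", here is a minimal sketch. It assumes matching is a plain exact-name intersection between the input gene names and the model's genes, followed by subsetting the expression matrix to those columns. The function names (`match_features`, `subset_columns`) and the example cause (input labelled with Ensembl-style IDs while the model uses gene symbols) are hypothetical illustrations, not celltypist's actual code or the confirmed cause in this issue.

```python
# Hypothetical sketch of a gene-matching step (not celltypist's implementation).
# Assumption: genes are matched by exact name, then the matrix is column-subset.

def match_features(input_genes, model_genes):
    """Return indices (into input_genes) of genes the model knows, in model order."""
    position = {gene: i for i, gene in enumerate(input_genes)}
    return [position[g] for g in model_genes if g in position]


def subset_columns(matrix, columns):
    """Keep only the given column indices of a row-major list-of-lists matrix."""
    return [[row[j] for j in columns] for row in matrix]


if __name__ == "__main__":
    model_genes = ["CD3E", "MS4A1", "LYZ"]
    # Assumed mismatch: input uses Ensembl-style IDs, model uses gene symbols.
    input_genes = ["ENSG00000198851", "ENSG00000156738", "ENSG00000090382"]
    expr = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]  # 2 cells x 3 genes

    idx = match_features(input_genes, model_genes)
    print(f"{len(idx)} features used for prediction")  # 0 features used for prediction
    X = subset_columns(expr, idx)
    print(len(X), len(X[0]))  # 2 0, i.e. shape (2, 0), like (2700, 0) in the traceback
```

Nothing fails at the matching step itself; the empty matrix is passed on, and the error only surfaces later inside scikit-learn's input validation.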

and then it dies with the stack trace below. Obviously, if no features match, celltypist can't run. I'm reporting this because it would be nice if the celltypist code that finds matching features failed immediately, with an informative error, when zero features match (or perhaps fewer than some configurable threshold).
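The suggested early exit could look roughly like the sketch below: count the overlap right after matching and raise a descriptive `ValueError` if it falls below a threshold. The function name `check_feature_overlap` and the `min_features` parameter are hypothetical, chosen for illustration; they are not existing celltypist options.

```python
# Hypothetical fail-early check (illustrative only; not celltypist's API).
# `min_features` stands in for the "configurable threshold" suggested above.

def check_feature_overlap(input_genes, model_genes, min_features=1):
    """Return the number of matched genes, or raise ValueError if below min_features."""
    if min_features < 1:
        raise ValueError("min_features must be at least 1")
    input_set = set(input_genes)
    model_set = set(model_genes)
    n_matched = len(input_set & model_set)
    if n_matched < min_features:
        # Show a few input names so the user can spot an identifier mismatch.
        examples = ", ".join(sorted(input_set)[:3])
        raise ValueError(
            f"Only {n_matched} of {len(model_set)} model genes were found in the input "
            f"(minimum required: {min_features}). Check that gene identifiers match "
            f"the model's (e.g. symbols vs. Ensembl IDs, capitalization). "
            f"First input genes: {examples}"
        )
    return n_matched


if __name__ == "__main__":
    try:
        check_feature_overlap(["ENSG00000198851", "ENSG00000156738"], ["CD3E", "MS4A1"])
    except ValueError as err:
        print(err)
```

Running this check before scaling would replace the scikit-learn shape error with a message that points directly at the gene-name mismatch.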

I'm confused by this particular case: the failure comes from our automated tests, and the input data is the pbmc3k dataset from Seurat, downloaded with the SeuratData R package. The test runs fine on R/release but not on R/develop. In any event, I don't think the failure itself is celltypist's issue.

Traceback (most recent call last):
  File "/home/runner/.local/bin/celltypist", line 8, in <module>
    sys.exit(main())
  File "/home/runner/.local/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/runner/.local/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/runner/.local/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/runner/.local/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/runner/.local/lib/python3.8/site-packages/celltypist/command_line.py", line 109, in main
    result = annotate(
  File "/home/runner/.local/lib/python3.8/site-packages/celltypist/annotate.py", line 81, in annotate
    predictions = clf.celltype(mode = mode, p_thres = p_thres)
  File "/home/runner/.local/lib/python3.8/site-packages/celltypist/classifier.py", line 351, in celltype
    decision_mat, prob_mat, lab = self.model.predict_labels_and_prob(self.indata, mode = mode, p_thres = p_thres)
  File "/home/runner/.local/lib/python3.8/site-packages/celltypist/models.py", line 120, in predict_labels_and_prob
    scores = self.classifier.decision_function(indata)
  File "/home/runner/.local/lib/python3.8/site-packages/sklearn/linear_model/_base.py", line 407, in decision_function
    X = self._validate_data(X, accept_sparse="csr", reset=False)
  File "/home/runner/.local/lib/python3.8/site-packages/sklearn/base.py", line 566, in _validate_data
    X = check_array(X, **check_params)
  File "/home/runner/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 814, in check_array
    raise ValueError(
ValueError: Found array with 0 feature(s) (shape=(2700, 0)) while a minimum of 1 is required.

ChuanXu1 commented 2 years ago

@bbimber, thanks for this suggestion! 8d210b5ccba404397cf4230a85a4bf2bf020275f