Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License

Running `celltypist.annotate` with `min_prop` can't create "Heterogeneous" category #96

Closed: DanScarc closed this issue 7 months ago.

DanScarc commented 8 months ago

Description

Please find a minimal example reproducing the error below.

Example

# Library imports
import scanpy as sc
import celltypist  # v1.6.1
from celltypist import models

# Data loading
adata = sc.datasets.pbmc3k()

# Adapt adata for compatibility with celltypist
# (celltypist expects log1p-normalized expression at 10,000 counts per cell)
adata_celltypist = adata.copy()
sc.pp.normalize_per_cell(
    adata_celltypist, counts_per_cell_after=10**4
)
sc.pp.log1p(adata_celltypist)
# Convert the sparse expression matrix to a dense array
adata_celltypist.X = adata_celltypist.X.toarray()

# Download the celltypist model
models.download_models(
    force_update=True, model=["Immune_All_Low.pkl"]
)
model_low = models.Model.load(model="Immune_All_Low.pkl")

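# Predict cell types with majority voting; clusters whose dominant label
# covers less than 70% of their cells (min_prop=0.7) should be labeled "Heterogeneous"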
predictions_low = celltypist.annotate(
    adata_celltypist, model=model_low, majority_voting=True, mode="best match", min_prop=0.7
)
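CellTypist expects log1p-normalized expression at 10,000 counts per cell, which is what the normalization and `log1p` calls above aim to produce. A minimal NumPy sketch of that transform, plus a heuristic sanity check (undoing `log1p` with `expm1` should recover about 10,000 total counts per cell), is shown here. The helper names `log1p_normalize` and `looks_celltypist_ready` and the toy count matrix are hypothetical; this is not celltypist's own validation code.

```python
import numpy as np

def log1p_normalize(counts: np.ndarray, target_sum: float = 1e4) -> np.ndarray:
    """Scale each cell (row) to target_sum total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero: all-zero cells get a scale of 0 and stay all-zero.
    scale = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)
    return np.log1p(counts * scale)

def looks_celltypist_ready(x: np.ndarray, target_sum: float = 1e4, tol: float = 1.0) -> bool:
    """Heuristic: expm1 should give ~target_sum counts for every non-empty cell."""
    totals = np.expm1(np.asarray(x, dtype=float)).sum(axis=1)
    nonempty = totals > 0
    return bool(np.all(np.abs(totals[nonempty] - target_sum) <= tol))

# Toy raw counts: 3 cells x 4 genes (made-up numbers).
raw = np.array([[1, 0, 3, 6], [10, 10, 0, 0], [0, 2, 2, 0]])
x = log1p_normalize(raw)
print(looks_celltypist_ready(x))    # True
print(looks_celltypist_ready(raw))  # False: raw counts are not normalized
```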

Returns

File ~/miniforge3/envs/preprocessing/lib/python3.9/site-packages/celltypist/classifier.py:473, in Classifier.majority_vote(predictions, over_clustering, min_prop)
    471 majority = votes.idxmax(axis=0)
    472 freqs = (votes / votes.sum(axis=0).values).max(axis=0)
--> 473 majority[freqs < min_prop] = 'Heterogeneous'
    474 majority = majority[over_clustering].reset_index()
    475 majority.index = predictions.predicted_labels.index
...

TypeError: Cannot setitem on a Categorical with a new category (Heterogeneous), set the categories first
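The traceback shows the core of majority voting: for each over-clustering cluster, take the most frequent predicted label (`idxmax`), compute that label's share of the cluster (`freqs`), and relabel clusters whose share is below `min_prop` as `'Heterogeneous'`. The following pure-Python sketch of that logic is for illustration only. It is not celltypist's implementation; `majority_vote_sketch` and the toy labels are hypothetical, and ties may be broken differently than in celltypist.

```python
from collections import Counter, defaultdict

def majority_vote_sketch(labels, clusters, min_prop=0.0):
    """Return one majority-voted label per cell.

    labels:   per-cell predicted labels
    clusters: per-cell over-clustering IDs (same length as labels)
    """
    if len(labels) != len(clusters):
        raise ValueError("labels and clusters must have the same length")
    # Tally the votes for each label inside each cluster.
    votes = defaultdict(Counter)
    for label, cluster in zip(labels, clusters):
        votes[cluster][label] += 1
    # Pick each cluster's winner; use 'Heterogeneous' when its share < min_prop.
    winner = {}
    for cluster, counts in votes.items():
        label, n = counts.most_common(1)[0]
        share = n / sum(counts.values())
        winner[cluster] = label if share >= min_prop else "Heterogeneous"
    # Broadcast the cluster-level decision back to every cell.
    return [winner[c] for c in clusters]

labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell", "NK cell"]
clusters = ["0", "0", "0", "0", "1", "1", "1"]
print(majority_vote_sketch(labels, clusters, min_prop=0.7))
# ['T cell', 'T cell', 'T cell', 'T cell', 'Heterogeneous', 'Heterogeneous', 'Heterogeneous']
```

Cluster "0" is 75% "T cell" and keeps its label; cluster "1" is only about 67% "B cell", so with `min_prop=0.7` it falls back to `'Heterogeneous'`.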

Environment

My current environment is:

name: preprocessing
channels:
  - bioconda
  - conda-forge
dependencies:
  - conda-forge::jupyterlab=3.5.0
  - conda-forge::leidenalg=0.9.1
  - conda-forge::numba=0.56.4
  - conda-forge::joypy
  - conda-forge::python=3.9.15
  - conda-forge::r-base=4.1.3
  - conda-forge::r-soupx=1.6.1
  - conda-forge::r-sctransform=0.3.3
  - conda-forge::r-glmpca=0.2.0
  - conda-forge::rpy2=3.5.11
  - conda-forge::scanpy=1.9.3
  - conda-forge::session-info=1.0.0
  - bioconda::celltypist
  - bioconda::anndata2ri=1.1
  - bioconda::bioconductor-scdblfinder=1.8.0
  - bioconda::bioconductor-scry=1.6.0
  - bioconda::bioconductor-scran=1.22.1
  - bioconda::bioconductor-glmgampoi=1.6.0

Thank you in advance!

ChuanXu1 commented 8 months ago

@DanScarc, this is likely caused by a behavior change in newer versions of pandas, which makes the output of `idxmax` categorical. You can try downgrading pandas, or upgrade to the newest version of celltypist (1.6.2), which should fix this issue.
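The `TypeError` is pandas' standard error when a value that is not among a categorical Series' categories is assigned into it. The minimal sketch below reproduces it by building the categorical Series directly, so the behavior does not depend on which pandas version makes `idxmax` return a categorical. It then shows two general pandas workarounds: registering the new category first, or casting to plain strings. The toy labels and frequencies are made up, and these are generic pandas remedies, not necessarily the change made in celltypist 1.6.2.

```python
import pandas as pd

# Stand-in for `majority` in the traceback: one winning label per cluster,
# stored as a categorical (as newer pandas may return from idxmax).
majority = pd.Series(
    pd.Categorical(["T cell", "B cell", "NK cell"]), index=["0", "1", "2"]
)
freqs = pd.Series([0.9, 0.6, 0.8], index=["0", "1", "2"])
min_prop = 0.7

# Reproduce the error on a copy: 'Heterogeneous' is not one of the categories.
try:
    broken = majority.copy()
    broken[freqs < min_prop] = "Heterogeneous"
except TypeError as err:
    print("TypeError:", err)

# Workaround 1: register the new category, then assign.
fixed_cat = majority.cat.add_categories("Heterogeneous")
fixed_cat[freqs < min_prop] = "Heterogeneous"

# Workaround 2: drop the categorical dtype altogether.
fixed_str = majority.astype(str)
fixed_str[freqs < min_prop] = "Heterogeneous"

print(fixed_cat.tolist())  # ['T cell', 'Heterogeneous', 'NK cell']
print(fixed_str.tolist())  # ['T cell', 'Heterogeneous', 'NK cell']
```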

ChuanXu1 commented 7 months ago

This should have been fixed. Please reopen the issue if you have further questions.
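Because the fix is reported to ship in celltypist 1.6.2, a quick way to confirm that an environment already has it is to compare the installed version. The minimal sketch below uses the standard-library `importlib.metadata`. The helper names `parse_release` and `has_heterogeneous_fix` are hypothetical, and the naive parser only handles plain numeric `X.Y.Z` release strings (`packaging.version` is the robust choice for pre-releases and similar cases).

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (1, 6, 2)  # first celltypist release reported to contain the fix

def parse_release(v: str) -> tuple:
    """Turn '1.6.1' into (1, 6, 1); assumes a plain numeric X.Y.Z release string."""
    return tuple(int(part) for part in v.split(".")[:3])

def has_heterogeneous_fix(installed: str) -> bool:
    """True if the given celltypist version is at or above the fixed release."""
    return parse_release(installed) >= FIXED_IN

try:
    installed = version("celltypist")
    print(installed, "has fix:", has_heterogeneous_fix(installed))
except PackageNotFoundError:
    print("celltypist is not installed")
```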