Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License

Training new model gives ValueError #48

Closed (AmichayAfriat closed 1 year ago)

AmichayAfriat commented 1 year ago

Hi, I'm trying to train my own model and keep getting this error:

🍳 Preparing data before training
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [37], line 1
----> 1 coarse_model = celltypist.train(GAnn, labels = 'annot_coars', n_jobs = 10, feature_selection = True)

File ~\anaconda3\envs\GDT2\lib\site-packages\celltypist\train.py:293, in train(X, labels, genes, transpose_input, with_mean, check_expression, C, solver, max_iter, n_jobs, use_SGD, alpha, mini_batch, batch_number, batch_size, epochs, balance_cell_type, feature_selection, top_genes, date, details, url, source, version, **kwargs)
    291 #check
    292 if check_expression and (np.abs(np.expm1(indata[0]).sum()-10000) > 1):
--> 293     raise ValueError(
    294             "🛑 Invalid expression matrix, expect log1p normalized expression to 10000 counts per cell")
    295 if len(labels) != indata.shape[0]:
    296     raise ValueError(
    297             f"🛑 Length of training labels ({len(labels)}) does not match the number of input cells ({indata.shape[0]})")

ValueError: 🛑 Invalid expression matrix, expect log1p normalized expression to 10000 counts per cell

My AnnData object contains only raw counts and obs metadata:

AnnData object with n_obs × n_vars = 185894 × 31053
    obs: 'annot_coars', 'annot_fine'

I also tried running sc.pp.log1p() before calling the train function (though that should be done under the hood, no?), but nothing changed.

Training on the demo adata_2000 works just fine.

Thanks!

AmichayAfriat commented 1 year ago

Never mind, the scaling was done on a different version of the anndata.
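
For anyone hitting the same error: the check shown in the traceback undoes the log1p transform on the first cell, sums the result, and requires that sum to be within 1 of 10000. Applying log1p to raw counts on its own cannot satisfy this, because the counts were never scaled to 10000 per cell. The sketch below reproduces that arithmetic in plain Python on made-up counts for a single cell. The helper names are hypothetical and are not part of celltypist. With scanpy, the equivalent preprocessing is usually `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`, run on the same object that is passed to `celltypist.train`.

```python
import math

TARGET = 10000  # counts per cell expected by the check in the traceback


def passes_check(first_cell, target=TARGET, tol=1):
    """Mirror the traceback's check: undo log1p, sum the cell, compare to target."""
    total = sum(math.expm1(v) for v in first_cell)
    return abs(total - target) <= tol


def normalize_and_log1p(counts, target=TARGET):
    """Scale one cell's raw counts to `target` total counts, then apply log1p."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cell has zero total counts and cannot be normalized")
    return [math.log1p(c * target / total) for c in counts]


# Made-up raw counts for one cell (illustration only, total = 100).
raw_cell = [3, 0, 12, 5, 0, 80]

log_only = [math.log1p(c) for c in raw_cell]   # log1p without normalization
normalized = normalize_and_log1p(raw_cell)     # normalize to 10000, then log1p

print("raw counts pass:          ", passes_check(raw_cell))
print("log1p only passes:        ", passes_check(log_only))
print("normalized + log1p passes:", passes_check(normalized))
```

Only the last variant passes. The log1p-only vector sums back to the original 100 counts, which is why running `sc.pp.log1p()` alone did not change the outcome.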