Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License
260 stars, 40 forks

Plot a celltypist.dotplot to visualise celltypist's classification using a probability threshold #33

Closed: sanchezy closed this issue 1 year ago

sanchezy commented 1 year ago

Hi all, I am trying to visualise the results of a classification that uses a probability threshold and majority voting with celltypist.dotplot. I get an error:

Traceback (most recent call last):
  File "celltypist-scRNA-test.py", line 66, in <module>
    celltypist.dotplot(predictions, use_as_reference = 'predicted.celltype.l2', use_as_prediction = 'majority_voting', save ='scRNA-test-celltypist-probabilistic-majority_voting.png')
  File "/Users/ysanchez/opt/anaconda3/envs/transcriptomicsconda/lib/python3.8/site-packages/celltypist/plot.py", line 140, in dotplot
    dot_size_df, dot_color_df = _get_fraction_prob_df(predictions, use_as_reference, use_as_prediction, None, None)
  File "/Users/ysanchez/opt/anaconda3/envs/transcriptomicsconda/lib/python3.8/site-packages/celltypist/plot.py", line 33, in _get_fraction_prob_df
    score = [row[pred[index]] for index, row in predictions.probability_matrix.iterrows()]
  File "/Users/ysanchez/opt/anaconda3/envs/transcriptomicsconda/lib/python3.8/site-packages/celltypist/plot.py", line 33, in <listcomp>
    score = [row[pred[index]] for index, row in predictions.probability_matrix.iterrows()]
  File "/Users/ysanchez/opt/anaconda3/envs/transcriptomicsconda/lib/python3.8/site-packages/pandas/core/series.py", line 851, in __getitem__
    return self._get_value(key)
  File "/Users/ysanchez/opt/anaconda3/envs/transcriptomicsconda/lib/python3.8/site-packages/pandas/core/series.py", line 959, in _get_value
    loc = self.index.get_loc(label)
  File "/Users/ysanchez/opt/anaconda3/envs/transcriptomicsconda/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'Unassigned'

Could you please let me know if there is a way to get around this?

Many thanks for your help!

ChuanXu1 commented 1 year ago

@sanchezy, this plot function is currently designed only for best-match mode. I will extend it to 'prob match' mode in the future.

In general, you should avoid drawing a dot plot from results derived from probability thresholding, because the dot plot itself already incorporates probability information, from which you can judge the confidence of the cell type predictions.

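However, if you really want to plot the probability threshold result, the temporary solution for now is as below:

    import numpy as np

    # All labels that appear in the predictions, including ones such as 'Unassigned'
    all_celltypes = predictions.predicted_labels.predicted_labels.cat.categories
    # Labels that have no column in the probability matrix
    new_celltypes = all_celltypes.difference(predictions.probability_matrix.columns)
    # Add one column per missing label, filled with each cell's maximum probability
    predictions.probability_matrix[new_celltypes] = np.repeat(predictions.probability_matrix.max(axis=1).values[:, np.newaxis], len(new_celltypes), axis=1)

Then apply the dot plot function on predictions.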
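For anyone who wants to see why this helps without loading a model, here is a minimal sketch on toy data using only pandas and NumPy. The names `prob`, `labels` and `score_per_cell` are made up for illustration and are not part of CellTypist; `score_per_cell` only imitates the per-cell lookup shown in the traceback. It assumes a reasonably recent pandas in which several new columns can be assigned from a 2D array.

```python
import numpy as np
import pandas as pd

# Toy stand-in for predictions.probability_matrix (cells x model cell types).
prob = pd.DataFrame(
    {"B cells": [0.9, 0.2, 0.1], "T cells": [0.1, 0.3, 0.8]},
    index=["cell1", "cell2", "cell3"],
)

# Toy stand-in for the predicted labels after probability thresholding:
# cell2 falls below the threshold and is labelled 'Unassigned'.
labels = pd.Series(
    ["B cells", "Unassigned", "T cells"], index=prob.index, dtype="category"
)


def score_per_cell(prob, labels):
    """Hypothetical helper imitating the lookup in the traceback:
    fetch, for every cell, the probability of its predicted label."""
    return [row[labels[index]] for index, row in prob.iterrows()]


# Without padding, 'Unassigned' is not a column, so the lookup fails.
try:
    score_per_cell(prob, labels)
    raised = False
except KeyError:
    raised = True

# Same idea as the workaround: add a column for every label missing from
# the matrix, filled with each cell's maximum probability.
all_celltypes = labels.cat.categories
new_celltypes = all_celltypes.difference(prob.columns)
padded = prob.copy()
padded[new_celltypes] = np.repeat(
    padded.max(axis=1).values[:, np.newaxis], len(new_celltypes), axis=1
)

scores = score_per_cell(padded, labels)
```

With the padded matrix, the lookup for an 'Unassigned' cell returns that cell's highest probability instead of raising a KeyError, which is what lets the dot plot function run.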

sanchezy commented 1 year ago

Thanks a lot! It works!

ChuanXu1 commented 1 year ago

I have added support for this kind of plot. It will be available in CellTypist >= 1.2.0.