bedapub / besca

BESCA (Beyond Single Cell Analysis) offers Python functions for single-cell analysis
https://bedapub.github.io/besca/
GNU General Public License v3.0

Function to convert dblabel to shortnames and vis #138

Closed llumdi closed 2 years ago

llumdi commented 3 years ago

It would be nice to have a function to convert the column names (and therefore the category labels on a plot) from dblabel to short name and vice versa. For plotting, short names are more convenient, but the function could also print a table with the conversion (dblabel and short name columns) so that it can be attached to the report as a glossary.
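
For illustration, a minimal sketch of the behaviour requested above, written with plain Python dictionaries so it runs without besca. Everything in it is an assumption: the function names (to_short, to_long, glossary_table) are hypothetical, and the short names are placeholders rather than entries from besca's nomenclature file.

```python
# Hypothetical glossary: the short names are placeholders for illustration,
# not the entries of besca's nomenclature file.
GLOSSARY = {
    "classical monocyte": "cMono",
    "naive B cell": "nB",
    "regulatory T cell": "Treg",
}


def to_short(labels, glossary):
    """Translate long labels to short names; unknown labels are kept unchanged."""
    return [glossary.get(label, label) for label in labels]


def to_long(labels, glossary):
    """Translate short names back to long labels (assumes short names are unique)."""
    reverse = {short: long for long, short in glossary.items()}
    return [reverse.get(label, label) for label in labels]


def glossary_table(labels, glossary):
    """Return tab-separated rows (header first) for the labels present in the data."""
    rows = ["dblabel\tshort_name"]
    for label in sorted(set(labels)):
        rows.append(f"{label}\t{glossary.get(label, label)}")
    return "\n".join(rows)


labels = ["naive B cell", "classical monocyte", "naive B cell", "doublet"]
short = to_short(labels, GLOSSARY)
print(short)
print(glossary_table(labels, GLOSSARY))
```

With a pandas column such as adata.obs['dblabel'], the same lookup could be done with Series.map and a dictionary. Labels missing from the glossary (here 'doublet') are passed through unchanged in this sketch, which is one possible design choice; raising an error instead would be equally reasonable.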

ajulienla commented 2 years ago

Was fixed with bc.tl.sig.obtain_dblabel

llumdi commented 2 years ago

Sorry, I did not explain it correctly. I was not referring to the conversion that already exists in bc.tl.sig.obtain_dblabel, which is used in the celltype annotation notebook and requires cnames and a dataframe. I was referring to the ability to convert the dblabel column in adata.obs (or any other column, provided it contains valid dblabels) to either the short or the long name version (as in bescaviz).

E.g., if I load a study, I have these names:

set(adata.obs['dblabel'])

{'CD1c-positive myeloid dendritic cell',
 'CD4-positive, alpha-beta cytotoxic T cell',
 'CD8-positive, alpha-beta cytotoxic T cell',
 'basophil',
 'central memory CD4-positive, alpha-beta T cell',
 'classical monocyte',
 'cytotoxic CD56-dim natural killer cell',
 'doublet',
 'effector memory CD4-positive, alpha-beta T cell',
 'effector memory CD8-positive, alpha-beta T cell',
 'gamma delta T cell',
 'hematopoietic stem cell',
 'monocyte',
 'mucosal invariant T cell',
 'naive B cell',
 'naive thymus-derived CD4-positive, alpha-beta T cell',
 'naive thymus-derived CD8-positive, alpha-beta T cell',
 'neutrophil',
 'non-classical monocyte',
 'plasma cell',
 'plasmacytoid dendritic cell',
 'platelet',
 'regulatory T cell'}

But I would like to plot the short version names. How can I do that? Thanks, Ll

ajulienla commented 2 years ago

OK, I'll have a look. Thank you for pointing it out.

ajulienla commented 2 years ago

Hi @llumdi, could you check the last commit on the signature_revision_branch and tell me if this is what you had in mind?

As a minimal working example, once the commit (https://github.com/bedapub/besca/commit/682288c686eddfe0964f3b333e2dc425a3a9340b) is checked out:

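import scanpy as sc
import besca as bc

# Load the example dataset and plot the original (long) cell type labels
adata = bc.datasets.pbmc3k_processed()
sc.pl.umap(adata, color='celltype3')

# Build the conversion table from the nomenclature file
# (the path is relative to the current working directory)
matching_v = bc.tl.sig.match_label(adata.obs.get("celltype3"), "../besca/datasets/nomenclature/CellTypes_v1.tsv")

# Map each label to its short name and plot again
adata.obs['short'] = adata.obs.get("celltype3").map(dict(matching_v.values))
sc.pl.umap(adata, color='short')

# Check the conversion table
matching_v

llumdi commented 2 years ago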

Thanks for implementing the new function, Alice. I tested it and it works perfectly (I will send you the report with different scenarios). Just a small suggestion: the error message should list only the categories that were not found, instead of the list of all values, which can be very long.
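
A minimal sketch of that suggestion, assuming a hypothetical helper check_labels and a toy set of known names; it is not the code of bc.tl.sig.match_label. The idea is to take the set difference and name only the missing labels in the error:

```python
def check_labels(labels, known):
    """Raise a ValueError naming only the labels absent from `known`."""
    missing = sorted(set(labels) - set(known))
    if missing:
        raise ValueError(
            f"{len(missing)} label(s) not found in nomenclature: {missing}"
        )


# Toy nomenclature for illustration only
known = {"classical monocyte", "naive B cell", "regulatory T cell"}

try:
    check_labels(["naive B cell", "doublet", "naive B cell"], known)
except ValueError as err:
    message = str(err)
    print(message)
```

Only 'doublet' appears in the message, however many valid labels the column contains.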

llumdi commented 2 years ago

Just to confirm: from the development version (signature_revision), it also worked when calling the function bc.tl.sig.match_label() directly instead of copying the code.

ajulienla commented 2 years ago

@llumdi, the printing issue should be fixed in a8052d7.