HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

Import celltype annotations #390

Closed MDHowe4 closed 5 months ago

MDHowe4 commented 8 months ago

Hello,

In my case, I have already annotated cell types outside of CATALYST using Maxpar Pathsetter. I can import this data into CATALYST and access it through the cell_id column:

sce = prepData(fcs, panel, mdmod_pc, features = panel$fcs_colname)

> sce@int_colData
DataFrame with 192797 rows and 10 columns
                 reducedDims     altExps    colPairs      Time Event_length    Center    Offset     Width  Residual   cell_id
                 <DataFrame> <DataFrame> <DataFrame> <numeric>    <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
1                NA:NA:NA:NA                           3.23431      25.3861  1130.605  123.2615   91.3725  151.4710         2
2                NA:NA:NA:NA                          23.44609      22.5838  1023.031   89.3245   58.2100   97.0313         6
3                NA:NA:NA:NA                          38.30484      22.5449  1067.850  142.0350   87.7653  100.9940         7
4                NA:NA:NA:NA                          41.84689      20.9681   914.129  108.8993   64.3630   87.7803         7
5                NA:NA:NA:NA                          61.43708      24.8080  1089.711  102.8500   72.4812   89.6421         7

Is there a way to make this compatible with plotting in CATALYST (I really like the visualizations)? I tried:

colData(sce)$cluster_id = sce@int_colData$cell_id

But this causes problems with functions that rely on the cluster_codes field for plotting. I basically just want to visualize my cell annotation labels in the same way as after running the cluster() function. I would also like to map those numeric labels to their actual cell type names (e.g. CD4 T-cells) later, since I have that information for each cell_id.

HelenaLC commented 5 months ago

This should do the trick:

# load up example data
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)

# mock up some cluster IDs
kid <- factor(sample(c("la", "le", "lu"), ncol(sce), TRUE))

# setup 'cluster_codes'
tbl <- data.frame(som100=seq(ncol(sce)), annotation=kid)

# assign cell index as base 'cluster_id'
sce$cluster_id <- tbl$som100

# add codes to 'metadata'
metadata(sce)$cluster_codes <- tbl 

# spot-check
identical(cluster_ids(sce, "annotation"), kid)
## > TRUE

# exemplary plotting
sce <- runDR(sce, "UMAP", cells=100)
plotDR(sce, color_by="annotation")
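
For the numeric cell_id labels from the question, the same pattern could be sketched roughly as below. This is only an illustration in base R under stated assumptions: the `cell_id` vector stands in for `sce@int_colData$cell_id`, the `lookup` table and its cell type names are hypothetical, and the final assignments to a real `sce` are left as comments because they need CATALYST and actual data.

```r
# Hypothetical stand-in for sce@int_colData$cell_id (first rows from the question)
cell_id <- c(2, 6, 7, 7, 7)

# Assumed ID-to-name table; these cell type names are illustrative only
lookup <- data.frame(
    cell_id = c(2, 6, 7),
    celltype = c("CD4 T-cells", "B-cells", "Monocytes"))

# One row per cell: the cell index as base ID, plus one column per labelling
n <- length(cell_id)
codes <- data.frame(
    som100 = seq_len(n),
    cell_id = factor(cell_id),
    celltype = factor(lookup$celltype[match(cell_id, lookup$cell_id)]))

# With a real SingleCellExperiment 'sce' (not run here):
# sce$cluster_id <- codes$som100
# metadata(sce)$cluster_codes <- codes
# plotDR(sce, color_by = "celltype")
```

Keeping both the numeric `cell_id` and the named `celltype` as columns means either one could be passed wherever a clustering name is expected, much like "annotation" above.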
