Closed by hurleyLi 2 years ago
Hi Hurley,
I'm sorry you're facing these issues. CellO should work for other tissue types, not just PBMC (see the original publication, which includes analyses of other tissue types: https://doi.org/10.1016/j.isci.2020.101913 ).
I notice that the data in GEO for the study that you mention consists of raw counts. Given that you are normalizing the data using the tutorial, I assume you are NOT normalizing the data into units of log(TPM+1)? For bulk RNA-seq assays, in which reads are generated from nearly the full length of the transcript, CellO requires expression in units of log(TPM+1). The normalization procedure in the tutorial works only for 3' assays such as 10x single cell data in which log(CPM+1) is equivalent to log(TPM+1).
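To make the distinction concrete, here is a minimal sketch of the two normalizations using NumPy. It is not CellO's own code: the function names, the `gene_lengths_bp` argument, and the toy numbers are assumptions made for illustration, and the natural log is assumed. For a real dataset the gene lengths would have to come from the annotation used for alignment.

```python
import numpy as np

def log_cpm(counts):
    """log(CPM + 1) for a (samples x genes) matrix of raw counts.

    Suitable for 3' assays, where read count does not scale with
    transcript length. Assumes every sample has a nonzero total.
    """
    counts = np.asarray(counts, dtype=float)
    library_size = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / library_size * 1e6)

def log_tpm(counts, gene_lengths_bp):
    """log(TPM + 1) for full-length (bulk) data.

    gene_lengths_bp is a hypothetical per-gene length vector in base pairs.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1e3
    rate = counts / lengths_kb                      # reads per kilobase
    tpm = rate / rate.sum(axis=1, keepdims=True) * 1e6
    return np.log1p(tpm)

# Toy data: one sample, three genes of different lengths.
counts = np.array([[100, 100, 800]])
lengths = np.array([1000, 2000, 4000])

print(log_cpm(counts))           # ignores gene length
print(log_tpm(counts, lengths))  # longer genes are down-weighted
```

When all genes are given the same length the two functions return identical values, which is why the CPM-based procedure is adequate for 3' data but not for full-length bulk reads.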
Best, Matt
One more quick note on that dataset: it consists of fine-grained T cell subtypes. We found that CellO is less accurate on many T cell subtypes (see Figure 6 of the paper), though I would expect CellO to label these at least as T cells (which it usually annotates very accurately). We are currently looking for more data to include in CellO's training set to increase the accuracy on these subtypes.
Hi Hurley,
One last note: I ran CellO on this dataset after normalizing via the tutorial (which, as I mentioned, is technically not correct for bulk RNA-seq samples). Even so, CellO classified all of the samples correctly as T cells and correctly classified the naive T cell subtypes.
This leads me to believe that the normalization is not the main issue. If you would like, you can send me the code you are using and I can see if I can spot any problems!
Best, Matt
Hi, I have a dataset from GSE123814 that I'm trying to re-analyze using CellO. I normalized the data using the typical approach in your tutorial, and
cello.scanpy_cello()
finished without error. However, all ~30 clusters were predicted as oxygen accumulating cell (CL:0000329). I also tried several other datasets, and it seems that CellO only works for PBMC data, not for other tissue types. Could you please comment on why this might happen and how to adjust the training for other tissue types? Thanks, Hurley