Supplemental figure for how we chose cell type annotation methods

allyhawkins commented 9 months ago

As mentioned in #37, we may want to include a figure that shows some benchmarking of methods or any data on why we chose to use SingleR and CellAssign. It might also be good to include information on why we chose the references that we did. I think this will largely depend on what we say in the text about these methods and whether we feel we need something here. So I'm filing this issue to remind us that we may want to include this in the future, but I'm not entirely sure what the contents will be yet.

allyhawkins commented 8 months ago

Just noting that we are going to try and address this. After some discussion in https://github.com/AlexsLemonade/ScPCA-manuscript/pull/61, we would like to include a comparison between the references tested for SingleR.

allyhawkins commented 7 months ago

Okay, so for SingleR, I created a plot that shows the distribution of the delta median statistic across all cells for a given celldex reference. I grabbed three libraries from each of 3 of our 4 diagnosis groups and annotated them with SingleR using all 4 celldex references we had been using for comparisons. Then I made a plot that compares the distributions across all refs. This shows that BlueprintEncodeData had either the highest delta median or a delta median equivalent to that of the other references.

[Figure: FigS6_celldex-ref-comparison]
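For anyone reading along, here is a rough sketch of the statistic being compared. It assumes the delta median for a cell is the score of the best-scoring label minus the median score across all labels for that cell, so a larger value suggests a more confident call. SingleR itself is an R package; this is only a plain-Python illustration, and the reference names and scores below are made up, not taken from our libraries.

```python
from statistics import median


def delta_median(scores):
    """Return (best_label, best score minus the median score across labels).

    `scores` maps each candidate label to its score for a single cell.
    """
    best_label = max(scores, key=scores.get)
    return best_label, scores[best_label] - median(scores.values())


# Hypothetical per-label scores for one cell under two references
# (illustrative values only, not real SingleR output).
cell_scores = {
    "reference_A": {"B cell": 0.62, "T cell": 0.30, "Monocyte": 0.28},
    "reference_B": {"B cell": 0.41, "T cell": 0.38, "Monocyte": 0.35},
}

for ref, scores in cell_scores.items():
    label, delta = delta_median(scores)
    print(f"{ref}: {label} (delta median = {delta:.2f})")
```

In this toy case the first reference separates the top label from the rest more clearly, which is the kind of difference the per-reference distributions in the plot are meant to show.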

@jaclyn-taroni @jashapiro what do we think?

Also, I'm trying to think of a way to show a similar plot for CellAssign. Most of our benchmarking data came from comparing CellAssign annotations to submitter annotations for just the samples from the RMS project. Since we use different references based on tissue type, I feel like maybe we don't need anything more than the table in #75?

jaclyn-taroni commented 7 months ago

Part of CellAssign's appeal is that it will label cells as unassigned. Is there any way to show that?

I think this plot is fine conceptually, but it needs some tweaks:

The panels will need to be bigger or points more transparent to distinguish between high-quality and low-quality effectively.
The y-axis ticks need labels.
Rotate the reference labels to 45 degrees – Might help with panel sizing ☝🏻

allyhawkins commented 7 months ago

> Part of CellAssign's appeal is that it will label cells as unassigned. Is there any way to show that?

When we did our initial exploration of CellAssign, we applied it to one of the B-ALL samples using a follicular lymphoma reference that they use in their tutorial. This has B cells, CD4 and CD8 T cells, T helper cells, and other. Most of the B-ALL cells were categorized as B cells, so we took out B cells and repeated it and saw that those cells were now "other" or unassigned. So we could try to replicate something like that but using the Panglao markers, where we annotate B-ALL samples with and without a reference that includes B cells? I just genuinely have no idea how this will look. Going through some of the B-ALL samples, they are getting annotated with a variety of immune cells or unassigned, not just B cells, so I'm not sure how compelling this would be.

Alternatively, we do have all the comparisons of CellAssign assignments to submitter annotations for the RMS project. In looking at some of them, the heatmaps and UMAPs together show that a large portion of cells are getting unassigned (what we call "Unknown") and that those cells correspond to tumor cells. I like being able to mention both that cells will get categorized as unknown if they don't fit a reference and that this is probably due to the fact that they are tumor cells. What do we think about including an example of these? Below is an example from one of the RMS libraries.

(This is an older version of the report before we fixed the legends)
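As a rough sketch of what the heatmap summarizes, the comparison boils down to counting cells for each pair of submitter and CellAssign labels and then asking what fraction of a submitter group ends up as "Unknown". The function names and the labels below are hypothetical and chosen only for illustration; they are not taken from the RMS reports.

```python
from collections import Counter


def crosstab(submitter, cellassign):
    """Count cells for each (submitter label, CellAssign label) pair."""
    if len(submitter) != len(cellassign):
        raise ValueError("label lists must be the same length")
    return Counter(zip(submitter, cellassign))


def fraction_unknown(submitter, cellassign, submitter_label, unknown="Unknown"):
    """Fraction of cells with `submitter_label` that CellAssign left unassigned."""
    counts = crosstab(submitter, cellassign)
    total = sum(n for (s, _), n in counts.items() if s == submitter_label)
    if total == 0:
        return 0.0
    return counts[(submitter_label, unknown)] / total


# Made-up labels for six cells (illustration only).
submitter = ["Tumor", "Tumor", "Tumor", "Tumor", "Immune", "Immune"]
cellassign = ["Unknown", "Unknown", "Unknown", "Myocyte", "T cell", "Unknown"]

print(fraction_unknown(submitter, cellassign, "Tumor"))
print(fraction_unknown(submitter, cellassign, "Immune"))
```

A high fraction for the tumor group alongside a lower one for other groups would be the pattern described above, where most "Unknown" cells correspond to tumor cells.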

allyhawkins commented 7 months ago

Just noting that based on our conversation in pre-planning, we will include this as an example with the UMAP and the heatmap only comparing submitter to CellAssign.

AlexsLemonade / scpca-paper-figures

Supplemental figure for how we chose cell type annotation methods #41