Open allyhawkins opened 2 months ago
Based on the findings from running AUCell, we are going to have to take some additional steps to get final tumor annotations (see https://github.com/AlexsLemonade/OpenScPCA-analysis/issues/567#issuecomment-2225778349 for some more context). I am updating this issue with the next steps that we currently plan to take:
Using the workflow added in #659, I have completed annotating all samples using SingleR. I'm including here a general summary of the results. I've also attached zip files of all the reports that were generated:
singler_reports_1.zip singler_reports_2.zip
Non-PDX libraries (822, 824, 825, 826, and 828):
AUCell alone.
PDX libraries (823, 1112, 1113, 1114, 1115, 1116):
827:
1111:
Looking at these reports, I actually think we have a good starting point for annotations, and next steps should include refining the annotations obtained here (given we re-run the PDX samples). To refine these annotations, I think we want to start with clustering. We should obtain clusters we feel good about and then look at expression of the marker gene lists across those clusters. I would anticipate that tumor cells will cluster separately from normal cells, that normal cell clusters will show higher expression of the normal cell markers than tumor cell clusters do, and vice versa. Additionally, we want to be able to annotate tumor cell subpopulations, which I think should be done by looking at clusters of tumor cells.
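As a rough illustration of the cluster-level check described above, here is a minimal Python sketch that is independent of the Bioconductor tools used in the module. Everything in it is hypothetical: the function names, the toy expression values, and the placeholder gene names (`T1`, `T2`, `N1`) stand in for the real marker gene lists and cluster assignments.

```python
from statistics import mean


def mean_marker_score(expr, clusters, markers):
    """Average marker expression per cluster.

    expr: dict of cell id -> dict of gene -> expression value
    clusters: dict of cell id -> cluster label
    markers: list of marker gene names (hypothetical placeholders here)
    """
    per_cluster = {}
    for cell, cluster in clusters.items():
        # Score each cell as the mean over its marker genes; absent genes count as 0.
        score = mean(expr[cell].get(gene, 0.0) for gene in markers)
        per_cluster.setdefault(cluster, []).append(score)
    return {cluster: mean(scores) for cluster, scores in per_cluster.items()}


def label_clusters(expr, clusters, tumor_markers, normal_markers):
    """Call a cluster 'tumor' when its tumor marker score exceeds its normal marker score."""
    tumor = mean_marker_score(expr, clusters, tumor_markers)
    normal = mean_marker_score(expr, clusters, normal_markers)
    return {c: ("tumor" if tumor[c] > normal[c] else "normal") for c in tumor}


# Toy data only: four cells in two clusters.
expr = {
    "cell1": {"T1": 5.0, "T2": 4.0, "N1": 0.5},
    "cell2": {"T1": 4.0, "T2": 6.0, "N1": 0.0},
    "cell3": {"T1": 0.0, "T2": 1.0, "N1": 7.0},
    "cell4": {"T1": 0.5, "T2": 0.0, "N1": 6.0},
}
clusters = {"cell1": "c1", "cell2": "c1", "cell3": "c2", "cell4": "c2"}

labels = label_clusters(expr, clusters, ["T1", "T2"], ["N1"])
print(labels)
```

A simple "higher score wins" rule is only a starting point; in practice we would want to look at the distributions per cluster rather than rely on a single comparison.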
The only other thought I had was that we may be assigning more tumor cells here because the cells are more similar to other Ewing's tumor cells than to any normal cell types in the HPCA/Blueprint references. It could be helpful to compare the annotations obtained using only the tumor cells as a reference with those obtained using a reference that contains both normal and tumor cells from Ewing's samples, and see if that shifts any of the tumor cell assignments. In particular, both 822 and 824 have clear groups of normal cells, so we could create a reference that uses both of those samples rather than all of the tumor cells from all samples. This would mean cells could match either endothelial cells from HPCA or Blueprint or the endothelial cells in 822. I think this is worth doing, but probably after doing some refinement on those two samples by clustering and assigning cell types to all cells in each cluster.
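To make the comparison between the two references concrete, here is a minimal Python sketch that tallies how labels shift between two sets of annotations for the same cells. The function name, cell ids, and labels are all made up for illustration; the real inputs would be the per-cell annotations produced with each reference.

```python
from collections import Counter


def compare_annotations(labels_a, labels_b):
    """Tally (label_a, label_b) pairs for cells present in both annotation sets.

    Returns the pair counts and the sorted ids of cells whose label changed.
    """
    shared = labels_a.keys() & labels_b.keys()
    shifts = Counter((labels_a[cell], labels_b[cell]) for cell in shared)
    changed = sorted(cell for cell in shared if labels_a[cell] != labels_b[cell])
    return shifts, changed


# Toy annotations only: the same four cells labeled with two hypothetical references.
tumor_only_ref = {
    "cell1": "tumor",
    "cell2": "tumor",
    "cell3": "tumor",
    "cell4": "endothelial cell",
}
combined_ref = {
    "cell1": "tumor",
    "cell2": "tumor",
    "cell3": "endothelial cell",
    "cell4": "endothelial cell",
}

shifts, changed = compare_annotations(tumor_only_ref, combined_ref)
for (before, after), n in sorted(shifts.items()):
    print(f"{before} -> {after}: {n}")
print("changed cells:", changed)
```

The off-diagonal pairs (here, cells that move from tumor to a normal cell type) are the ones worth inspecting by hand.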
Tagging @jashapiro in case you would like to see the current reports or have any thoughts. For now, I'm going to file these two last thoughts as issues for potential next steps.
If you are filing this issue based on a specific GitHub Discussion, please link to the relevant Discussion.
292
Describe the goals of the changes to the analysis module.
Now that we have spent some time exploring methods for annotating tumor cells in the Ewing's samples and have done a lot of validation of tumor cells in two samples, SCPCS000490 and SCPCS000492, we would like to be able to identify tumor cells in the remaining samples. We plan to use what we learned from these samples and annotate the remaining tumor cells.
In general we will need to complete the following steps:
AUCell for this with a pre-defined threshold that we determined with SCPCL000822 (see #532).
Once this has been completed, we can move on to identifying the normal cell types that are present in these samples. For this I imagine running SingleR using BlueprintEncodeData on the normal cells and then checking that the expected markers for those cell types are present. We have not yet done any validation of specific normal cell types, which is why I think this will be a separate question we need to answer.
What will your pull request contain?
I plan on filing an issue to address each of the steps in the above analysis proposition. Each of these issues will correspond to one PR. This issue will be closed once all of those steps have been completed.
Will you require additional software beyond what is already in the analysis module?
No, we should have everything already set up.
Will you require different computational resources beyond what the analysis module already uses?
No.
If known, when do you expect to file the pull request?
No response