AlexsLemonade / OpenScPCA-analysis

An open, collaborative project to analyze data from the Single-cell Pediatric Cancer Atlas (ScPCA) Portal

Identify a method and write a script to annotate tumor cells in SCPCP000015 #564

Open allyhawkins opened 3 days ago

allyhawkins commented 3 days ago

If you are filing this issue based on a specific GitHub Discussion, please link to the relevant Discussion.

292

Step 1 to address #563

Describe the goals of the changes to the analysis module.

Here we will identify a method that we are comfortable using to annotate tumor cells in all samples in SCPCP000015. Then we will write a script that runs that method on a single SCE object and outputs the annotations.

Based on findings in #532 and #558, I think we have three options:

  1. Use AUCell with the pre-defined threshold calculated for SCPCL000822 and the marker gene list as the gene set.
  2. Use SingleR with the combination of the SCPCL000822 reference and BlueprintEncodeData reference.
  3. Use both of these methods and find the consensus between them. Any cells labeled as tumor in one and not the other would be labeled as ambiguous.
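To make option 3 concrete, the consensus rule could look roughly like the sketch below. It is written in Python purely to illustrate the logic and is not the module's implementation. The function name and the `"tumor"` / `"normal"` / `"ambiguous"` label strings are hypothetical, and it assumes each method's output has already been reduced to a per-cell tumor / not-tumor call.

```python
from typing import List


def consensus_label(aucell_is_tumor: bool, singler_is_tumor: bool) -> str:
    """Combine two per-cell tumor calls into one label (hypothetical label names)."""
    if aucell_is_tumor and singler_is_tumor:
        return "tumor"      # both methods agree the cell is tumor
    if not aucell_is_tumor and not singler_is_tumor:
        return "normal"     # both methods agree the cell is not tumor
    return "ambiguous"      # tumor in one method but not the other


def consensus_labels(aucell_calls: List[bool], singler_calls: List[bool]) -> List[str]:
    """Apply the consensus rule cell by cell; both lists must be in the same cell order."""
    if len(aucell_calls) != len(singler_calls):
        raise ValueError("Both methods must provide one call per cell")
    return [consensus_label(a, s) for a, s in zip(aucell_calls, singler_calls)]
```

For example, `consensus_labels([True, True, False], [True, False, False])` gives one label per cell, with the second cell flagged as ambiguous because only one method called it tumor.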

I am leaning towards using AUCell because it uses a pre-defined set of tumor marker genes that we expect to be present in all tumor cells. SingleR uses tumor cells from one sample to define tumor cells in the rest of the samples, which is probably fine for the most part, but given the heterogeneity of Ewing sarcoma, I am nervous we might miss some tumor cells. That said, with AUCell we would be using a threshold from a single sample to define the AUC cutoff, so both methods are somewhat biased.

What will your pull request contain?

A script to run the identified method on a single SCE object. This will ultimately be part of a workflow that runs the method on all samples in SCPCP000015.

Will you require additional software beyond what is already in the analysis module?

No

Will you require different computational resources beyond what the analysis module already uses?

No

If known, when do you expect to file the pull request?

No response

allyhawkins commented 3 days ago

One additional thought: we can start by using only AUCell, run it on all samples, and then evaluate whether we need to make any changes. After that, we can decide whether we also need to use SingleR.
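One possible way to do that evaluation is to apply the single pre-defined AUC cutoff to every sample and compare the fraction of cells called tumor. The Python sketch below only illustrates the idea and assumes the per-cell AUC scores have already been computed elsewhere (for example by AUCell). The function name, the input layout, and the choice of `>=` for the cutoff comparison are all assumptions rather than anything defined by AUCell or this module.

```python
import math
from typing import Dict, List


def tumor_fraction_by_sample(
    auc_by_sample: Dict[str, List[float]],
    auc_threshold: float,
) -> Dict[str, float]:
    """Return the fraction of cells at or above the AUC cutoff for each sample.

    auc_by_sample maps a sample ID to that sample's per-cell AUC scores
    (hypothetical layout). Samples with no cells get NaN.
    """
    fractions = {}
    for sample_id, scores in auc_by_sample.items():
        if not scores:
            fractions[sample_id] = math.nan
            continue
        # Assumption: a cell is called tumor when its AUC is >= the cutoff
        n_tumor = sum(score >= auc_threshold for score in scores)
        fractions[sample_id] = n_tumor / len(scores)
    return fractions
```

Samples with a tumor fraction near zero or one might be the first places to check whether the cutoff taken from SCPCL000822 transfers well, and whether adding SingleR would help.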