AlexsLemonade / OpenScPCA-analysis

An open, collaborative project to analyze data from the Single-cell Pediatric Cancer Atlas (ScPCA) Portal

Cluster Ewing sarcoma samples - SCPCL000822 and SCPCL000824 #402

Closed allyhawkins closed 2 months ago

allyhawkins commented 5 months ago

If you are filing this issue based on a specific GitHub Discussion, please link to the relevant Discussion.

292

Describe the goals of the changes to the analysis module.

Right now the Ewing sarcoma samples on the Portal contain clusters that were calculated using the default clustering in scpca-nf. These clusters were obtained using scran::clusterCells() with Louvain clustering, Jaccard weighting, and the nearest neighbors parameter set to 20. These may not be the optimal cluster assignments for these specific samples, so we should identify the best parameters for this project.
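As a rough illustration of what the Jaccard weighting contributes, here is a minimal sketch in plain Python. It is not the scran or bluster implementation: the brute-force neighbor search, the toy coordinates, the function names, and the choice to count each cell as a member of its own neighbor set are all assumptions made for the example. The idea it shows is that two cells get a heavier graph edge when their nearest-neighbor sets overlap more, and the nearest neighbors parameter (`k` here) controls how large those sets are.

```python
def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors (self excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        # squared Euclidean distance to every other point, sorted ascending
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def jaccard_weight(neighbors, i, j):
    """Jaccard similarity of two neighbor sets (assumption: each cell belongs to its own set)."""
    a = neighbors[i] | {i}
    b = neighbors[j] | {j}
    return len(a & b) / len(a | b)


# Toy "cells" in two well-separated groups (made-up coordinates, not ScPCA data)
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
neighbors = knn_indices(points, k=2)

within = jaccard_weight(neighbors, 0, 1)   # two cells from the same group
between = jaccard_weight(neighbors, 0, 3)  # two cells from different groups
print(within, between)
```

A community detection algorithm such as Louvain would then be run on the weighted graph. Changing `k` changes which cells share neighbors, which is why the nearest neighbors parameter is worth tuning per project.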

We should also decide whether we want to pick parameters that work across all samples in the project or do this for each individual sample. We will also want to identify some metrics that can be used to evaluate the clustering.
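One candidate metric is silhouette width, which compares how close each cell is to its own cluster versus the nearest other cluster. The sketch below is a plain Python illustration under stated assumptions: the function name, the toy coordinates, and the conventions for singleton clusters and zero distances are choices made for the example, and it is not meant to represent whichever metrics the module ends up using.

```python
from math import dist
from statistics import mean


def silhouette_widths(points, labels):
    """Per-cell silhouette width; assumes at least two distinct cluster labels."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    widths = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # collect distances from this cell to every other cell, grouped by cluster
        by_cluster = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if j != i:
                by_cluster.setdefault(lab, []).append(dist(p, q))
        if own not in by_cluster:
            widths.append(0.0)  # convention used here: singleton clusters score 0
            continue
        a = mean(by_cluster[own])  # mean distance to the cell's own cluster
        b = min(mean(d) for lab, d in by_cluster.items() if lab != own)  # nearest other cluster
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


# Toy coordinates (made up): two tight groups far apart
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
good_score = mean(silhouette_widths(points, [0, 0, 0, 1, 1, 1]))
bad_score = mean(silhouette_widths(points, [0, 1, 0, 1, 0, 1]))
print(good_score > bad_score)
```

Values near 1 suggest well-separated clusters and values near or below 0 suggest cells that sit closer to another cluster. Comparing the mean across parameter sets gives one way to rank them, either per sample or pooled across the project.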

What will your pull request contain?

This PR will contain a template notebook that evaluates clustering for a single sample. We should then run this template notebook across multiple samples to identify the best parameters.

Will you require additional software beyond what is already in the analysis module?

Any new R packages that are used will be added to the renv.lock file.

Will you require different computational resources beyond what the analysis module already uses?

No response

If known, when do you expect to file the pull request?

No response

allyhawkins commented 2 months ago

In looking at these reports, I actually think we have a good starting point for annotations, and next steps should include refining the annotations obtained here (given we re-run the PDX samples). To refine these annotations, I think we want to start with clustering. We should obtain clusters we feel good about and then look at expression of the marker gene lists across those clusters. I would anticipate that tumor cells will cluster separately from normal cells and that normal cell clusters will show higher expression of the normal cell markers than tumor cell clusters, and vice versa. Additionally, we want to be able to annotate tumor cell subpopulations, which I think should be done by looking at clusters of tumor cells.

Based on this comment I made in https://github.com/AlexsLemonade/OpenScPCA-analysis/issues/563, I think it's time to try to assign clusters to these samples and address this issue. I'm going to update this issue to reflect assigning clusters for both SCPCL000822 and SCPCL000824 to start.

This should include a template notebook that evaluates clustering on a single sample, which we will then run on both samples. As part of this, we should also output a TSV file that contains the cluster assignments for each set of parameters tested. Then we can use the report to decide which parameters to use for our cluster assignments in downstream analysis.
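To make the TSV idea concrete, here is a minimal sketch of a long-format layout with one row per cell per parameter set. The column names (`cell_id`, `algorithm`, `weighting`, `nn`, `cluster`), the function name, and the example values are all hypothetical, not an agreed-upon schema, and the sketch uses plain Python only to show the shape of the file.

```python
import csv
import io


def write_cluster_tsv(results, handle):
    """Write cluster assignments; results maps (algorithm, weighting, nn) -> {cell_id: cluster}."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["cell_id", "algorithm", "weighting", "nn", "cluster"])
    # sort so the output is stable between runs
    for (algorithm, weighting, nn), assignments in sorted(results.items()):
        for cell_id, cluster in sorted(assignments.items()):
            writer.writerow([cell_id, algorithm, weighting, nn, cluster])


# Made-up assignments for two cells under two nearest-neighbor settings
results = {
    ("louvain", "jaccard", 20): {"cell_A": 1, "cell_B": 2},
    ("louvain", "jaccard", 10): {"cell_A": 1, "cell_B": 1},
}

buffer = io.StringIO()  # stands in for a real file opened with newline=""
write_cluster_tsv(results, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

Keeping the parameters as columns, rather than encoding them in column names, makes it straightforward to filter down to the chosen parameter set in downstream analysis.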