AlexsLemonade / OpenScPCA-analysis

An open, collaborative project to analyze data from the Single-cell Pediatric Cancer Atlas (ScPCA) Portal

Identify a doublet-annotated cancer dataset #425

Open sjspielman opened 1 month ago

sjspielman commented 1 month ago

If you are filing this issue based on a specific GitHub Discussion, please link to the relevant Discussion.


Describe the goals of the changes to the analysis module.

As discussed in this discussion comment, previous ground-truth benchmarking of doublet detection methods has not really focused on cancer datasets. The goal of this issue is to try to find one or more cancer datasets with validated doublets that we might use for this purpose, though it is worth noting that such data may not exist. If such data do exist, a forthcoming issue will be written to track including them in benchmarking.

What will your pull request contain?

No pull request is expected for this issue. Instead, this issue just tracks the search for datasets we might use in benchmarking.

Will you require additional software beyond what is already in the analysis module?


Will you require different computational resources beyond what the analysis module already uses?


If known, when do you expect to file the pull request?


sjspielman commented 1 month ago

I have done some literature searching here and come up short. The only "validated" datasets I've been able to track down are the same ones used in this benchmarking paper. The one dataset there that does contain cancer is a PDX sample, where the cancer cells are human and the immune cells are mouse, so its doublet classification relies on the interspecies distinction rather than on doublets among cancer cells alone.

I'm going to keep this issue open since such a dataset is still definitely a "nice to have," even though I'm not sure we'll be able to find one, and the search is not worth holding up further analysis for.