AlexsLemonade / OpenScPCA-analysis

An open, collaborative project to analyze data from the Single-cell Pediatric Cancer Atlas (ScPCA) Portal

Notebook comparing doublet results across methods #499

Open sjspielman opened 3 weeks ago

sjspielman commented 3 weeks ago

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

#446

What is the goal of this pull request?

This PR explores the overlap among each method's doublet calls for each dataset, and also assesses the performance of a "consensus caller."

Briefly describe the general approach you took to achieve this goal.

I wrote a single notebook to process all datasets with three main analysis sections, in addition to a conclusions section at the end:

  1. Upset plot comparing doublet calls
  2. PCA colored by consensus calls
  3. Confusion matrix and associated metric calculations

I also updated the overall module run script to render this notebook as the next step.
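For context, the consensus logic and metric calculations can be sketched roughly as follows. This is just an illustration, not the module's actual code: the module is an R notebook rendered via renv, the function and variable names here are hypothetical, and defining "consensus" as all-methods-agree is an assumption on my part.

```python
import numpy as np

def consensus_calls(calls_by_method):
    """Combine boolean doublet calls across methods.

    Hypothetical sketch: "consensus" here means a droplet is called a
    doublet by every method (one possible definition; the notebook may
    use a different rule).
    """
    stacked = np.vstack(list(calls_by_method.values()))
    return stacked.all(axis=0)

def confusion_metrics(truth, calls):
    """Basic confusion-matrix counts and metrics against ground truth."""
    tp = np.sum(truth & calls)
    fp = np.sum(~truth & calls)
    fn = np.sum(truth & ~calls)
    tn = np.sum(~truth & ~calls)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
            "precision": precision, "recall": recall}
```

With an all-methods-agree rule, the consensus set can only shrink relative to any single method's calls, which is consistent with the small consensus sets reported below.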

There are a few other small changes here as well.

If known, do you anticipate filing additional pull requests to complete this analysis module?

Yep.

Results

What is the name of your results bucket on S3?

researcher-654654257431-us-east-2

What types of results does your code produce (e.g., table, figure)?

There are no additional result files, only the rendered notebook which contains all results from this analysis. I directly committed this notebook to the directory where I saved it in the module. Is this ok, or should I export it to results?

What is your summary of the results?

The methods' doublet calls show little agreement with one another, so the consensus call sets are small. The consensus calls also do not appear to be the most accurate.
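One way to make "not much in agreement" concrete is the pairwise Jaccard index over each pair of methods' doublet call sets. A Python sketch of that idea (the notebook is in R and may summarize agreement differently; names here are hypothetical):

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index between two sets of barcodes called as doublets."""
    union = a | b
    return len(a & b) / len(union) if union else float("nan")

def pairwise_jaccard(calls):
    """Jaccard for every pair of methods.

    `calls` maps method name -> set of doublet barcodes; values near 0
    indicate little agreement between a pair of methods.
    """
    return {(m1, m2): jaccard(calls[m1], calls[m2])
            for m1, m2 in combinations(sorted(calls), 2)}
```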

Provide directions for reviewers

What are the software and computational requirements needed to be able to run the code in this PR?

The module's renv environment is needed to render this notebook.

Are there particular areas you'd like reviewers to have a close look at?

I suppose I could add more analysis or interpretation to the notebook, but since there really isn't "much of a there there" in these results, as it were, I wasn't sure what else would be useful and informative to include. Do you have any ideas?

Is there anything that you want to discuss further?

-

Author checklists

Check all those that apply. Note that you may find it easier to check off these items after the pull request is actually filed.

Analysis module and review

Reproducibility checklist

sjspielman commented 1 week ago

We're back! I updated code throughout the notebook in response to reviews, including using a 0.5 threshold for cxds and adding more PCAs. While looking at the PCAs, it actually looked to me like cxds was capturing a lot of the consensus false-negative droplets, and those points were missed by scDblFinder and scrublet. Therefore, for this first round back to you, I didn't re-run the analysis with just those two methods. Do you still think it's worth doing?
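For reference, the 0.5 threshold just binarizes the per-droplet cxds scores, along these lines. This is a Python sketch of the idea only; the actual code is R, and it assumes the scores are already scaled to [0, 1]:

```python
def call_doublets(scores, threshold=0.5):
    """Binarize per-droplet doublet scores at a fixed threshold.

    Hypothetical sketch: assumes scores are scaled to [0, 1]; droplets
    at or above the threshold are called doublets.
    """
    return [score >= threshold for score in scores]
```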

Edit: notebook for review convenience! 03_compare-benchmark-results.nb.html.zip

sjspielman commented 3 days ago

The next iteration has finally landed! I've incorporated the conceptual items brought up in review and rearranged the notebook accordingly. Note that I do think this could be more modular, since there is some repeated code between the different types of consensus analyses (all 3 methods vs. only 2 methods), but given where we anticipate this module heading overall, I wasn't sure that was really worth the effort.

Here's a rendered notebook: 03_compare-benchmark-results.nb.html.zip

sjspielman commented 3 days ago

One thought I had is that this could be an optional analysis module that can be used to run doublet detection with three different methods, but nothing more. Contributors can use it if they feel it's necessary for their analysis, but we don't go beyond that.

This actually seems pretty reasonable to me: the end goal of the module would be a utility for folks to run these three methods on an SCE, including the associated results and metadata.