broadinstitute / single_cell_portal_core

Rails/Docker application for the Broad Institute's single cell RNA-seq data portal
https://singlecell.broadinstitute.org
BSD 3-Clause "New" or "Revised" License

Adding `raw_counts` extraction phase for AnnData files (SCP-5102) #2114

Closed bistline closed 1 month ago

bistline commented 2 months ago

BACKGROUND & CHANGES

Building on #2113 and 359, this update adds end-to-end integration for raw counts extraction from AnnData files. When a user uploads an AnnData file for parsing, we extract the list of cells corresponding to both the raw and processed matrices. This is part of an effort to enable exploratory differential expression for AnnData files. New AnnData uploads that are marked as having raw counts data will now be eligible for differential expression.
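
As a rough illustration of the flow described above, the sketch below shows how an upload's raw-counts flag could decide which extraction phases are requested and whether the file counts toward differential expression eligibility. It is not the code from this PR: `AnnDataUpload`, `extraction_phases`, `de_candidate?`, and the `processed_expression` phase name are hypothetical, and only the `raw_counts` phase name comes from the PR title.

```ruby
# Hypothetical stand-in for an uploaded AnnData file; not the portal's StudyFile model.
AnnDataUpload = Struct.new(:name, :has_raw_counts, keyword_init: true)

# Placeholder phase name for illustration; only 'raw_counts' is named in this PR.
BASE_PHASES = %w[processed_expression].freeze

# Request the raw_counts phase only when the uploader said adata.raw is populated.
def extraction_phases(upload)
  upload.has_raw_counts ? BASE_PHASES + ['raw_counts'] : BASE_PHASES.dup
end

# Assumed rule: an upload is a DE candidate only if raw counts will be extracted.
def de_candidate?(upload)
  extraction_phases(upload).include?('raw_counts')
end

with_raw = AnnDataUpload.new(name: 'compliant_pbmc3K.h5ad', has_raw_counts: true)
without_raw = AnnDataUpload.new(name: 'processed_only.h5ad', has_raw_counts: false)

puts extraction_phases(with_raw).inspect
puts de_candidate?(without_raw)
```

Keeping the phase list as data makes it easy to see that the only difference between the two uploads is the extra `raw_counts` phase.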

Note: there is no backfill migration for existing AnnData files, as we do not know whether they have raw counts data in the adata.raw slot. Study owner outreach will be required to identify potential raw counts files. Also, while these new AnnData studies will be eligible for differential expression, there is still work to be done to fully enable it. Once that work is complete and we have identified any existing studies with raw counts data, we can run a backfill migration to add differential expression results.

MANUAL TESTING

  1. Boot all services and sign in
  2. Create a new study and select the AnnData UX
  3. Upload any valid AnnData file, such as compliant_pbmc3K.h5ad, making sure to specify "Yes" for "I have raw count data in the adata.raw slot"
  4. Wait for the main parsing jobs to complete (3-5 minutes; you do not need to wait for subsampling)
  5. Open a Rails console session and load the above study:
    study = Study.last # or load by accession if you used an existing study
    study_file = study.study_files.last
  6. Confirm that the study has both raw & processed data and that the study is now eligible for DE (your counts/annotations may be different if you used a different AnnData file):
    
    study.has_raw_counts_matrices?
    => true
    study.has_visualization_matrices?
    => true
    study.has_expression_data?
    => true
    study.expression_matrix_cells(study_file, matrix_type: 'raw').count
    => 2638
    study.expression_matrix_cells(study_file, matrix_type: 'processed').count
    => 2638
    DifferentialExpressionService.study_eligible?(study)
    => true
    DifferentialExpressionService.find_eligible_annotations(study)
    => [{:annotation_name=>"louvain", :annotation_scope=>"study"}]
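
The console checks in step 6 could be bundled into one helper. The sketch below is only an illustration under stated assumptions: `FakeStudy` is a stand-in so the code runs outside Rails, `raw_counts_report` is a hypothetical helper that is not part of the portal, and the `same_cells` comparison is an extra check that assumes the raw and processed matrices should list the same cells.

```ruby
# Stand-in for a Study so the sketch runs without Rails; the real model is not used here.
class FakeStudy
  def initialize(raw_cells:, processed_cells:)
    @cells = { 'raw' => raw_cells, 'processed' => processed_cells }
  end

  def has_raw_counts_matrices?
    @cells['raw'].any?
  end

  # Assumption for this stub: "visualization" matrices are the processed ones.
  def has_visualization_matrices?
    @cells['processed'].any?
  end

  # Mirrors the call shape used in the console session above.
  def expression_matrix_cells(_study_file, matrix_type:)
    @cells.fetch(matrix_type)
  end
end

# Hypothetical helper: collect the manual checks into one hash.
def raw_counts_report(study, study_file)
  raw = study.expression_matrix_cells(study_file, matrix_type: 'raw')
  processed = study.expression_matrix_cells(study_file, matrix_type: 'processed')
  {
    has_raw: study.has_raw_counts_matrices?,
    has_processed: study.has_visualization_matrices?,
    raw_cell_count: raw.count,
    processed_cell_count: processed.count,
    same_cells: raw.sort == processed.sort # order-insensitive comparison
  }
end

# Made-up barcodes used only to exercise the helper.
cells = %w[AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC]
report = raw_counts_report(FakeStudy.new(raw_cells: cells, processed_cells: cells.reverse), nil)
puts report
```

In a Rails console, the same helper could be given the real `study` and `study_file` loaded in step 5, since it only calls methods that appear in the session above.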

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 0% with 7 lines in your changes missing coverage. Please review.

Project coverage is 69.83%. Comparing base (67388b2) to head (1556395). Report is 67 commits behind head on development.

| Files | Patch % | Lines |
|---|---|---|
| app/models/ingest_job.rb | 0.00% | 7 Missing :warning: |

Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2114/graphs/tree.svg?width=650&height=150&src=pr&token=HMWE5BO2a4&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2114?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)

```diff
@@               Coverage Diff               @@
##           development    #2114      +/-   ##
===============================================
- Coverage        69.85%   69.83%   -0.03%
===============================================
  Files              324      325       +1
  Lines            27317    27352      +35
  Branches          2263     2270       +7
===============================================
+ Hits             19083    19101      +18
- Misses            8109     8126      +17
  Partials           125      125
```

| [Files](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2114?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute) | Coverage Δ | |
|---|---|---|
| [app/models/ann\_data\_ingest\_parameters.rb](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2114?src=pr&el=tree&filepath=app%2Fmodels%2Fann_data_ingest_parameters.rb&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-YXBwL21vZGVscy9hbm5fZGF0YV9pbmdlc3RfcGFyYW1ldGVycy5yYg==) | `100.00% <ø> (ø)` | |
| [app/models/ingest\_job.rb](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2114?src=pr&el=tree&filepath=app%2Fmodels%2Fingest_job.rb&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-YXBwL21vZGVscy9pbmdlc3Rfam9iLnJi) | `40.50% <0.00%> (-0.15%)` | :arrow_down: |

... and [10 files with indirect coverage changes](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2114/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)

eweitz commented 2 months ago

Code looks good, manual tests passed! Beyond this and SCP-5764, is there more needed to enable DE for AnnData?

bistline commented 1 month ago

> Code looks good, manual tests passed! Beyond this and SCP-5764, is there more needed to enable DE for AnnData?

We'll want a backfill migration to get in any legacy results, but other than that I can't think of anything.