broadinstitute / single_cell_portal_core

Rails/Docker application for the Broad Institute's single cell RNA-seq data portal
https://singlecell.broadinstitute.org
BSD 3-Clause "New" or "Revised" License

Adding raw counts info to AnnData upload UX (SCP-5110) #2113

Closed bistline closed 2 months ago

bistline commented 2 months ago

BACKGROUND & CHANGES

This update adds raw count information fields to the AnnData expression upload UX. This is prerequisite work for eventually extracting raw counts cell names (and, downstream, enabling differential expression calculations using AnnData files). No extraction job is run yet - this only persists form data in the database. The values are stored in both the AnnDataFileInfo#data_fragments and ExpressionFileInfo documents, and updates propagate to both when saving. This also includes a backfill migration to ensure that existing data is correct (previously, we had not been updating the ExpressionFileInfo document).
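As a rough illustration of the "store in both places" behavior described above, here is a minimal plain-Ruby sketch. It is not the code from this PR: plain hashes stand in for the Mongoid documents, and `sync_expression_info` and `EXPRESSION_INFO_KEYS` are hypothetical names chosen for the example. Only the field names come from the console output in the testing section.

```ruby
# Hypothetical stand-ins: plain hashes replace the Mongoid documents so the
# sketch runs outside Rails. The constant and method names are assumptions.
EXPRESSION_INFO_KEYS = %w[
  library_preparation_protocol biosample_input_type modality is_raw_counts units
].freeze

# Copy the submitted form values into both storage locations so that the
# fragment copy and the ExpressionFileInfo copy cannot drift apart.
def sync_expression_info(form_values, fragments, expression_file_info)
  values = form_values.slice(*EXPRESSION_INFO_KEYS)
  fragment = fragments.find { |frag| frag['data_type'] == 'expression' }
  raise ArgumentError, 'no expression fragment found' if fragment.nil?

  fragment['expression_file_info'] = (fragment['expression_file_info'] || {}).merge(values)
  expression_file_info.merge!(values)
  fragment
end

fragments = [
  { 'data_type' => 'cluster' },
  { 'data_type' => 'expression',
    'expression_file_info' => { 'modality' => 'Transcriptomic: unbiased' } }
]
expression_file_info = { 'modality' => 'Transcriptomic: unbiased', 'raw_counts_associations' => [] }
form_values = { 'is_raw_counts' => true, 'units' => 'raw counts', 'unrelated' => 'ignored' }

sync_expression_info(form_values, fragments, expression_file_info)
puts fragments.last['expression_file_info'].inspect
puts expression_file_info.inspect
```

A backfill could, in principle, apply the same kind of copy to every existing file whose two copies disagree; the actual migration in this PR may work differently.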

Short demo of new forms:

https://github.com/user-attachments/assets/93738d53-c07b-4818-8da0-f6c8df699ee4

MANUAL TESTING

  1. Boot as normal and sign in
  2. Load the upload wizard for any study that already has an AnnData file ingested
  3. In the Expression tab, select "Yes" for "I have raw count data in the adata.raw slot" and specify units
  4. Click "Save" in the form
  5. In a Rails console session, validate that the data has been persisted correctly (your values may be slightly different but should match what you see in the forms):
    
    study = Study.find(<id of study from URL bar>)
    study_file = study.study_files.last
    study_file.expression_file_info.attributes
    =>
    {"_id"=>BSON::ObjectId('66b12aca94ec8f1dd7897d23'),
    "biosample_input_type"=>"Whole cell",
    "modality"=>"Transcriptomic: unbiased",
    "raw_counts_associations"=>[],
    "library_preparation_protocol"=>"10x 5' v3",
    "is_raw_counts"=>true,
    "units"=>"raw counts"}

    study_file.ann_data_file_info.find_fragment(data_type: :expression)
    =>
    {"_id"=>"66b12ab7debe3074889dc6e4",
    "taxon_id"=>"6033f530e241391884633745",
    "expression_file_info"=>
      {"library_preparation_protocol"=>"10x 5' v3",
      "biosample_input_type"=>"Whole cell",
      "modality"=>"Transcriptomic: unbiased",
      "is_raw_counts"=>true,
      "units"=>"raw counts"},
    "data_type"=>"expression"}
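Rather than comparing the two outputs by eye, the check in step 5 could be automated along these lines. This is a sketch under stated assumptions: `expression_info_mismatches` is a hypothetical helper, not part of the codebase, and the sample hashes are copied from the console output above (with the `_id` shown as a plain string so the example runs outside Rails).

```ruby
# Hypothetical helper: report every key where the fragment's nested
# expression_file_info disagrees with the ExpressionFileInfo attributes.
# An empty hash means the two copies agree on all fragment keys.
def expression_info_mismatches(fragment, document_attributes)
  nested = fragment.fetch('expression_file_info', {})
  nested.each_with_object({}) do |(key, value), diffs|
    other = document_attributes[key]
    diffs[key] = { fragment: value, document: other } unless other == value
  end
end

# Sample values taken from the console session above.
document_attributes = {
  '_id' => '66b12aca94ec8f1dd7897d23',
  'biosample_input_type' => 'Whole cell',
  'modality' => 'Transcriptomic: unbiased',
  'raw_counts_associations' => [],
  'library_preparation_protocol' => "10x 5' v3",
  'is_raw_counts' => true,
  'units' => 'raw counts'
}
fragment = {
  '_id' => '66b12ab7debe3074889dc6e4',
  'taxon_id' => '6033f530e241391884633745',
  'expression_file_info' => {
    'library_preparation_protocol' => "10x 5' v3",
    'biosample_input_type' => 'Whole cell',
    'modality' => 'Transcriptomic: unbiased',
    'is_raw_counts' => true,
    'units' => 'raw counts'
  },
  'data_type' => 'expression'
}

mismatches = expression_info_mismatches(fragment, document_attributes)
puts mismatches.empty? ? 'in sync' : "out of sync: #{mismatches.inspect}"
```

In a Rails console, the same helper could presumably be given the two return values from step 5 directly, assuming both come back with string keys as shown.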

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 76.19048% with 10 lines in your changes missing coverage. Please review.

Project coverage is 69.82%. Comparing base (4e68f2f) to head (47c65fb). Report is 32 commits behind head on development.

| Files | Patch % | Lines |
|---|---|---|
| ...avascript/components/upload/ExpressionFileForm.jsx | 58.33% | 5 Missing :warning: |
| app/models/study.rb | 37.50% | 5 Missing :warning: |

Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2113/graphs/tree.svg?width=650&height=150&src=pr&token=HMWE5BO2a4&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2113?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)

```diff
@@               Coverage Diff               @@
##           development    #2113      +/-   ##
===============================================
+ Coverage        69.80%   69.82%   +0.02%
===============================================
  Files              324      325       +1
  Lines            27286    27350      +64
  Branches          2259     2270      +11
===============================================
+ Hits             19047    19098      +51
- Misses            8114     8127      +13
  Partials           125      125
```

| [Files](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2113?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute) | Coverage Δ | |
|---|---|---|
| [...script/components/upload/AnnDataExpressionStep.jsx](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2113?src=pr&el=tree&filepath=app%2Fjavascript%2Fcomponents%2Fupload%2FAnnDataExpressionStep.jsx&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-YXBwL2phdmFzY3JpcHQvY29tcG9uZW50cy91cGxvYWQvQW5uRGF0YUV4cHJlc3Npb25TdGVwLmpzeA==) | `100.00% <100.00%> (ø)` | |
| [app/javascript/components/upload/RawCountsStep.jsx](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2113?src=pr&el=tree&filepath=app%2Fjavascript%2Fcomponents%2Fupload%2FRawCountsStep.jsx&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-YXBwL2phdmFzY3JpcHQvY29tcG9uZW50cy91cGxvYWQvUmF3Q291bnRzU3RlcC5qc3g=) | `96.29% <ø> (-0.14%)` | :arrow_down: |
| [app/models/ann\_data\_file\_info.rb](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2113?src=pr&el=tree&filepath=app%2Fmodels%2Fann_data_file_info.rb&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-YXBwL21vZGVscy9hbm5fZGF0YV9maWxlX2luZm8ucmI=) | `97.39% <100.00%> (+0.48%)` | :arrow_up: |
| [...avascript/components/upload/ExpressionFileForm.jsx](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2113?src=pr&el=tree&filepath=app%2Fjavascript%2Fcomponents%2Fupload%2FExpressionFileForm.jsx&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-YXBwL2phdmFzY3JpcHQvY29tcG9uZW50cy91cGxvYWQvRXhwcmVzc2lvbkZpbGVGb3JtLmpzeA==) | `88.23% <58.33%> (-9.39%)` | :arrow_down: |
| [app/models/study.rb](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2113?src=pr&el=tree&filepath=app%2Fmodels%2Fstudy.rb&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-YXBwL21vZGVscy9zdHVkeS5yYg==) | `81.45% <37.50%> (-0.26%)` | :arrow_down: |

... and [4 files with indirect coverage changes](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2113/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)

bistline commented 2 months ago

Great stuff! I'm glad we'll soon enable our differential expression UI for this increasingly common type of single cell study.

Code looks good, and manual tests passed. I suggest two trivial maintainability refinements, no blockers. Thanks for chatting to help me understand why AnnData raw counts are optional for the moment.

FWIW, because I had deleted an earlier AnnData file in my local test study, the current manual test instructions returned a study file record with file_type: "DELETE" and a false-negative error. Changing study.study_files.first to study.study_files.last resolved that in my case.

Good call - I made the same discovery today and likewise changed how I test. Updating the testing instructions now.
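For anyone hitting the same `file_type: "DELETE"` record, one position-independent alternative is to select the file by type instead of relying on `.first` or `.last` alone. The sketch below is only an illustration: `StudyFileStub` and `latest_file_of_type` are hypothetical, and the `'AnnData'` type string is an assumption about how these files are labeled.

```ruby
# Hypothetical stand-in for StudyFile records, with only the fields used here.
StudyFileStub = Struct.new(:name, :file_type)

# Return the last file whose file_type matches the wanted type. Records
# marked "DELETE" never match, so leftovers from deleted uploads are skipped.
def latest_file_of_type(study_files, wanted_type)
  study_files.select { |file| file.file_type == wanted_type }.last
end

study_files = [
  StudyFileStub.new('old_upload.h5ad', 'DELETE'),   # previously deleted file
  StudyFileStub.new('current_upload.h5ad', 'AnnData')
]

file = latest_file_of_type(study_files, 'AnnData')
puts file.name
```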