broadinstitute / single_cell_portal_core

Rails/Docker application for the Broad Institute's single cell RNA-seq data portal
https://singlecell.broadinstitute.org
BSD 3-Clause "New" or "Revised" License

Handle nil values in LabelSorter, refine ordering (SCP-5714) #2086

Closed: bistline closed this pull request 3 months ago

bistline commented 3 months ago

BACKGROUND & CHANGES

This fixes two issues recently discovered with LabelSorter and --Unspecified-- values in group-based annotations:

First, some legacy studies on production (SCP542 & SCP820) have invalid data, specifically NaN, in the unique values array for some group-based annotations. This was caused by previous issues (here and here) in scp-ingest-pipeline where pandas returned NaN instead of blank strings (both studies predate these fixes). The new natural-sorting LabelSorter class throws an error when it calls downcase on these values, as seen in the UI.
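As a rough illustration of that failure (not the actual LabelSorter code), the plain-Ruby sketch below sorts a made-up values array containing NaN using a case-insensitive key. The label names are assumptions chosen for the example.

```ruby
# Hypothetical unique values array with a stray NaN, like the legacy data described above
values = ['B cell', Float::NAN, 'T cell']

# Any sort key that calls String methods on every entry fails on the Float
error = begin
  values.sort_by { |value| value.downcase }
  nil
rescue NoMethodError => e
  e
end

puts error.class
```

Float does not define downcase, so the sort aborts as soon as it reaches the NaN entry.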

Second, for studies that have --Unspecified-- labels, these labels are incorrectly sorted into the first position of the color map server-side. The front end then reassigns them to the semi-transparent light grey color and shifts them to the end of the legend. As a result, the first available color for group-based annotations (red) is never used in this case.

Now, these NaN values are correctly converted to blank strings, and any --Unspecified-- entries are automatically moved to the end of the color map, making the red label color available again for the first entry.
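The behavior described above can be sketched in plain Ruby as follows. This is a minimal sketch under stated assumptions, not the code in lib/label_sorter.rb: the names UNSPECIFIED_LABEL, sanitize_label, natural_key, and sort_labels are hypothetical, and the natural-sort key is just one simple way to compare embedded numbers numerically.

```ruby
# Hypothetical constant for the placeholder label named in this PR
UNSPECIFIED_LABEL = '--Unspecified--'

# Convert nil and NaN to a blank string; everything else becomes a String
def sanitize_label(value)
  return '' if value.nil?
  return '' if value.is_a?(Float) && value.nan?

  value.to_s
end

# Case-insensitive natural-sort key: digit runs compare as integers.
# Each chunk is a tagged triple so numeric and text chunks stay comparable.
def natural_key(label)
  label.downcase.scan(/\d+|\D+/).map do |chunk|
    chunk.match?(/\A\d+\z/) ? [0, chunk.to_i, ''] : [1, 0, chunk]
  end
end

# Sort sanitized labels naturally, then append any unspecified entries last
def sort_labels(values)
  labels = values.map { |value| sanitize_label(value) }
  unspecified, rest = labels.partition { |label| label == UNSPECIFIED_LABEL }
  rest.sort_by { |label| natural_key(label) } + unspecified
end

sorted = sort_labels(['cluster 10', '--Unspecified--', Float::NAN, 'Cluster 2'])
p sorted
# => ["", "Cluster 2", "cluster 10", "--Unspecified--"]
```

With --Unspecified-- held back until the end, the first real label takes the first slot in the color map.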

MANUAL TESTING

The data used in the original PR to fix cell filtering for blank labels works very well in this case, if you have a local copy (sourced from SCP2407). However, any group-based annotation that has --Unspecified-- values will work.

  1. In a Rails console session, load the celltype_subcluster annotation from your copy of SCP2407 (if using):
    study = Study.find_by(accession: '<your accession>')
    meta = study.cell_metadata.by_name_and_type('celltype_subcluster', 'group')
  2. (Optional) If not using SCP2407, find a local metadatum whose unique values array contains a blank value, preferring one with many labels:
    meta = CellMetadatum.where(:values.in => ['']).sort_by { |m| m.values.count }.last
  3. Change the blank entry to NaN to reproduce the data integrity error:
    blank_idx = meta.values.index('')
    meta.values[blank_idx] = Float::NAN
    meta.save
  4. Boot as normal and load the same study/annotation that you just edited
  5. Confirm there is no error in the UI, and that the first label shown in the legend uses the color red
  6. Back in the Rails console, revert the values array:
    meta.values[blank_idx] = ''
    meta.save
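To double-check that the revert removed every NaN, a small helper like the one below may be useful. It is a sketch that runs against a plain array standing in for meta.values, and nan_indices is a hypothetical name. It tests each entry with nan? rather than comparing against Float::NAN, because NaN is never == to itself.

```ruby
# Return the positions of any NaN entries in a values array
def nan_indices(values)
  values.each_index.select do |index|
    value = values[index]
    value.is_a?(Float) && value.nan?
  end
end

# Stand-in for meta.values after step 3 (the labels here are made up)
edited = ['B cell', Float::NAN, 'T cell']
p nan_indices(edited)   # => [1]

# Stand-in for meta.values after the revert in step 6
reverted = ['B cell', '', 'T cell']
p nan_indices(reverted) # => []
```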

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 69.57%. Comparing base (33a461b) to head (fde11c5).

Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2086/graphs/tree.svg?width=650&height=150&src=pr&token=HMWE5BO2a4&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2086?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)

```diff
@@               Coverage Diff               @@
##           development    #2086      +/-   ##
===============================================
- Coverage        69.60%   69.57%   -0.03%
===============================================
  Files              324      324
  Lines            27242    27245       +3
  Branches          2246     2246
===============================================
- Hits             18963    18957       -6
- Misses            8154     8163       +9
  Partials           125      125
```

| [Files](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2086?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute) | Coverage Δ | |
|---|---|---|
| [lib/label\_sorter.rb](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2086?src=pr&el=tree&filepath=lib%2Flabel_sorter.rb&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-bGliL2xhYmVsX3NvcnRlci5yYg==) | `94.73% <100.00%> (+0.98%)` | :arrow_up: |

... and [3 files with indirect coverage changes](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2086/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)