BACKGROUND & CHANGES
This fixes two issues recently discovered with LabelSorter and --Unspecified-- values in group-based annotations:
First, there are some legacy studies on production (SCP542 & SCP820) that have invalid data in the unique values array for some group-based annotations, specifically NaN. This was due to previous issues (here and here) in scp-ingest-pipeline where NaN was returned by pandas instead of blank strings (both studies predate these fixes). The new natural sorting LabelSorter class throws an error when trying to call downcase on these values, as seen in the UI.
Second, for studies that have --Unspecified-- labels, that label is incorrectly sorted into the first position of the color map server-side. The front end then reassigns it to the semi-transparent light grey color and shifts it to the end of the legend. As a result, the first available color for group-based annotations (red) is never used in this case.
Now, these NaN values are correctly converted to blank strings, and any --Unspecified-- entries are automatically moved to the end of the color map, making the red label color available again for the first entry.
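As a rough illustration of the behavior described above (not the actual LabelSorter code), the sketch below cleans NaN values, natural-sorts the labels, and moves --Unspecified-- to the end. The names blank_if_invalid, natural_key, and ordered_labels are hypothetical, and it assumes NaN arrives in Ruby as Float::NAN:

```ruby
UNSPECIFIED = '--Unspecified--'

# Hypothetical helper: nil and NaN become '' so string methods are safe to call
def blank_if_invalid(value)
  return '' if value.nil?
  return '' if value.is_a?(Float) && value.nan?

  value.to_s
end

# Natural sort key: digit runs compare numerically, other runs case-insensitively
def natural_key(label)
  label.downcase.scan(/\d+|\D+/).map do |part|
    part.match?(/\A\d+\z/) ? [0, part.to_i] : [1, part]
  end
end

# Sort everything except --Unspecified--, then append it at the end
def ordered_labels(values)
  cleaned = values.map { |v| blank_if_invalid(v) }
  unspecified, rest = cleaned.partition { |label| label == UNSPECIFIED }
  rest.sort_by { |label| natural_key(label) } + unspecified
end

labels = ordered_labels(['cluster 10', Float::NAN, '--Unspecified--', 'Cluster 2'])
```

With --Unspecified-- held out of the sort, the first color in the map goes to the first real label rather than to an entry the front end will recolor anyway.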
MANUAL TESTING
The data used in the original PR to fix cell filtering for blank labels works very well in this case, if you have a local copy (sourced from SCP2407). However, any group-based annotation that has --Unspecified-- values will work.
In a Rails console session, load the celltype_subcluster annotation from your copy of SCP2407 (if you are using it):
study = Study.find_by(accession: <your accession>)
meta = study.cell_metadata.by_name_and_type('celltype_subcluster', 'group')
(Optional) If you are not using SCP2407, find the local metadatum with the most labels whose unique values array contains a blank value:
meta = CellMetadatum.where(:values.in => ['']).sort_by { |m| m.values.count }.last
Change the blank entry to NaN to reproduce the data integrity error:
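One way this substitution could look, as a plain-Ruby sketch on a stand-in array. The sample labels are made up, and it assumes the bad value is Ruby's Float::NAN; writing the array back to the record (for example with meta.update(values: corrupted)) is likewise an assumption about the local setup:

```ruby
# Stand-in for meta.values; these labels are hypothetical
values = ['B cells', '', 'T cells']

# Swap each blank entry for NaN to mimic the legacy data
corrupted = values.map { |v| v == '' ? Float::NAN : v }

# Float has no #downcase, so code that downcases every label fails on NaN
begin
  corrupted.map(&:downcase)
  failed = false
rescue NoMethodError
  failed = true
end
```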