broadinstitute / seqr

web-based analysis tool for rare disease genomics
GNU Affero General Public License v3.0

RNA-seq OUTRIDER volcano plot: outlier labels sometimes appear and sometimes disappear when refreshing the same page multiple times #3633

Closed bw2 closed 11 months ago

bw2 commented 11 months ago

When loading this page, the volcano plot sometimes looks like:

image

(which is correct), and other times the outlier label ("SNHG1") disappears:

image

The behavior toggles between showing and not showing the label after every 2 to 3 page reloads.

hanars commented 11 months ago

There appear to be 114 samples in seqr that somehow ended up with outlier data loaded twice. Which copy is shown is undefined, so each page load arbitrarily picks one of the two. In the specific case you picked, there is data loaded January 30 that does not have that gene as significant, and data loaded June 28 that does.
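
As a rough illustration of how samples with more than one outlier load could be spotted, here is a minimal sketch. The record layout, the sample IDs `S1` and `S2`, and the variable names are assumptions made for the example, not seqr's actual schema or code.

```python
from collections import Counter

# Hypothetical load records: (sample_id, load_label). Not seqr's real schema.
loads = [
    ("S1", "January 30"),
    ("S1", "June 28"),
    ("S2", "June 28"),
]

# Count how many outlier loads exist per sample.
counts = Counter(sample_id for sample_id, _ in loads)

# Any sample with more than one load has no single well-defined result,
# so whichever copy a query happens to return first is the one displayed.
duplicated = sorted(sample_id for sample_id, n in counts.items() if n > 1)
print(duplicated)
```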

I think this issue has to do with the fact that we do not provide tissue types for outlier data but we do for all other RNA samples. If possible, I think the fix for this is to require the tissue type when loading outlier data, and then reload. @bw2 if I update seqr to take a tissue type with OUTRIDER data, would you be able to provide it? Or would that be a problem?

bw2 commented 11 months ago

It would be straightforward to include a tissue column in the uploaded table. However, we will still need to reload this data multiple times - even within the same tissue type. OUTRIDER and FRASER detect outliers within batches of samples, so any time we change the batch (such as by adding new samples), the number of statistically significant outliers could change both in the new and the previously-loaded samples (as we are seeing in this family). In my tests, this doesn't happen much - the results are relatively stable - but still. Also, since the uploads sometimes fail, we may need to reload samples multiple times before the upload works.

hanars commented 11 months ago

> we will still need to reload this data multiple times - even within the same tissue type

Yes. The issue is that we had logic where, if data was loaded with no tissue type and a different data type was later loaded with a tissue type, we updated the tissue type for everything. When we then loaded outlier data with no tissue type again, the logic for replacing previously loaded data was not used: the tissue types did not match, so we treated the new load as a new tissue type that should exist in parallel instead of as a replacement. If we always provide a tissue type, our logic for deciding whether a load is a new tissue type or a replacement for a previously loaded one will actually work, whereas now it does not.
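
The sequence described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the `(sample_id, tissue)` key, the helper names `load_outliers` and `retag_tissue`, the sample ID `S1`, and the tissue value `muscle` are all hypothetical and do not come from seqr's code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for loaded data: (sample_id, tissue) -> outlier rows.
Store = Dict[Tuple[str, Optional[str]], List[dict]]


def load_outliers(store: Store, sample_id: str, tissue: Optional[str],
                  rows: List[dict]) -> str:
    """Replace data for an existing (sample, tissue) key, else add a new entry."""
    key = (sample_id, tissue)
    action = "replaced" if key in store else "added"
    store[key] = rows
    return action


def retag_tissue(store: Store, sample_id: str, tissue: str) -> None:
    """Mimic the old behavior: untagged data inherits the tissue of a later load."""
    untagged = store.pop((sample_id, None), None)
    if untagged is not None:
        store[(sample_id, tissue)] = untagged


old_rows = [{"gene": "SNHG1", "significant": False}]
new_rows = [{"gene": "SNHG1", "significant": True}]

# Problem case: outlier data is loaded without a tissue type.
store: Store = {}
first = load_outliers(store, "S1", None, old_rows)
# Another data type arrives with a tissue, so the existing entry is retagged.
retag_tissue(store, "S1", "muscle")
# The reload has no tissue either, so its key no longer matches the old entry.
second = load_outliers(store, "S1", None, new_rows)

# Fixed case: the tissue type is always supplied, so the reload replaces.
fixed: Store = {}
load_outliers(fixed, "S1", "muscle", old_rows)
third = load_outliers(fixed, "S1", "muscle", new_rows)

print(second, len(store))  # reload stored in parallel with the old data
print(third, len(fixed))   # reload replaced the old data
```

In the problem case the sample ends up with two parallel entries, which matches the duplicated outlier data seen in the plot. When the key always includes a tissue type, the reload overwrites the earlier entry.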

hanars commented 11 months ago

this is now fixed