UCSC-MedBook / patient-care

Clinician-facing portal showing pathways, signatures, and genes of interest

Store expression / variance filters on sample groups as gene sets instead of blobs #99

Open e-t-k opened 8 years ago

e-t-k commented 8 years ago

Currently, when a sample group has expression/variance filters applied, the filtered expression data is stored as a blob.

Instead, a gene set should be created containing the genes that were retained by the filters. (This gene set should be "anonymous" and should not appear in the user's list of gene sets.) When the filtered sample group is downloaded or used as a background for outlier analysis, etc., the expression data would be retrieved and filtered on the fly by the gene set.

Filtering on the fly may result in slower downloads. The benefits are the ability to change sample names, large storage savings, and general adherence to the principle of storing information in the database rather than in monolithic blobs.

Note that this will not affect outlier analysis performance (after the first analysis for a sample group), since outlier analysis relies on separate blobs that will continue to be created and stored.
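To make the proposal concrete, here is a minimal sketch of the idea in Python. It is not MedBook code: the `GeneSet` record, both function names, the filter thresholds, and the in-memory `expression` mapping (gene name to per-sample values) are all hypothetical stand-ins for whatever the database layer actually provides.

```python
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass(frozen=True)
class GeneSet:
    """Hypothetical gene set record; `anonymous` hides it from the user's list."""
    genes: tuple
    anonymous: bool = True


def build_filter_gene_set(expression, min_mean, min_variance):
    """Apply the filters once and keep only the names of the retained genes.

    `expression` maps gene -> {sample_label: value} (an assumed shape).
    """
    kept = []
    for gene, by_sample in expression.items():
        values = list(by_sample.values())
        if mean(values) >= min_mean and pvariance(values) >= min_variance:
            kept.append(gene)
    return GeneSet(genes=tuple(sorted(kept)))


def filtered_expression(expression, sample_labels, gene_set):
    """Retrieve rows on the fly: only genes in the set, only the group's samples."""
    return {
        gene: [expression[gene][label] for label in sample_labels]
        for gene in gene_set.genes
        if gene in expression
    }
```

Because only gene names are persisted in this sketch, a download built with `filtered_expression` would pick up whatever sample labels the database holds at that moment, at the cost of redoing the lookup on every request.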

e-t-k commented 8 years ago

The current sample-name changer, as implemented, /will/ silently invalidate any expression/variance filter blobs; these blobs will continue to store the old sample names.

Why this is OK:

-- For the current batch of renames, there are no significant sample groups that will be affected (just one of olena's groups, with 3 samples in it).

-- This will not affect their validity as background cohorts for outlier analysis.

-- In a later patch, either we switch to gene sets, in which case this becomes irrelevant, or we add the code to invalidate the blobs when it's not time-sensitive.
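If the second option is taken, the invalidation step might look roughly like the following Python sketch. The blob record shape (`id`, `sample_labels`) and the `renames` mapping from old to new label are assumptions made for illustration, not the actual MedBook schema.

```python
def blobs_to_invalidate(blobs, renames):
    """Return the ids of filter blobs that embed a sample label being renamed.

    `blobs` is a list of dicts with hypothetical keys "id" and "sample_labels";
    `renames` maps old sample label -> new sample label.
    """
    old_labels = set(renames)
    return [
        blob["id"]
        for blob in blobs
        if old_labels & set(blob["sample_labels"])
    ]
```

The returned ids could then be deleted or marked stale so that the blobs are regenerated with the new names on next use.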