IGS / gEAR

The gEAR Portal was created as a data archive and viewer for gene expression data including microarrays, bulk RNA-Seq, single-cell RNA-Seq and more.
https://umgear.org
GNU Affero General Public License v3.0

Trouble making heatmaps from Raible Lab Zebrafish Inner Ear Dataset #710

Closed gear-portal-team closed 4 months ago

gear-portal-team commented 4 months ago

From: Elizabeth Cebul

Email: elizabeth.cebul@nih.gov

Server IP: 10.142.0.16

Msg: Hello -

I am trying to use gEAR to make a supplementary figure showing the expression of the nrxn family in inner ear hair cells and supporting cells during development. I've had good luck with 3 of the 4 datasets I am trying to use - but I can't seem to make any heatmaps at all using the Raible Lab Zebrafish Inner Ear Dataset. I am going to the Multigene Display Viewer, selecting this dataset, heatmap, and the following genes: nrxn1a, nrxn2a, nrxn3a, nrxn3b. No matter what I do from there, I can't get a heatmap to load. This includes simply selecting HCSC as the primary category (and changing nothing else). This problem has persisted for several weeks. I am guessing that the complexity of the dataset may be an issue (there are SO MANY categories and so many groups within each category). But of course I could be wrong! Any help would be greatly appreciated.

Side note: I am actually using the heatmap to get log(2) or log(10) expression numbers and then making my own heatmaps using Prism - so I really just need numbers!

Thanks so much for any and all help, Elizabeth

adkinsrs commented 4 months ago

I did a couple of runs while watching memory usage (quitting before it could kill the server). Memory climbed steadily both with a non-clustering heatmap and with a dotplot (which never gives issues). In the API call, there is a combinatorial step that builds composite indices to aid in filtering, groupbys, and other downstream steps. My initial guess is that the sheer number of columns, combined with some particularly large categories (such as the barcode-like "Cell" field), is making memory skyrocket.
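To illustrate why a cross-product over category values can blow up, here is a rough sketch. The column names and sizes are made up for illustration and are not taken from the dataset; only the barcode-like "Cell" field is mentioned above.

```python
from itertools import product
from math import prod

# Hypothetical observation metadata: column name -> distinct category values.
obs_categories = {
    "cluster": [f"c{i}" for i in range(20)],
    "timepoint": [f"t{i}" for i in range(10)],
    "Cell": [f"barcode{i}" for i in range(5000)],  # barcode-like field
}

def count_composites(categories):
    """Number of composite indices a full cross-product would produce."""
    return prod(len(values) for values in categories.values())

with_cell = count_composites(obs_categories)
without_cell = count_composites(
    {k: v for k, v in obs_categories.items() if k != "Cell"}
)
print(with_cell, without_cell)

# Materializing the product is only reasonable for the small case.
small = list(product(*(v for k, v in obs_categories.items() if k != "Cell")))
```

A single high-cardinality column multiplies the total, so the count grows far faster than the number of samples.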

adkinsrs commented 4 months ago

@jorvis suggested that we should just throw out any categories that have more than 50 members, and that should help with the combinatorial load. I think we should also take it a step further and prevent those from being displayed on any UI filter/sort/color element. There's an off-chance we eliminate some useful categorical metadata, but it should help with these assays that people aren't going to filter or sort anyways.

EDIT: This cutoff did not seem to fix the memory load. I need to do some "print statement" debugging.
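For reference, the 50-member cutoff could be sketched roughly like this. The function name and the dict-of-lists input are assumptions for illustration, not the gEAR code.

```python
MAX_CATEGORY_MEMBERS = 50  # cutoff suggested in this thread

def usable_columns(obs_columns, max_members=MAX_CATEGORY_MEMBERS):
    """Return the names of columns whose distinct-value count is within the cutoff."""
    return [
        name
        for name, values in obs_columns.items()
        if len(set(values)) <= max_members
    ]

# Hypothetical metadata for 200 cells: column name -> per-cell values.
obs_columns = {
    "HCSC": ["HC", "SC"] * 100,
    "Cell": [f"barcode{i}" for i in range(200)],
}

kept = usable_columns(obs_columns)
print(kept)
```

Only columns in `kept` would then be offered to the UI filter/sort/color elements.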

adkinsrs commented 4 months ago

This is the culprit function:

https://github.com/IGS/gEAR/blob/c23f910506aed171f1e73fb353535a2a016bc4f6/lib/gear/mg_plotting.py#L1137

adkinsrs commented 4 months ago

Here is a quick solution for the end user. If she does not intend to do any metadata filtering, she can clear out any filter fields that are not actually being filtered on (which clears out the obs_filters that is passed to the API call). This will generate the heatmap.

Now for a different solution, which requires some work on my end. If I leave a filter completely full (like clicking the "All" button), the data is still passed to obs_filters. This is a problem with this particular dataset, as every sample was filtered out under this condition. The reason is that a lot of the metadata had missing values (NaN), which were not supplied as options in the filter choices, so as each filter was applied in turn, the NaN-valued data was gradually phased out. Since every sample had at least one NaN, no samples remained. So I need to ensure NaN is passed in the UI options if available (even if it is never used).
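A small sketch of the dropout described above, using plain dicts with `None` standing in for NaN. The column names and values are hypothetical.

```python
def apply_filters(rows, obs_filters):
    """Keep only rows whose value for every filter key is in the allowed set."""
    remaining = rows
    for key, allowed in obs_filters.items():
        remaining = [row for row in remaining if row[key] in allowed]
    return remaining

# Every sample has at least one missing value (None stands in for NaN).
rows = [
    {"HCSC": "HC", "stage": None},
    {"HCSC": None, "stage": "5dpf"},
    {"HCSC": "SC", "stage": None},
]

# "All" selected for each filter, but the missing value was never offered.
filters_without_na = {"HCSC": {"HC", "SC"}, "stage": {"5dpf"}}
# Same selection once the missing value is available as an option.
filters_with_na = {"HCSC": {"HC", "SC", None}, "stage": {"5dpf", None}}

dropped_all = apply_filters(rows, filters_without_na)
kept_all = apply_filters(rows, filters_with_na)
print(len(dropped_all), len(kept_all))
```

Each filter removes only a few rows on its own, but together they remove every row, which matches the empty result seen here.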

I eliminated the function from the previous comment because the "product" step was computationally intensive, electing instead to iterate through each filter key and filter the data down to just the selected values for that key. Occam's razor and all that.
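The difference between the two approaches can be sketched as follows. This is a simplified stand-in, not the removed function or the committed code; the function names and sample data are invented for illustration.

```python
from itertools import product

def filter_by_product(rows, obs_filters):
    """Enumerate every allowed combination up front, then match rows against it."""
    keys = list(obs_filters)
    # The size of this set is the product of the per-key value counts.
    allowed = set(product(*(obs_filters[k] for k in keys)))
    return [row for row in rows if tuple(row[k] for k in keys) in allowed]

def filter_per_key(rows, obs_filters):
    """Apply one filter key at a time; no combinations are materialized."""
    remaining = rows
    for key, values in obs_filters.items():
        allowed = set(values)
        remaining = [row for row in remaining if row[key] in allowed]
    return remaining

rows = [
    {"HCSC": "HC", "stage": "3dpf"},
    {"HCSC": "SC", "stage": "5dpf"},
    {"HCSC": "HC", "stage": "7dpf"},
]
obs_filters = {"HCSC": ["HC"], "stage": ["3dpf", "5dpf"]}

via_product = filter_by_product(rows, obs_filters)
via_loop = filter_per_key(rows, obs_filters)
print(via_product == via_loop)
```

Both return the same rows, but the per-key loop only ever holds one key's values at a time, so its cost grows with the sum of the filter sizes rather than their product.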

adkinsrs commented 4 months ago

Work on this is reflected in 210bf16 in ui-v2. In addition to eliminating the "product" function (which fixed the crashing issue), I made it so that missing values are explicitly loaded into the UI as a filterable condition (labeled as NA). Previously, missing values were not being returned as categories for an observation column.
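As a rough idea of what exposing missing values as an "NA" category might look like, here is a sketch. The helper names are hypothetical and `None` stands in for NaN; the actual change is in the commit referenced above.

```python
NA_LABEL = "NA"  # label shown in the UI for missing values

def column_categories(values):
    """Distinct values for a column, with missing entries exposed as an NA option."""
    present = sorted({v for v in values if v is not None})
    if any(v is None for v in values):
        present.append(NA_LABEL)
    return present

def matches(value, selected):
    """True if a (possibly missing) value is among the selected category labels."""
    label = NA_LABEL if value is None else value
    return label in selected

categories = column_categories(["HC", None, "SC", "HC"])
print(categories)
```

With NA included in the options, selecting "All" keeps samples that have missing metadata instead of silently dropping them.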

DanLesperance commented 4 months ago

Sent the user an email explaining how to get a heatmap generated. Waiting for a reply to confirm that the current fix is acceptable.

adkinsrs commented 4 months ago

Going to close this in the meantime, since the code is committed as a resolution.