cmdcolin / jbrowse-plugin-gwas

an idea for plot improvements #7

Open zhilianghu opened 2 years ago

zhilianghu commented 2 years ago

While using the JBrowse 2 GWAS plot for some GTEx plots, I ran into a problem: when the data exceeds several million rows, the browser times out or chokes. I was able to work around this by dropping subsequent rows whose plot values are identical to ones already seen (this is where "smears" usually appear, which are less meaningful although still useful). This renders about the same plot pattern with roughly 1/5 of a million rows to plot. Would it be possible for the plot to reference a "weight score" column (where the score reflects the number of identical-value repeats) and use it to vary the plot colors or sizes? (That is, translate the degree of "overlay" into color/darkness.) Zhiliang
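To make the "weight score" idea concrete, here is a minimal Python sketch of one way to do it. It is not the plugin's code and not necessarily the exact preprocessing used above: it assumes that "identical plot values" means points that land on the same spot after binning the position and rounding -log10(p). The names `collapse_points` and `weight_to_alpha`, the bin size, and the rounding precision are all hypothetical.

```python
import math
from collections import Counter


def collapse_points(rows, pos_bin=1000, score_decimals=1):
    """Collapse points that would overlap on screen into weighted points.

    rows: iterable of (position, p_value) pairs.
    Returns a sorted list of (binned_position, score, weight), where score is
    -log10(p_value) rounded to score_decimals and weight is the number of
    input rows that fell on that same (position bin, score) spot.
    pos_bin and score_decimals are illustrative defaults (assumptions).
    """
    counts = Counter()
    for position, p_value in rows:
        # Guard against p == 0, which has no finite -log10.
        score = round(-math.log10(max(p_value, 1e-300)), score_decimals)
        binned = (position // pos_bin) * pos_bin
        counts[(binned, score)] += 1
    return [(pos, score, weight) for (pos, score), weight in sorted(counts.items())]


def weight_to_alpha(weight, max_weight, min_alpha=0.2):
    """Map a weight to an opacity in [min_alpha, 1.0] on a log scale."""
    if max_weight <= 1:
        return 1.0
    return min_alpha + (1.0 - min_alpha) * math.log(weight) / math.log(max_weight)
```

A renderer could then draw one point per collapsed row and use `weight_to_alpha` (or a similar mapping to size) so that heavily overlapped spots appear darker.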

cmdcolin commented 2 years ago

if you have any sample data let me know. also curious what browser you are using (firefox or chrome, etc)

i agree making it faster would be great.

cmdcolin commented 2 years ago

I'm not exactly sure I know what your proposal is, if you want to elaborate (maybe a picture version?) let me know. I have seen some things where e.g. the many low scoring variants which are generally less meaningful don't get plotted with perfect fidelity and are downsampled, but then the GWAS peaks are plotted properly
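For reference, the downsampling approach mentioned here could look roughly like the following Python sketch. It is only an illustration of the general idea, not code from any particular tool: points at or above a score threshold are always kept, and points below it are kept with a fixed probability. The function name, threshold, and fraction are hypothetical.

```python
import random


def downsample_background(points, score_threshold=3.0, keep_fraction=0.1, seed=0):
    """Keep every point at or above score_threshold; thin out the rest.

    points: iterable of (position, score) pairs, where score is -log10(p).
    Points below the threshold are kept with probability keep_fraction.
    The threshold and fraction are illustrative values (assumptions).
    """
    rng = random.Random(seed)  # seeded so the same input gives the same plot
    kept = []
    for position, score in points:
        if score >= score_threshold or rng.random() < keep_fraction:
            kept.append((position, score))
    return kept
```

With this scheme the peaks are drawn at full fidelity, while the dense low-scoring background is only approximated.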

zhilianghu commented 2 years ago

Here is a sample data set (https://www.animalgenome.org/hu/share/tmp/Hypothalamus.nominals.2rd.txt.gz). The columns:

  1. Gene ID.
  2. ID of the tested variant ("chr_position_A_B").
  3. Distance between the variant and the gene in bp.
  4. The nominal p-value of association (this is the value to plot).
  5. The slope associated with the nominal p-value.

I used Firefox, Chrome, Safari, and Edge in my tests. My computer has 8GB of RAM. My colleagues' browsers choked as well. We all have fiber connections.
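As a rough guide to reading the five columns listed above, here is a small Python sketch. It assumes the columns are whitespace-separated and that the variant ID has exactly the "chr_position_A_B" shape described; neither has been checked against the actual file. The `Association` type and `parse_line` function are hypothetical names.

```python
import math
from typing import NamedTuple


class Association(NamedTuple):
    gene_id: str
    chrom: str
    position: int
    allele_a: str
    allele_b: str
    distance: int
    p_value: float
    slope: float


def parse_line(line):
    """Parse one row, assuming five whitespace-separated columns."""
    gene_id, variant_id, distance, p_value, slope = line.split()
    # rsplit from the right so a chromosome name containing "_" still parses.
    chrom, position, allele_a, allele_b = variant_id.rsplit("_", 3)
    return Association(gene_id, chrom, int(position), allele_a, allele_b,
                       int(distance), float(p_value), float(slope))


def plot_score(record):
    """Return -log10(p), the usual y-axis value for this kind of plot."""
    return -math.log10(record.p_value)
```

For example, a made-up row such as `GENE1 1_12345_A_G -250 1.5e-6 0.42` would parse to chromosome `1`, position `12345`, and a plot score of about 5.8.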

In my humble opinion, GTEx data is more likely to have overlapping points than GWAS data.