cmdcolin / jbrowse-plugin-gwas


input data format #5

Open zhilianghu opened 2 years ago

zhilianghu commented 2 years ago

I couldn't find ... could you describe the input format, with sample data, for the GWAS/Manhattan plot? Thanks!

cmdcolin commented 2 years ago

This should be documented better, but a tabix-indexed BED file is one option.

a sample config is here https://github.com/cmdcolin/jbrowse-plugin-gwas/blob/master/config.json#L49

In that example it is a BED file with named columns (a comment line at the top of the file gives the column names), and the config file designates neg_log_p as the "score column". The code will plot anything with a score attribute, though, so a plain BED file without column names would just use the standard BED score column (column 5).
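
For illustration only, a file along those lines might look like the sketch below. Only the neg_log_p column name comes from the example config; the other column names, the rows, and the file name sample.bed are hypothetical. The awk step just shows that the score column can be located by name from the header line.

```shell
# Build a tiny, hypothetical BED file with a '#'-prefixed header line.
tmp=$(mktemp -d)
{
  printf '#chrom\tstart\tend\tname\tneg_log_p\n'
  printf 'chr1\t999\t1000\tsnp1\t2.5\n'
  printf 'chr1\t1999\t2000\tsnp2\t7.1\n'
} > "$tmp/sample.bed"

# Find the column called neg_log_p in the header, then print
# chromosome, end position, and score for each data row.
scores=$(awk -F '\t' '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "neg_log_p") col = i; next }
  { print $1, $3, $col }' "$tmp/sample.bed")
echo "$scores"
```

A file like this would still need to be sorted, compressed with bgzip, and indexed with tabix before being used as a track.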

zhilianghu commented 2 years ago

Thank you, much appreciated. It's good to know that it defaults to the BED format, with an option to customize.

zhilianghu commented 2 years ago

I am sorry to bother you, but I am having a hard time seeing what went wrong in my setup at https://www.animalgenome.org/gtex/jbrowse2 on the track "GTEX permutations - spleen", which should plot a column of p-values. In "About track" I can see that "adapter.bedGzLocation" and "adapter.index.location" are read and the file header shows up; a vertical scale appears briefly on page load and then disappears. (FYI, I ran bgzip and built the TBI index without any options: simply "bgzip my_bgzipFile.txt" followed by "tabix my_bgzipFile.txt.gz".) I would appreciate your time and any suggestions.

cmdcolin commented 2 years ago

For indexing, I think your file needs to be re-indexed. I tried downloading it and re-indexing it with bgzip and tabix, and it said there were unsorted positions, so I ran

gunzip Spleen.permutations.2rd.bed.txt.gz
sort -k1,1 -k2,2n Spleen.permutations.2rd.bed.txt > Spleen.permutations.2rd.bed.sorted.txt
bgzip Spleen.permutations.2rd.bed.sorted.txt
tabix -0 -b 2 -s 1 -e 2 -f Spleen.permutations.2rd.bed.sorted.txt.gz

The non-default tabix options are needed because your file has only one coordinate column instead of separate start and end columns; see "tabix --help" (you may also want to get an updated version of tabix to see the same messages I see).
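
Whether a file is already in the order tabix expects can be checked before indexing. This is a minimal sketch using `sort -c`, which exits non-zero when the input is not sorted by the given keys; the two-row file and its name are made up for the demonstration.

```shell
# Hypothetical file: chromosome in column 1, position in column 2,
# deliberately written out of order.
tmp=$(mktemp -d)
printf 'chr1\t200\t0.01\nchr1\t100\t0.5\n' > "$tmp/unsorted.txt"

# sort -c only checks the order; it does not rewrite the file.
if sort -c -k1,1 -k2,2n "$tmp/unsorted.txt" 2>/dev/null; then
  status_before=sorted
else
  status_before=unsorted
fi

# Sort by chromosome, then numerically by position, and check again.
sort -k1,1 -k2,2n "$tmp/unsorted.txt" > "$tmp/sorted.txt"
if sort -c -k1,1 -k2,2n "$tmp/sorted.txt" 2>/dev/null; then
  status_after=sorted
else
  status_after=unsorted
fi

echo "before: $status_before, after: $status_after"
```

If the check fails on the real file, sorting it and then running bgzip and tabix on the sorted copy should avoid the unsorted-positions error.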

cmdcolin commented 2 years ago

Also, the code currently expects -log10(p) values to be pre-supplied, while it looks like you may be supplying raw p-values. It might be nice if the program could transform p-values with -log10 automatically, but it doesn't do that currently, so you might want to do the conversion yourself, e.g. in R with something like

x=read.csv('Spleen.permutations.2rd.bed.sorted.txt.gz',sep='\t',header=T,fill=T)
x$newpvals=-log10(x$Permut_p)
write.table(x,'out.bed',quote=F,row.names=F,sep='\t')
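
If R is not at hand, the same transform can be sketched with awk. This assumes a tab-separated file with a single header line and the raw p-value in column 3; the sample rows, the file names, and the new column name neg_log_p are chosen for illustration (Permut_p is the column name used in the R snippet above).

```shell
# Hypothetical input: header line, then chromosome, position, raw p-value.
tmp=$(mktemp -d)
printf 'chrom\tpos\tPermut_p\nchr1\t100\t0.01\nchr1\t200\t0.001\n' > "$tmp/in.txt"

# Append a -log10(p) column. awk's log() is the natural logarithm,
# so divide by log(10) to change base.
awk 'BEGIN { FS = OFS = "\t" }
     NR == 1 { print $0, "neg_log_p"; next }
     { print $0, -log($3) / log(10) }' "$tmp/in.txt" > "$tmp/out.txt"

cat "$tmp/out.txt"
```

A p-value of exactly 0 has no finite -log10, so such rows would need separate handling before plotting.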

cmdcolin commented 2 years ago

Just reopening this for helpfulness. This should be described better in the README.

zhilianghu commented 2 years ago

Colin - Many THANKS! Your example code is extremely helpful in explaining exactly what should be going on. It's most appreciated.

cmdcolin commented 2 years ago

sure thing :) happy to help