Open jiapeiyuan17 opened 9 months ago
Hi Jiapei,
Thanks for your interest in recount3!
For 1., the recount3 Matrix Market files are derived from the aggregate SJ.out.tab files across the samples for a particular study (or tissue in the case of GTEx v8). I'll have to double check if we did any additional filtering (since it's been a while), but the contents should be the vast majority of what was in the SJ.out.tab files.
For 2., given that you want the splice junctions in a BED file of counts, you're probably best off using Snaptron's re-formatted version of the GTEx v8 junctions in recount3:
https://snaptron.cs.jhu.edu/data/gtexv2/junctions.bgz
The header file is: https://snaptron.cs.jhu.edu/data/junctions.header.tsv
You'll also want to (minimally) download the GTEx samples description TSV: https://snaptron.cs.jhu.edu/data/gtexv2/samples.tsv
where the rail_id column (first column) is the sample ID that appears in the comma-delimited nested list (the samples field in the junctions file) for each junction, defining which GTEx samples it appears in (i.e. has at least one supporting read). That field also contains the spliced read count of the junction for that sample, e.g.
Chris
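For readers following along, here is a minimal Python sketch of how the samples field described above might be unpacked into per-sample counts. It assumes each entry in the comma-delimited list has the form rail_id:count and that the field may start with a leading comma; the exact separator is an assumption, so check a few real lines before relying on it. The function name parse_samples_field and the example values are hypothetical.

```python
def parse_samples_field(field, sep=":"):
    """Split a junction's samples field into {rail_id: spliced_read_count}.

    Assumption: entries look like "<rail_id><sep><count>" and are joined by
    commas, possibly with a leading comma. rail_id is kept as a string so it
    can be matched against the first column of samples.tsv.
    """
    counts = {}
    for entry in field.split(","):
        if not entry:
            # Skip the empty piece produced by a leading or trailing comma.
            continue
        rail_id, count = entry.split(sep)
        counts[rail_id] = int(count)
    return counts


# Made-up values, for illustration only.
example = parse_samples_field(",101:3,205:12")
```

The resulting keys can then be looked up against the rail_id column of samples.tsv to recover the GTEx sample metadata for each junction.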
Also, I should point out, the .bgz file is in a gzip-compatible block-gzip format that can be read by gzip or pigz. But there's also the Tabix index file: https://snaptron.cs.jhu.edu/data/gtexv2/junctions.bgz.tbi which you can use to quickly query a genomic coordinate range of junctions as well.
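Because the block-gzip format is gzip-compatible, the file can also be streamed with Python's standard gzip module. The sketch below assumes the junctions file is tab-separated with one junction per line, and that the column names are supplied by the caller (for example, taken from junctions.header.tsv). The function name iter_junctions is hypothetical.

```python
import gzip


def iter_junctions(bgz_path, columns):
    """Yield one dict per junction line from a gzip-compatible .bgz file.

    Assumption: each line is tab-separated and its fields line up with
    `columns`. Lines are split manually rather than with the csv module,
    since the samples field can be very long.
    """
    with gzip.open(bgz_path, "rt") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            yield dict(zip(columns, fields))
```

This reads the file sequentially from the start, so for lookups restricted to a genomic coordinate range the Tabix index mentioned above is likely the faster route.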
Hi,
It sounds like, thanks to Chris, we can close this issue. Is that right, Jiapei?
Best, Leo
Hi Ben and Kasper,
We are currently conducting a project using data from the GTEx project. We are particularly interested in the resource presented in recount3 and would like to seek clarification on two specific points:
Your prompt response to these inquiries would be greatly appreciated. Thank you for your attention to this matter.
Best, Jiapei