LincolnSteinLab / gdc-viewer

A JBrowse plugin for viewing GDC Data
https://lincolnsteinlab.github.io/gdc-viewer/
MIT License

Investigate VCF files from the GDC #80

Open agduncan94 opened 3 years ago

agduncan94 commented 3 years ago

Investigate whether VCF files downloaded from the GDC portal work with the vcf track type.

Note: VCF tracks require the VCF file to be gzipped and an index file to exist.

There are two parts to this:

1) Download a VCF file and generate the index file locally (may need to run bgzip first). Using these two files, create a VCF track. Does this work?

2) Find a VCF file with an existing index. Set the urlTemplate of a VCF track (and the index path) to these locations. This requires using the authentication token.

Use tabix (part of HTSlib, distributed alongside samtools) to create the index file.
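A minimal sketch of the local indexing step in part 1, assuming the `gunzip`, `bgzip` and `tabix` command-line tools are installed and on the PATH. The helper names `index_commands` and `run_all` are hypothetical and only serve this illustration. The sketch re-compresses the file with bgzip because tabix needs BGZF-compressed input, and a plain gzip file may not qualify.

```python
import subprocess
from pathlib import Path
from typing import List


def index_commands(vcf: str) -> List[List[str]]:
    """Build the commands that bgzip-compress a VCF and index it with tabix.

    Hypothetical helper: accepts either a plain .vcf or a gzipped .vcf.gz.
    """
    path = Path(vcf)
    commands = []
    if path.suffix == ".gz":
        # Decompress first so the file can be re-compressed as BGZF.
        commands.append(["gunzip", "--force", str(path)])
        path = path.with_suffix("")  # foo.vcf.gz -> foo.vcf
    # bgzip replaces foo.vcf with foo.vcf.gz
    commands.append(["bgzip", "--force", str(path)])
    compressed = path.with_name(path.name + ".gz")
    # tabix writes foo.vcf.gz.tbi next to the compressed file
    commands.append(["tabix", "-p", "vcf", str(compressed)])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(index_commands("some-file.vep.vcf.gz"))` from the directory holding the download should leave a `.vcf.gz` and a matching `.vcf.gz.tbi` side by side.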

Post the results of the investigation as a comment here.

agduncan94 commented 3 years ago

The GDC API has the concept of related files, which might be relevant for finding associated index files. https://docs.gdc.cancer.gov/API/Users_Guide/Downloading_Files/#related-files
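As a rough sketch of how a related-files download could be requested, assuming the `related_files=true` query parameter and the `X-Auth-Token` header described in the GDC API documentation. The function names and the file ID used in the usage note are placeholders, not values from this investigation.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def build_download(file_id, token=None, related_files=True):
    """Return (url, headers) for downloading a GDC file by its UUID.

    Hypothetical helper: the token is only needed for controlled-access files.
    """
    url = f"{GDC_DATA_ENDPOINT}/{file_id}"
    if related_files:
        # Ask the API to bundle related files (e.g. an index) with the download.
        url += "?" + urlencode({"related_files": "true"})
    headers = {"X-Auth-Token": token} if token else {}
    return url, headers


def download(file_id, dest, token=None):
    """Fetch the file (plus related files) and write the response to dest."""
    url, headers = build_download(file_id, token)
    with urlopen(Request(url, headers=headers)) as resp, open(dest, "wb") as out:
        out.write(resp.read())
```

For example, `download("00000000-0000-0000-0000-000000000000", "bundle.tar.gz", token)` would save the response to disk; the placeholder UUID must be replaced with a real GDC file ID. When related files are included the response is expected to be an archive rather than the bare VCF, so it would need unpacking before use.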

agduncan94 commented 3 years ago

Part 1 - download vcf and generate index locally

I used the following configuration:

[tracks.vcf]
urlTemplate=6abc7d24-74d1-4e62-975c-753aec620201.vep.vcf.gz
storeClass=JBrowse/Store/SeqFeature/VCFTabix
type=JBrowse/View/Track/HTMLVariants

This assumes the index file is named 6abc7d24-74d1-4e62-975c-753aec620201.vep.vcf.gz.tbi and sits in the same directory (jbrowse/data) as the VCF file.
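If the index lives somewhere other than next to the VCF (as may happen in part 2), the track stanza could name it explicitly. The sketch below generates such a stanza; it assumes the VCFTabix store accepts a `tbiUrlTemplate` option and falls back to the VCF path plus `.tbi` when none is given. The function name `vcf_track_stanza` is hypothetical.

```python
def vcf_track_stanza(label, vcf_url, tbi_url=None):
    """Render a JBrowse tracks.conf stanza for a tabix-indexed VCF track.

    Assumption: the index location can be set with tbiUrlTemplate.
    """
    # Default to the conventional name: the VCF path with .tbi appended.
    tbi_url = tbi_url or vcf_url + ".tbi"
    lines = [
        f"[tracks.{label}]",
        f"urlTemplate={vcf_url}",
        f"tbiUrlTemplate={tbi_url}",
        "storeClass=JBrowse/Store/SeqFeature/VCFTabix",
        "type=JBrowse/View/Track/HTMLVariants",
    ]
    return "\n".join(lines)


print(vcf_track_stanza("vcf", "6abc7d24-74d1-4e62-975c-753aec620201.vep.vcf.gz"))
```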

GFJHogue commented 3 years ago

Relevant points from #91 investigation:

For the remaining part 2, I'll extend the work in #98 to load remote indexed, gzipped VCFs with a GDC token.