MariaNattestad / Ribbon

A genome browser designed for complex structural variants and long reads.
https://genomeribbon.com
MIT License

Improve VCF parsing #120

Closed: MariaNattestad closed this 1 month ago

MariaNattestad commented 1 month ago

Update VCF parsing to work with the DRAGEN VCF in the examples: https://v2.genomeribbon.com/?session=example:giab_hg008_dragen#splitthreader

That means testing with this VCF: https://42basepairs.com/browse/s3/giab/data_somatic/HG008/Liss_lab/analysis/DRAGEN-v4.2.4_ILMN-WGS_20240312/standard/dragen_4.2.4_HG008-mosaic_tumor.sv.vcf.gz

This PR parses the major variant types I could find in that sample dataset, each of which has a different record signature.
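To illustrate what "different signatures" can mean in an SV VCF, here is a minimal sketch, not the code in this PR. It separates three record shapes described in the VCF specification: breakend ALTs such as `G]chr17:198982]`, symbolic ALTs such as `<DEL>`, and explicit-sequence alleles tagged with `SVTYPE` in INFO. The names `parseInfo`, `parseSvRecord`, and the fields of the returned object are hypothetical, and whether the DRAGEN file uses exactly these three shapes is an assumption.

```javascript
// Hypothetical sketch; function and field names are not from the Ribbon codebase.

// Split the INFO column ("KEY=VALUE;FLAG;...") into a lookup object.
function parseInfo(infoText) {
  const info = {};
  for (const entry of infoText.split(";")) {
    if (entry === "" || entry === ".") continue;
    const eq = entry.indexOf("=");
    if (eq === -1) info[entry] = true; // flag without a value
    else info[entry.slice(0, eq)] = entry.slice(eq + 1);
  }
  return info;
}

// Breakend ALT forms from the VCF specification: t[p[  t]p]  ]p]t  [p[t
const BREAKEND = /^([^\[\]]*)([\[\]])(.+):(\d+)[\[\]]([^\[\]]*)$/;

function parseSvRecord(line) {
  if (line.startsWith("#")) return null; // header line
  const cols = line.split("\t");
  if (cols.length < 8) return null; // not a complete VCF record
  const [chrom, posText, id, ref, alt, , , infoText] = cols;
  const pos = Number(posText);
  const info = parseInfo(infoText);

  // Signature 1: breakend notation, where the mate position is inside ALT.
  const bnd = BREAKEND.exec(alt);
  if (bnd) {
    return {
      id,
      svtype: "BND",
      chrom1: chrom,
      pos1: pos,
      chrom2: bnd[3],
      pos2: Number(bnd[4]),
      refBaseFirst: bnd[1] !== "",      // true for t[p[ and t]p]
      mateExtendsRight: bnd[2] === "[", // true for t[p[ and [p[t
    };
  }

  // Signature 2: symbolic ALT such as <DEL> or <DUP:TANDEM>.
  // Signature 3: explicit REF/ALT sequences with SVTYPE in INFO.
  const symbolic = /^<([^>]+)>$/.exec(alt);
  const svtype = info.SVTYPE || (symbolic ? symbolic[1].split(":")[0] : null);
  if (svtype === null) return null; // not recognizable as an SV record

  // Without END, fall back to the span of the REF allele.
  const end = "END" in info ? Number(info.END) : pos + ref.length - 1;
  return { id, svtype, chrom1: chrom, pos1: pos, chrom2: chrom, pos2: end };
}
```

The sketch keeps the breakend orientation as two raw booleans rather than converting to strands, since the strand convention a browser expects is a separate design decision.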

It's difficult to fully cover all the information that we used to get from previous SV call sets. Specifically, I didn't try to parse the number of split reads from the SR field in the per-sample columns: this VCF has both tumor and normal samples, which complicates the parsing, and I don't know how consistent that layout will be across other VCFs.
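For context on why the per-sample SR field is awkward, here is a rough sketch of what reading it could involve. It is not implemented in this PR. It assumes that SR appears as a key in the FORMAT column and that each sample's value is a comma-separated pair of counts in the order ref, alt. The function name `splitReadsBySample` and the returned shape are hypothetical, and other callers may encode SR differently or omit it.

```javascript
// Hypothetical helper, not part of this PR.
// headerLine is the "#CHROM ..." line; recordLine is one tab-separated data line.
function splitReadsBySample(headerLine, recordLine) {
  const sampleNames = headerLine.split("\t").slice(9); // sample columns start at index 9
  const cols = recordLine.split("\t");
  const formatKeys = (cols[8] || "").split(":");
  const srIndex = formatKeys.indexOf("SR");

  const result = {};
  sampleNames.forEach((name, i) => {
    result[name] = null; // default when SR is absent or unreadable
    if (srIndex === -1) return;
    const raw = (cols[9 + i] || "").split(":")[srIndex];
    if (raw === undefined || raw === "" || raw === ".") return;
    const counts = raw.split(",").map(Number);
    if (counts.length < 2 || counts.some(Number.isNaN)) return;
    // Assumption: the first count supports the ref allele, the second the alt allele.
    result[name] = { ref: counts[0], alt: counts[1] };
  });
  return result;
}
```

Even with this in place, the caller still has to decide which sample's count to display (tumor, normal, or both), and sample names are not standardized across VCFs.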

Added tests that can be run with npm run test.