jackgoldsmith4 closed this issue 7 years ago
It's used to get the reference base when a genomic interval "breaks" while combining VCFs (or gVCFs). For example, consider two input files.
File t0.vcf
chr1 100 END=500
File t1.vcf
chr1 100 END=150
chr1 600 END=700
The combined VCF records will look like:
chr1 100 END=150
chr1 151 END=500
chr1 600 END=700
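The splitting in this example can be sketched as below. This is a simplified illustration, not GenomicsDB's actual implementation: it assumes each record is just a reference block `(contig, start, end)` with 1-based inclusive coordinates taken from `END=`, and the name `combine_blocks` is hypothetical. Every record start and every `end + 1` becomes a breakpoint, and each piece between consecutive breakpoints is kept if some input block covers it. A piece whose start is not the start of any input record has no REF base available from the inputs.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical reference block: (contig, start, end), 1-based and inclusive,
# where `end` is the value of the END= INFO field.
Block = Tuple[str, int, int]


def combine_blocks(inputs: List[List[Block]]) -> List[Tuple[Block, bool]]:
    """Merge blocks from several files, splitting at every record boundary.

    Each output block is paired with a flag that is True when its start
    is not the start of any input record, i.e. its reference base is not
    present in the inputs and must be looked up in the reference genome.
    """
    breakpoints: Dict[str, Set[int]] = defaultdict(set)
    input_starts: Set[Tuple[str, int]] = set()
    all_blocks = [b for records in inputs for b in records]
    for contig, start, end in all_blocks:
        breakpoints[contig].update((start, end + 1))
        input_starts.add((contig, start))

    combined: List[Tuple[Block, bool]] = []
    for contig in sorted(breakpoints):
        points = sorted(breakpoints[contig])
        for lo, nxt in zip(points, points[1:]):
            hi = nxt - 1
            # Keep the piece only if some input block covers it (gaps are dropped).
            if any(c == contig and s <= lo and hi <= e for c, s, e in all_blocks):
                combined.append(((contig, lo, hi), (contig, lo) not in input_starts))
    return combined


t0 = [("chr1", 100, 500)]
t1 = [("chr1", 100, 150), ("chr1", 600, 700)]
for (contig, start, end), needs_ref in combine_blocks([t0, t1]):
    print(contig, start, f"END={end}", needs_ref)
# chr1 100 END=150 False
# chr1 151 END=500 True
# chr1 600 END=700 False
```

Only the block starting at chr1:151 is flagged, which matches the combined records above.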
Now, the reference base at position chr1:151 is unknown, because no input record starts there, so it can be obtained only from the reference genome. The same scenario can also occur with deletions (spanning deletions).
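Fetching that base amounts to reading one character of the reference sequence. The sketch below is a hedged illustration only: `read_fasta` and `reference_base` are hypothetical helpers that load a whole FASTA into memory, which is fine for a toy input but not for a real genome, where indexed access via a `.fai` index (as produced by `samtools faidx`) would be used instead.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse a FASTA stream into {contig name: sequence}, all in memory."""
    parts: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The contig name is the first whitespace-separated token of the header.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(p).upper() for n, p in parts.items()}


def reference_base(genome: Dict[str, str], contig: str, pos: int) -> str:
    """Return the base at a 1-based position, e.g. the REF of a split block."""
    return genome[contig][pos - 1]


# Toy reference: 150 Ns followed by GATTACA, so chr1:151 is "G".
toy = ">chr1 toy\n" + "N" * 150 + "\n" + "GATTACA\n"
genome = read_fasta(io.StringIO(toy))
print(reference_base(genome, "chr1", 151))  # G
```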
I'm surprised the program didn't fail; perhaps the test case doesn't hit the scenario described above.
This makes sense. However, Hail is not planning to use GenomicsDB to combine VCFs. Thanks!
Hi, I am wondering how the .fasta reference genome file is used in GenomicsDB. It is a large file, and the docs list it as a mandatory parameter. However, my tests pass without this file, and all I get is a warning that it could not be opened. What is the purpose of this file in GenomicsDB?