PacificBiosciences / HiFi-human-WGS-WDL

BSD 3-Clause Clear License

Duplicate IDs in pbsv.phased.vcf.gz #146

Closed by gevro 3 months ago

gevro commented 3 months ago

Hi, there are multiple distinct records (different variants on different chromosomes) that share the same ID in the VCF ID column. This is likely a bug.

williamrowell commented 3 months ago

This is an artifact of pbsv call parallelization. To reduce the runtime, pbsv structural variant calling is split into 14 roughly equal chunks. The IDs are only unique within each chunk.
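
For anyone who wants to check which IDs are affected in their own output, here is a minimal sketch in plain Python. It is not part of the workflow: the function name `find_duplicate_ids` and the toy records (including the made-up IDs) are assumptions for illustration only. It groups records by the ID column and reports any ID that appears on more than one record.

```python
from collections import defaultdict


def find_duplicate_ids(vcf_lines):
    """Return {ID: [(CHROM, POS), ...]} for every ID used by more than one record.

    Hypothetical helper, not part of pbsv or the workflow.
    """
    seen = defaultdict(list)
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, ids = line.rstrip("\n").split("\t")[:3]
        # The VCF ID column may hold several semicolon-separated IDs; "." means missing.
        for variant_id in ids.split(";"):
            if variant_id != ".":
                seen[variant_id].append((chrom, int(pos)))
    return {vid: locs for vid, locs in seen.items() if len(locs) > 1}


# Toy records with made-up IDs, standing in for lines read from a real file.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1000\tpbsv.INS.1\tA\t<INS>\t.\tPASS\t.\n",
    "chr7\t5000\tpbsv.INS.1\tC\t<INS>\t.\tPASS\t.\n",
    "chr7\t9000\tpbsv.DEL.2\tG\t<DEL>\t.\tPASS\t.\n",
]
duplicates = find_duplicate_ids(example)

# To scan a real file instead (path assumed from the issue title):
#   import gzip
#   with gzip.open("pbsv.phased.vcf.gz", "rt") as handle:
#       duplicates = find_duplicate_ids(handle)
```

If downstream tools need unique IDs, the reported IDs could be rewritten, but any INFO fields that refer to other records by ID would have to be updated consistently, so this sketch only reports duplicates and does not rename anything.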