czbiohub-sf / sc2-illumina-pipeline

Bioinformatics pipeline for SARS-CoV-2 sequencing at CZ Biohub
GNU Affero General Public License v3.0

VCF is missing most INDELs #80

Closed: danrlu closed this issue 3 years ago

danrlu commented 3 years ago

Our VCF is created with samtools/bcftools mpileup, which has a parameter -L that sets a depth cutoff above which INDEL calling is disabled (doc here, but sadly it was not included in the docs for the version we use; it probably got missed when mpileup moved from samtools to bcftools). Our data usually has much higher depth than the default cutoff, which means that in most of our samples INDEL calling is turned off, and therefore INDELs don't show up in the VCF.
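As a rough illustration of the cutoff behavior, here is a simplified Python model of the `-L` (`--max-idepth`) rule: INDEL calling at a position is skipped when the average per-sample depth there is above the cutoff. The function name `indel_calling_enabled` and the depth values are hypothetical, and the default of 250 is an assumption based on recent samtools/bcftools help text; check `bcftools mpileup` for the version actually in use. This is a sketch of the rule, not the tool's implementation.

```python
# Simplified, hypothetical model of the mpileup -L / --max-idepth rule.
# Assumption: default cutoff of 250; verify against your samtools/bcftools version.
DEFAULT_MAX_IDEPTH = 250


def indel_calling_enabled(sample_depths, max_idepth=DEFAULT_MAX_IDEPTH):
    """Return True if INDELs would still be considered at this position.

    sample_depths: read depth at the position for each sample/input file.
    INDEL calling is skipped when the mean depth is above max_idepth.
    """
    if not sample_depths:
        return False  # no reads, nothing to call
    mean_depth = sum(sample_depths) / len(sample_depths)
    return mean_depth <= max_idepth


# A hypothetical low-depth sample stays under the default cutoff...
print(indel_calling_enabled([100]))
# ...a hypothetical high-depth amplicon sample exceeds it, so INDELs are skipped...
print(indel_calling_enabled([8000]))
# ...unless a larger cutoff is passed (the equivalent of a bigger -L value).
print(indel_calling_enabled([8000], max_idepth=10000))
```

Under this model, raising `-L` above the typical sequencing depth is one plausible way to re-enable INDEL calling; the right value would need to be picked against the real depth distribution of the samples.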

The consensus genome is unaffected, but its generation also uses samtools mpileup. So if we update the samtools/bcftools version, testing the fix will need to cover both places.
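Since a version change touches both the VCF and the consensus steps, one quick regression check on the VCF side is to count INDEL records before and after the change. A minimal sketch, assuming plain-text or bgzipped VCF input; the helper names `count_indel_lines` and `count_indels` are hypothetical. A record is treated as an INDEL if its INFO column carries the `INDEL` flag that mpileup writes, or if any ALT allele differs in length from REF (ignoring `.` and the symbolic `<*>` allele).

```python
import gzip


def count_indel_lines(lines):
    """Count INDEL records in an iterable of VCF text lines."""
    n = 0
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts, info = fields[3], fields[4].split(","), fields[7]
        has_flag = "INDEL" in info.split(";")
        length_change = any(
            alt not in (".", "<*>") and len(alt) != len(ref) for alt in alts
        )
        if has_flag or length_change:
            n += 1
    return n


def count_indels(vcf_path):
    """Count INDEL records in a VCF file (bgzip output is gzip-readable)."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as fh:
        return count_indel_lines(fh)
```

Running `count_indels` on the same sample's VCF before and after the fix gives a cheap signal that INDELs are now being emitted; the consensus step would still need its own check.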

Another outstanding issue with our VCF is here; we don't see it very often, but it's very curious.

See also #56

danrlu commented 3 years ago

Fixed with cd37a25 and a pull request.