eldariont / svim

Structural Variant Identification Method using Long Reads
GNU General Public License v3.0

Underflow on variant position? #38

Closed by wdecoster 4 years ago

wdecoster commented 4 years ago

Hi,

I'm using SVIM v1.4.1 and noticed that for some of the random and unplaced contigs (e.g. chr5_GL000208v1_random, chrUn_GL000226v1) I get surprisingly high coordinates, suspiciously always exactly 4294967296, i.e. 2^32. Building a regular tabix (.tbi) index fails for variants like that.

Below are some examples; I can share the full VCF for this sample if you want:

chr5_GL000208v1_random  4294967296      svim.BND.130629 N       ]chr5:47079731]N        7       PASS    SVTYPE=BND;SUPPORT=6;STD_POS1=.;STD_POS2=1.63   GT:DP:AD        ./.:.:.,.
chr17_KI270729v1_random 4294967296      svim.BND.311456 N       [chrX:66676029[N        1       PASS    SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=.      GT:DP:AD        ./.:.:.,.
chr22_KI270736v1_random 4294967296      svim.BND.346279 N       ]chr22_KI270736v1_random:111780]N       1       PASS    SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=.      GT:DP:AD        ./.:.:.,.
chrEBV  4294967296      svim.DUP_TANDEM.6691    N       <DUP:TANDEM>    1       not_fully_covered       SVTYPE=DUP:TANDEM;END=51803;SVLEN=51803;SUPPORT=1;STD_SPAN=.;STD_POS=.  GT:CN:DP:AD     ./.:2:.:.,.
chrUn_GL000226v1        4294967296      svim.DUP_TANDEM.6833    N       <DUP:TANDEM>    1       PASS    SVTYPE=DUP:TANDEM;END=15008;SVLEN=15008;SUPPORT=1;STD_SPAN=.;STD_POS=.  GT:CN:DP:AD     ./.:2:.:.,.
chrUn_KI270435v1        4294967296      svim.BND.347861 N       [chrY:10657300[N        1       PASS    SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=.      GT:DP:AD        ./.:.:.,.
chrUn_KI270435v1        4294967296      svim.BND.347860 N       ]chr16:34065991]N       2       PASS    SVTYPE=BND;SUPPORT=2;STD_POS1=.;STD_POS2=.      GT:DP:AD        ./.:.:.,.
chrUn_KI270590v1        4294967296      svim.DUP_TANDEM.6993    N       <DUP:TANDEM>    4       PASS    SVTYPE=DUP:TANDEM;END=2914;SVLEN=2914;SUPPORT=4;STD_SPAN=3.95;STD_POS=1.65      GT:CN:DP:AD     ./.:2:.:.,.

I checked the length of chr5_GL000208v1_random, and it is only about 92 kb, so something is off here :)
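
A quick way to flag such records is to compare each POS with the contig length declared in the VCF header. The sketch below is only an illustration: `find_out_of_range` is a hypothetical helper (not part of SVIM or bcftools), and it assumes an uncompressed VCF whose header contains `##contig=<ID=...,length=...>` lines.

```python
import re

# Hypothetical helper, not part of SVIM or bcftools.
ID_RE = re.compile(r"[<,]ID=([^,>]+)")
LENGTH_RE = re.compile(r"[<,]length=(\d+)")

def find_out_of_range(lines):
    """Yield (chrom, pos, id) for records whose POS exceeds the contig length."""
    lengths = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##contig="):
            # Collect contig lengths from the header.
            contig_id = ID_RE.search(line)
            length = LENGTH_RE.search(line)
            if contig_id and length:
                lengths[contig_id.group(1)] = int(length.group(1))
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and the remaining header lines
        chrom, pos, rec_id = line.split("\t")[:3]
        # Contigs without a declared length cannot be checked and are skipped.
        if chrom in lengths and int(pos) > lengths[chrom]:
            yield chrom, int(pos), rec_id
```

Passing an open file handle (for example `open("variants.vcf")`) as `lines` should list every record that lies beyond the end of its contig.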

Cheers, Wouter

eldariont commented 4 years ago

Hi Wouter,

Thanks for reporting this issue. I have observed a similar problem when SVIM erroneously outputs a VCF record with POS=0 (although the VCF spec requires POS to be greater than 0). For some reason, bcftools replaces these invalid POS values with 2^32. I have already fixed the underlying issue that caused POS=0 in the output VCF with the following commit: 3c8915a8d731df2370fbfbd5242c02576fbb118e.
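
Why 0 ends up as 2^32 is not established in this thread. One plausible mechanism, offered purely as an assumption and not checked against the bcftools source, is an unsigned 32-bit wraparound: if the 1-based POS is converted to a 0-based coordinate (0 becomes -1), stored in an unsigned 32-bit field, and converted back to 1-based on output, the result is exactly 4294967296. The arithmetic can be sketched as:

```python
# Assumption: this illustrates a possible uint32 wraparound.
# It is not taken from the bcftools or htslib source code.
UINT32_MASK = 0xFFFFFFFF

def roundtrip_pos(pos_1based):
    """Convert a 1-based POS to 0-based, pass it through uint32, convert back."""
    zero_based = pos_1based - 1          # POS=0 gives -1
    stored = zero_based & UINT32_MASK    # -1 wraps around to 4294967295
    return stored + 1                    # back to a 1-based coordinate

print(roundtrip_pos(0))      # 4294967296
print(roundtrip_pos(51803))  # 51803
```

Valid positions pass through unchanged under this model; only POS=0 is affected.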

Can you confirm that the original VCF output from SVIM contains POS fields with 0 instead of 2^32? If this is the case, could you please reprocess your sample with the current master of SVIM instead of v1.4.1? If this fixes the issue, I can upload the current master as v1.4.2 to pypi and bioconda.
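
To check this without bcftools in the loop, the raw SVIM output can be scanned directly. A minimal sketch, assuming an uncompressed VCF; `records_with_invalid_pos` is a hypothetical name, not an SVIM function:

```python
def records_with_invalid_pos(lines):
    """Return (chrom, pos, id) for data lines whose POS is below 1."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, rec_id = line.rstrip("\n").split("\t")[:3]
        if int(pos) < 1:
            hits.append((chrom, int(pos), rec_id))
    return hits
```

For example, `records_with_invalid_pos(open("variants.vcf"))` would return the affected records, or an empty list if every POS is at least 1.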

Cheers, David

wdecoster commented 4 years ago

Hi David,

I can confirm this happens with bcftools sort but not with bcftools view:

diff -y --suppress-common-lines <(cat variants.vcf) <(bcftools view variants.vcf) | grep 4294967296 # returns nothing
diff -y --suppress-common-lines <(cat variants.vcf) <(bcftools sort variants.vcf) | grep 4294967296
Writing to /tmp/bcftools-sort.IChFRc
Merging 1 temporary files
Cleaning
Done
chr5_GL000208v1_random  0   svim.BND.130634 N   ]chr5 | chr5_GL000208v1_random  4294967296  svim.BND.130634 N
chr17_KI270729v1_random 0   svim.BND.311465 N   [chrX | chr17_KI270729v1_random 4294967296  svim.BND.311465 N
chr22_KI270736v1_random 0   svim.BND.346285 N   ]chr2 | chr22_KI270736v1_random 4294967296  svim.BND.346285 N
chrEBV  0   svim.DUP_TANDEM.6676    N   <DUP:TANDEM>  | chrEBV  4294967296  svim.DUP_TANDEM.6676    N   <DUP:
chrUn_GL000226v1    0   svim.DUP_TANDEM.6818    N     | chrUn_GL000226v1    4294967296  svim.DUP_TANDEM.6818
chrUn_KI270435v1    0   svim.BND.347865 N   ]chr1 | chrUn_KI270435v1    4294967296  svim.BND.347866 N
chrUn_KI270435v1    0   svim.BND.347866 N   [chrY | chrUn_KI270435v1    4294967296  svim.BND.347865 N
chrUn_KI270590v1    0   svim.DUP_TANDEM.6971    N     | chrUn_KI270590v1    4294967296  svim.DUP_TANDEM.6971

I'll install it from git and report back to you.

Cheers, Wouter

wdecoster commented 4 years ago

The version on GitHub seems to be okay!

eldariont commented 4 years ago

Thanks a lot for checking, Wouter, and sorry for the hassle. I just released SVIM v1.4.2, so this bug should soon be fixed on bioconda as well.

Cheers, David

wdecoster commented 4 years ago

I've merged the changes, thanks again! https://github.com/bioconda/bioconda-recipes/pull/24743