Illumina / paragraph

Graph realignment tools for structural variants

Stripping END tag in INFO field #44

Closed (WEClarke closed this issue 4 years ago)

WEClarke commented 4 years ago

Observed behavior:

For non-symbolic (sequence-resolved) alleles, Paragraph appears to strip the END tag from the INFO field (see the example below). This is undesirable because it can break downstream tools that rely on this tag (for example, vcfToBedpe).

Expected behavior:

Preserve the original INFO fields exactly as they appear in the input, and only append the GRMPY_ID tag.

Example (the first record is the Manta input; the second is the same record after Paragraph):

Manta chr1 66160 MantaDEL:8:0:0:0:1:0 TTATATATATATATATTATATATACTATATATTTATATATATTACATATTATATATATAATATATATTATATAATATATATTATATTATATAATATATAATATAAATATAATATAAATTATATTATATAATATATAATATAAATATAATATAAATTATATAAATATAATATATATTTTATTATATAATATAATATATATTATATAAATATAATATATAAATTATATAATATAATATATATTATATAATATAATATATTTTATTATATAAATATATATTATATTATATAATATATATTTTATTATATAATATATATTATATATTTATAGAATATAATATATATTTTATTATATAATATATATTATATAATATATATTATATTTATATATAACATATATTATTATATAAAATATGTATAATATATATTATATAAATATATTTATATATTATATAAA T 196 PASS END=66613;SVTYPE=DEL;SVLEN=-453;CIGAR=1M453D;CIPOS=0,9;HOMLEN=9;HOMSEQ=TATATATAT

Paragraph chr1 66160 MantaDEL:8:0:0:0:1:0 TTATATATATATATATTATATATACTATATATTTATATATATTACATATTATATATATAATATATATTATATAATATATATTATATTATATAATATATAATATAAATATAATATAAATTATATTATATAATATATAATATAAATATAATATAAATTATATAAATATAATATATATTTTATTATATAATATAATATATATTATATAAATATAATATATAAATTATATAATATAATATATATTATATAATATAATATATTTTATTATATAAATATATATTATATTATATAATATATATTTTATTATATAATATATATTATATATTTATAGAATATAATATATATTTTATTATATAATATATATTATATAATATATATTATATTTATATATAACATATATTATTATATAAAATATGTATAATATATATTATATAAATATATTTATATATTATATAAA T 196 PASS SVTYPE=DEL;SVLEN=-453;CIGAR=1M453D;CIPOS=0,9;HOMLEN=9;HOMSEQ=TATATATAT;GRMPY_ID=chr1.vcf@a66f377e14617d867835ed906c5d6b272b1c404e2263781380e6c6c1da4e9267:1 GT:DP:FT:AD:ADF:ADR:PL 0/0:54:PASS:119,0:70,0:49,0:0,167,781
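Until the behavior changes, one possible post-processing workaround is to recompute END for records that lost it. For a sequence-resolved allele, the last reference base covered by the record is POS + len(REF) - 1; in the example above that is 66160 + 454 - 1 = 66613, matching Manta's original END. The helper below is a minimal, hypothetical sketch (not part of Paragraph or pysam) that works on raw tab-separated VCF text and assumes every record it touches has a non-symbolic REF allele.

```python
def restore_end(vcf_line: str) -> str:
    """Hypothetical helper: re-add END=POS+len(REF)-1 to INFO if it is missing.

    Assumes a tab-separated VCF data line whose REF allele is the full
    reference sequence (non-symbolic), so REF alone defines the span.
    """
    if vcf_line.startswith("#"):
        return vcf_line  # header lines pass through unchanged
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    pos, ref, info = int(fields[1]), fields[3], fields[7]
    keys = [] if info == "." else info.split(";")
    if any(k == "END" or k.startswith("END=") for k in keys):
        return vcf_line  # END already present; leave the record alone
    end = pos + len(ref) - 1  # 1-based position of the last REF base
    fields[7] = ";".join([f"END={end}"] + keys)  # put END first, as Manta does
    return "\t".join(fields) + newline
```

Usage on a whole file could be as simple as `for line in sys.stdin: sys.stdout.write(restore_end(line))`. Note that this only recovers END where it equals the span implied by REF; it cannot restore an END value that disagreed with REF.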

RSherman15 commented 4 years ago

Pretty sure this is because Paragraph uses pysam under the hood, and this is an unfortunate pysam "feature": https://github.com/pysam-developers/pysam/issues/659
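A rough way to think about the behavior described here, inferred from this thread rather than taken from pysam's source: when REF is the full reference sequence, END carries no information beyond REF itself, so it gets dropped as redundant; with a symbolic allele, REF is a single padding base and END is the only thing that describes the span. The function below is a hypothetical model of that rule, not a pysam API.

```python
def end_survives_rewrite(pos: int, ref: str, end: int) -> bool:
    """Hypothetical model: END is kept only when it disagrees with REF.

    pos and end are 1-based, inclusive coordinates as written in the VCF.
    """
    implied_end = pos + len(ref) - 1  # span already encoded by the REF allele
    return end != implied_end
```

With the Manta record above (a 454-base REF at POS 66160, END=66613), the implied end is 66613, so END is treated as redundant; with REF reduced to the single base `T` and ALT `<DEL>`, the implied end is 66160 and END must be kept.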

WEClarke commented 4 years ago

@RSherman15 - thanks for the fast reply, this is indeed unfortunate. I agree that this shouldn't be the default behavior, and at the very least there should be a way to force it to keep the END tag.

traxexx commented 4 years ago

Yes, Rachel is right. This is pysam's default behavior: when full (sequence-resolved) alleles are present, it omits the END tag. One way to keep the END tag is to use a symbolic allele, in your case <DEL>.
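A minimal sketch of that workaround, assuming the input contains simple deletions like the Manta record above (a single ALT equal to the first REF base): rewrite REF to the padding base, set ALT to `<DEL>`, and write END explicitly. The helper name is hypothetical. Other INFO keys (such as CIGAR=1M453D) are copied unchanged and may no longer be meaningful for a symbolic record, and the VCF header would also need a matching `##ALT=<ID=DEL,...>` line.

```python
def to_symbolic_del(vcf_line: str) -> str:
    """Hypothetical helper: rewrite a sequence-resolved deletion as <DEL>.

    Only touches records whose single ALT is the anchor base (ALT == REF[0]);
    anything else (insertions, multi-allelic records, SNVs) is returned as-is.
    """
    if vcf_line.startswith("#"):
        return vcf_line  # header lines pass through unchanged
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    pos, ref, alt, info = int(fields[1]), fields[3], fields[4], fields[7]
    if alt != ref[:1] or len(ref) < 2:
        return vcf_line  # not a simple deletion; leave untouched
    end = pos + len(ref) - 1  # last deleted reference base (1-based)
    keys = [k for k in info.split(";") if k != "." and not k.startswith("END=")]
    fields[3] = ref[0]    # keep only the padding base
    fields[4] = "<DEL>"   # symbolic allele, so END now has to describe the span
    fields[7] = ";".join([f"END={end}"] + keys)
    return "\t".join(fields) + newline
```

Applied to the Manta record above, this would yield REF `T`, ALT `<DEL>` and END=66613, which, per the comment above, is the form in which the END tag is kept.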