Closed by WEClarke 4 years ago
Pretty sure it's because Paragraph uses pysam under the hood, and this is an unfortunate pysam "feature": https://github.com/pysam-developers/pysam/issues/659
@RSherman15 - thanks for the fast reply. This is indeed unfortunate. I agree that this shouldn't be the default behavior, and at the very least there should be a way to force it to keep the END tag.
Yes, Rachel is right. It is the default behavior of pysam: when full (non-symbolic) alleles are present, it omits the END tag. One way to keep the END tag is to use a symbolic allele, in your case <DEL>.
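If switching to symbolic alleles is not an option, one possible workaround is to put END back in a post-processing step. The sketch below is not part of Paragraph or pysam. `restore_end` is a hypothetical helper, and it assumes a tab-separated VCF data line, a fully spelled-out REF allele, and an SVTYPE key in INFO. Under those assumptions END can be recomputed as POS + len(REF) - 1.

```python
def restore_end(vcf_line):
    """Re-add END to one VCF data line (hypothetical helper, not a pysam API)."""
    line = vcf_line.rstrip("\n")
    fields = line.split("\t")
    pos, ref, alt, info = int(fields[1]), fields[3], fields[4], fields[7]
    keys = {entry.split("=", 1)[0] for entry in info.split(";")}

    # Leave the line alone if END is still present, if the ALT allele is
    # symbolic (e.g. <DEL>), or if the record is not tagged as an SV.
    if "END" in keys or alt.startswith("<") or "SVTYPE" not in keys:
        return line

    # With a fully spelled-out REF allele, the last reference base covered
    # by the record is POS + len(REF) - 1.
    end = pos + len(ref) - 1
    fields[7] = f"END={end};{info}"
    return "\t".join(fields)


record = "chr1\t100\tdel1\tACGTA\tA\t50\tPASS\tSVTYPE=DEL;SVLEN=-4"
print(restore_end(record).split("\t")[7])
# END=104;SVTYPE=DEL;SVLEN=-4
```

For the Manta record in this issue, POS 66160 plus the 454 reference bases implied by CIGAR=1M453D gives 66613, which matches the original END value.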
Observed behavior:
For non-symbolic alleles Paragraph seems to be stripping the END tag from the INFO field (see below). This isn't desired behavior as it can impact tools that rely on this tag (for example vcfToBedpe).
Expected behavior:
Preserve the original INFO fields exactly as they were in the input and only append the GRMPY_ID tag.
Example
Manta
chr1 66160 MantaDEL:8:0:0:0:1:0 TTATATATATATATATTATATATACTATATATTTATATATATTACATATTATATATATAATATATATTATATAATATATATTATATTATATAATATATAATATAAATATAATATAAATTATATTATATAATATATAATATAAATATAATATAAATTATATAAATATAATATATATTTTATTATATAATATAATATATATTATATAAATATAATATATAAATTATATAATATAATATATATTATATAATATAATATATTTTATTATATAAATATATATTATATTATATAATATATATTTTATTATATAATATATATTATATATTTATAGAATATAATATATATTTTATTATATAATATATATTATATAATATATATTATATTTATATATAACATATATTATTATATAAAATATGTATAATATATATTATATAAATATATTTATATATTATATAAA T 196 PASS END=66613;SVTYPE=DEL;SVLEN=-453;CIGAR=1M453D;CIPOS=0,9;HOMLEN=9;HOMSEQ=TATATATAT
Paragraph
chr1 66160 MantaDEL:8:0:0:0:1:0 TTATATATATATATATTATATATACTATATATTTATATATATTACATATTATATATATAATATATATTATATAATATATATTATATTATATAATATATAATATAAATATAATATAAATTATATTATATAATATATAATATAAATATAATATAAATTATATAAATATAATATATATTTTATTATATAATATAATATATATTATATAAATATAATATATAAATTATATAATATAATATATATTATATAATATAATATATTTTATTATATAAATATATATTATATTATATAATATATATTTTATTATATAATATATATTATATATTTATAGAATATAATATATATTTTATTATATAATATATATTATATAATATATATTATATTTATATATAACATATATTATTATATAAAATATGTATAATATATATTATATAAATATATTTATATATTATATAAA T 196 PASS SVTYPE=DEL;SVLEN=-453;CIGAR=1M453D;CIPOS=0,9;HOMLEN=9;HOMSEQ=TATATATAT;GRMPY_ID=chr1.vcf@a66f377e14617d867835ed906c5d6b272b1c404e2263781380e6c6c1da4e9267:1 GT:DP:FT:AD:ADF:ADR:PL 0/0:54:PASS:119,0:70,0:49,0:0,167,781
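To check which INFO keys differ between the two records above, a small comparison like the following may help. `info_keys` and `diff_info` are hypothetical helper names, and the two INFO strings are copied from the Manta and Paragraph records in this example.

```python
def info_keys(info):
    """Return the INFO keys of a VCF record in their original order."""
    return [e.split("=", 1)[0] for e in info.split(";") if e and e != "."]


def diff_info(before, after):
    """Return (dropped, added) INFO keys between an input and output record."""
    b, a = info_keys(before), info_keys(after)
    dropped = [k for k in b if k not in a]
    added = [k for k in a if k not in b]
    return dropped, added


manta = ("END=66613;SVTYPE=DEL;SVLEN=-453;CIGAR=1M453D;"
         "CIPOS=0,9;HOMLEN=9;HOMSEQ=TATATATAT")
paragraph = ("SVTYPE=DEL;SVLEN=-453;CIGAR=1M453D;CIPOS=0,9;HOMLEN=9;"
             "HOMSEQ=TATATATAT;GRMPY_ID=chr1.vcf@a66f377e14617d867835ed"
             "906c5d6b272b1c404e2263781380e6c6c1da4e9267:1")

print(diff_info(manta, paragraph))
# (['END'], ['GRMPY_ID'])
```

With the expected behavior from this issue, the first list would be empty and only GRMPY_ID would show up as added.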