AqsaAlam closed this issue 5 months ago
How are you counting the 57 vs 92? A single integration site can have 2 breakpoints (the left and right sides of the insertion in human coordinate space), which is 4 VCF BND records.
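The arithmetic in this reply can be sketched as follows. This is only an illustration, assuming every breakpoint is written as a pair of mated BND records (one per breakend); the function names are hypothetical and are not part of GRIDSS or VIRUSBreakend.

```python
def bnd_record_count(sites_with_two_breakpoints, sites_with_one_breakpoint=0):
    """Expected number of VCF BND records for a set of integration sites.

    Assumption: each breakpoint appears as two mated BND records,
    one on the human side and one on the virus side.
    """
    breakpoints = 2 * sites_with_two_breakpoints + sites_with_one_breakpoint
    return 2 * breakpoints


def breakpoints_from_records(bnd_records):
    """Number of breakpoints implied by a count of mated BND records."""
    return bnd_records // 2


# One integration site with a left and a right breakpoint.
print(bnd_record_count(1))
# A VCF with 184 mated BND records.
print(breakpoints_from_records(184))
```

Under that assumption, a site with both sides detected contributes 4 records, and 184 records correspond to 92 breakpoints, so a count of records (or of human-side records) is not directly comparable to a count of integration sites.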
Hi!
Thanks for your quick response!
The VCF summary output says "integrations = 57" (I've attached the file). The VCF file has 184 entries, 92 of which are human chromosome breakpoints (I've attached that file as well).
I've confirmed they are the same sample.
Cheers,
Aqsa Alam
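For reference, the counts described in this message (total records, records on human chromosomes, and distinct breakpoints) could be reproduced with a sketch like the one below. It is a minimal example under stated assumptions: human contigs are named `1`-`22`, `X`, `Y` with an optional `chr` prefix, mates are linked through the `MATEID` INFO field listed in the attached header, and the sample records and the contig name `virusA` are made up for illustration. It does not reproduce how the summary file computes its `integrations` value.

```python
import re

# Assumed naming of human contigs: 1-22, X, Y, with an optional "chr" prefix.
HUMAN_CHROM = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y)$")
MATEID = re.compile(r"(?:^|;)MATEID=([^;]+)")


def count_bnd_records(vcf_text):
    """Count records, human-side records, and mated breakpoint pairs."""
    total = 0
    human = 0
    pairs = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip header and blank lines
        fields = line.split("\t")
        chrom, record_id, info = fields[0], fields[2], fields[7]
        total += 1
        if HUMAN_CHROM.match(chrom):
            human += 1
        mate = MATEID.search(info)
        if mate:
            # Both mates produce the same unordered pair, so each
            # breakpoint is counted once.
            pairs.add(frozenset((record_id, mate.group(1))))
    return {"records": total, "human_records": human, "breakpoints": len(pairs)}


# Hypothetical data: one integration site with a left and a right breakpoint.
SAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "\t".join(["chr1", "1000", "bp1o", "N", "N[virusA:50[", "500", "PASS", "SVTYPE=BND;MATEID=bp1h"]),
    "\t".join(["virusA", "50", "bp1h", "N", "]chr1:1000]N", "500", "PASS", "SVTYPE=BND;MATEID=bp1o"]),
    "\t".join(["chr1", "1010", "bp2o", "N", "]virusA:900]N", "400", "PASS", "SVTYPE=BND;MATEID=bp2h"]),
    "\t".join(["virusA", "900", "bp2h", "N", "N[chr1:1010[", "400", "PASS", "SVTYPE=BND;MATEID=bp2o"]),
])

print(count_bnd_records(SAMPLE_VCF))
```

In this made-up example a single integration site yields 4 records, 2 of them on a human chromosome, and 2 breakpoints, which mirrors the 184 / 92 split reported above.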
In reply to Daniel Cameron, Wednesday, April 17, 2024 11:02 PM. Subject: Re: [PapenfussLab/gridss] Difference in number of integrations in virus summary vs. VCF (Issue #658)
"##FILTER=<ID=ASSEMBLY_BIAS,Description=""Mismatch between number of directly supporting reads and reads supporting via assembly."">"
"##FILTER=<ID=ASSEMBLY_ONLY,Description=""Variant is supported only by assembly evidence."">"
"##FILTER=<ID=ASSEMBLY_TOO_FEW_READ,Description=""Not enough reads contribute to this assembly as specified by 'assembly.minReads'"">"
"##FILTER=<ID=ASSEMBLY_TOO_SHORT,Description=""This assembly is shorter than a read length"">"
"##FILTER=<ID=INSUFFICIENT_SUPPORT,Description=""Does not reach the required threshold quality for calling as specified by 'variantcalling.minScore'"">"
"##FILTER=<ID=LOW_MAPQ,Description=""Mapping location may be ambiguous."">"
"##FILTER=<ID=LOW_QUAL,Description=""Low quality call as specified by 'variantcalling.lowQuality'"">"
"##FILTER=<ID=NO_ASSEMBLY,Description=""No assembly supporting this variant could be found."">"
"##FILTER=<ID=NO_RP,Description=""Variant does not have any direct read pair support."">"
"##FILTER=<ID=NO_SR,Description=""Variant does not have any direct split read support."">"
"##FILTER=<ID=REF,Description=""Breakpoint corresponds to reference allele"">"
"##FILTER=<ID=SINGLE_ASSEMBLY,Description=""Only one side of the breakpoint could be assembled."">"
"##FILTER=<ID=SINGLE_SUPPORT,Description=""Supported by fewer than 'variantcalling.minReads' fragments"">"
"##FILTER=<ID=SMALL_EVENT,Description=""Event size is smaller than the minimum reportable size specified by 'variantcalling.minSize'"">"
"##FORMAT=<ID=AF,Number=A,Type=Float,Description=""Allele fraction"">"
"##FORMAT=<ID=ANRP,Number=1,Type=Integer,Description=""Count of read pairs at this breakend assembled into a contig that does not support the breakpoint."">"
"##FORMAT=<ID=ANRPQ,Number=1,Type=Float,Description=""Quality score of read pairs at this breakend assembled into a contig that does not support the breakpoint."">"
"##FORMAT=<ID=ANSR,Number=1,Type=Integer,Description=""Count of split reads at this breakend assembled into a contig that does not support the breakpoint."">"
"##FORMAT=<ID=ANSRQ,Number=1,Type=Float,Description=""Quality score of split reads at this breakend assembled into a contig that does not support the breakpoint."">"
"##FORMAT=<ID=ASQ,Number=1,Type=Float,Description=""Pro-rata quality score contribution of assemblies supporting breakpoint"">"
"##FORMAT=<ID=ASRP,Number=1,Type=Integer,Description=""Count of read pairs incorporated into any breakpoint assembly"">"
"##FORMAT=<ID=ASSR,Number=1,Type=Integer,Description=""Count of split, soft clipped or indel-containing reads incorporated into any breakpoint assemblies"">"
"##FORMAT=<ID=BAQ,Number=1,Type=Float,Description=""Pro-rata quality score contribution of assemblies supporting just local breakend"">"
"##FORMAT=<ID=BASRP,Number=1,Type=Integer,Description=""Count of read pairs incorporated into any breakend assembly"">"
"##FORMAT=<ID=BASSR,Number=1,Type=Integer,Description=""Count of split, soft clipped or indel-containing reads incorporated into any breakend assemblies"">"
"##FORMAT=<ID=BQ,Number=1,Type=Float,Description=""Quality score of breakend evidence after evidence reallocation"">"
"##FORMAT=<ID=BSC,Number=1,Type=Integer,Description=""Count of soft clips supporting just local breakend per category"">"
"##FORMAT=<ID=BSCQ,Number=1,Type=Float,Description=""Quality score of soft clips supporting just local breakend per category"">"
"##FORMAT=<ID=BUM,Number=1,Type=Integer,Description=""Count of read pairs (with one read unmapped) supporting just local breakend per category"">"
"##FORMAT=<ID=BUMQ,Number=1,Type=Float,Description=""Quality score of read pairs (with one read unmapped) supporting just local breakend per category"">"
"##FORMAT=<ID=BVF,Number=1,Type=Integer,Description=""Count of fragments providing breakend for the variant allele."">"
"##FORMAT=<ID=CASQ,Number=1,Type=Float,Description=""Pro-rata quality score of complex compound breakpoint assemblies supporting breakpoint from elsewhere"">"
"##FORMAT=<ID=GT,Number=1,Type=String,Description=""Genotype"">"
"##FORMAT=<ID=IC,Number=1,Type=Integer,Description=""Count of read indels supporting breakpoint per category"">"
"##FORMAT=<ID=IQ,Number=1,Type=Float,Description=""Quality score of read indels supporting breakpoint per category"">"
"##FORMAT=<ID=QUAL,Number=1,Type=Float,Description=""Quality score of breakend evidence after evidence reallocation"">"
"##FORMAT=<ID=RASQ,Number=1,Type=Float,Description=""Pro-rata quality score contribution of assemblies supporting breakpoint from remote breakend"">"
"##FORMAT=<ID=REF,Number=1,Type=Integer,Description=""Count of reads mapping across this breakend"">"
"##FORMAT=<ID=REFPAIR,Number=1,Type=Integer,Description=""Count of reference read pairs spanning this breakend supporting the reference allele"">"
"##FORMAT=<ID=RP,Number=1,Type=Integer,Description=""Count of read pairs supporting breakpoint per category"">"
"##FORMAT=<ID=RPQ,Number=1,Type=Float,Description=""Quality score of read pairs supporting breakpoint per category"">"
"##FORMAT=<ID=SR,Number=1,Type=Integer,Description=""Count of split reads supporting breakpoint per category"">"
"##FORMAT=<ID=SRQ,Number=1,Type=Float,Description=""Quality score of split reads supporting breakpoint per category"">"
"##FORMAT=<ID=VF,Number=1,Type=Integer,Description=""Count of fragments supporting the variant breakpoint allele and not the reference allele."">"
"##INFO=<ID=ANRP,Number=1,Type=Integer,Description=""Count of read pairs at this breakend assembled into a contig that does not support the breakpoint."">"
"##INFO=<ID=ANRPQ,Number=1,Type=Float,Description=""Quality score of read pairs at this breakend assembled into a contig that does not support the breakpoint."">"
"##INFO=<ID=ANSR,Number=1,Type=Integer,Description=""Count of split reads at this breakend assembled into a contig that does not support the breakpoint."">"
"##INFO=<ID=ANSRQ,Number=1,Type=Float,Description=""Quality score of split reads at this breakend assembled into a contig that does not support the breakpoint."">"
"##INFO=<ID=AS,Number=1,Type=Integer,Description=""Count of assemblies supporting breakpoint"">"
"##INFO=<ID=ASC,Number=1,Type=String,Description=""CIGAR encoding assembly contig anchoring alignments. Local assemblies are excluded due to https://github.com/PapenfussLab/gridss/issues/213."">"
"##INFO=<ID=ASQ,Number=1,Type=Float,Description=""Quality score of assemblies supporting breakpoint"">"
"##INFO=<ID=ASRP,Number=1,Type=Integer,Description=""Count of read pairs incorporated into any breakpoint assembly"">"
"##INFO=<ID=ASSR,Number=1,Type=Integer,Description=""Count of split, soft clipped or indel-containing reads incorporated into any breakpoint assemblies"">"
"##INFO=<ID=BA,Number=1,Type=Integer,Description=""Count of assemblies supporting just local breakend"">"
"##INFO=<ID=BAQ,Number=1,Type=Float,Description=""Quality score of assemblies supporting just local breakend"">"
"##INFO=<ID=BASRP,Number=1,Type=Integer,Description=""Count of read pairs incorporated into any breakend assembly"">"
"##INFO=<ID=BASSR,Number=1,Type=Integer,Description=""Count of split, soft clipped or indel-containing reads incorporated into any breakend assemblies"">"
"##INFO=<ID=BEALN,Number=.,Type=String,Description=""Potential alignment locations of breakend sequence in the format chr:start|strand|cigar|mapq. Depending on the alignment information available, strand and mapq may be empty."">"
"##INFO=<ID=BEID,Number=.,Type=String,Description=""Identifiers of assemblies supporting the variant."">"
"##INFO=<ID=BEIDH,Number=.,Type=Integer,Description=""Remote chimeric alignment offset of corresponding BEID assembly."">"
"##INFO=<ID=BEIDL,Number=.,Type=Integer,Description=""Local chimeric alignment offset of corresponding BEID assembly."">"
"##INFO=<ID=BENAMES,Number=.,Type=String,Description=""Read names of all reads providing direct breakend support."">"
"##INFO=<ID=BMQ,Number=1,Type=Float,Description=""Mean MAPQ of breakend supporting reads."">"
"##INFO=<ID=BMQN,Number=1,Type=Float,Description=""Minimum MAPQ of breakend supporting reads."">"
"##INFO=<ID=BMQX,Number=1,Type=Float,Description=""Maximum MAPQ of breakend supporting reads."">"
"##INFO=<ID=BPNAMES,Number=.,Type=String,Description=""Read names of all reads providing direct breakpoint support."">"
"##INFO=<ID=BQ,Number=1,Type=Float,Description=""Quality score of breakend evidence"">"
"##INFO=<ID=BSC,Number=1,Type=Integer,Description=""Count of soft clips supporting just local breakend"">"
"##INFO=<ID=BSCQ,Number=1,Type=Float,Description=""Quality score of soft clips supporting just local breakend"">"
"##INFO=<ID=BUM,Number=1,Type=Integer,Description=""Count of read pairs (with one read unmapped) supporting just local breakend"">"
"##INFO=<ID=BUMQ,Number=1,Type=Float,Description=""Quality score of read pairs (with one read unmapped) supporting just local breakend"">"
"##INFO=<ID=BVF,Number=1,Type=Integer,Description=""Count of fragments providing breakend for the variant allele."">"
"##INFO=<ID=CAS,Number=1,Type=Integer,Description=""Count of complex compound breakpoint assemblies supporting breakpoint from elsewhere"">"
"##INFO=<ID=CASQ,Number=1,Type=Float,Description=""Quality score of complex compound breakpoint assemblies supporting breakpoint from elsewhere"">"
"##INFO=<ID=CIPOS,Number=2,Type=Integer,Description=""Confidence interval around POS for imprecise variants"">"
"##INFO=<ID=CIRPOS,Number=2,Type=Integer,Description=""Confidence interval around remote breakend POS for imprecise variants"">"
"##INFO=<ID=CQ,Number=1,Type=Float,Description=""Breakpoint quality score before evidence reallocation"">"
"##INFO=<ID=EVENT,Number=1,Type=String,Description=""ID of event associated to breakend"">"
"##INFO=<ID=HOMLEN,Number=.,Type=Integer,Description=""Length of base pair identical micro-homology at event breakpoints"">"
"##INFO=<ID=HOMSEQ,Number=.,Type=String,Description=""Sequence of base pair identical micro-homology at event breakpoints"">"
"##INFO=<ID=IC,Number=1,Type=Integer,Description=""Count of read indels supporting breakpoint"">"
"##INFO=<ID=IHOMPOS,Number=2,Type=Integer,Description=""Position of inexact homology"">"
"##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description=""Imprecise structural variation"">"
"##INFO=<ID=INSRM,Number=.,Type=String,Description=""RepeatMasker classification of inserted sequence in the format of RepeatType#RepeatClass|repeat position|repeat orientation|insertion sequence CIGAR|repeat masker alignment score|edit distance to canonical repeat sequence. Edit distance may be blank."">"
"##INFO=<ID=INSRMP,Number=1,Type=Float,Description=""Portion of inserted sequence whose alignment overlaps the repeatmasker repeat. 1.0 indicates the inserted sequence entirely mapping to the repeat."">"
"##INFO=<ID=INSRMRC,Number=1,Type=String,Description=""Inserted sequence repeatmasker repeat class."">"
"##INFO=<ID=INSRMRO,Number=1,Type=String,Description=""Inserted sequence repeatmasker repeat orientation."">"
"##INFO=<ID=INSRMRT,Number=1,Type=String,Description=""Inserted sequence repeatmasker repeat type."">"
"##INFO=<ID=INSTAXID,Number=1,Type=Integer,Description=""NCBI Taxonomy identifier for inserted sequence."">"
"##INFO=<ID=IQ,Number=1,Type=Float,Description=""Quality score of read indels supporting breakpoint"">"
"##INFO=<ID=MATEID,Number=.,Type=String,Description=""ID of mate breakends"">"
"##INFO=<ID=MQ,Number=1,Type=Float,Description=""Mean MAPQ of breakpoint supporting reads."">"
"##INFO=<ID=MQN,Number=1,Type=Float,Description=""Minimum MAPQ of breakpoint supporting reads."">"
"##INFO=<ID=MQX,Number=1,Type=Float,Description=""Maximum MAPQ of breakpoint supporting reads."">"
"##INFO=<ID=RAS,Number=1,Type=Integer,Description=""Count of assemblies supporting breakpoint from remote breakend"">"
"##INFO=<ID=RASQ,Number=1,Type=Float,Description=""Quality score of assemblies supporting breakpoint from remote breakend"">"
"##INFO=<ID=REF,Number=1,Type=Integer,Description=""Count of reads mapping across this breakend"">"
"##INFO=<ID=REFPAIR,Number=1,Type=Integer,Description=""Count of reference read pairs spanning this breakend supporting the reference allele"">"
"##INFO=<ID=RF,Number=1,Type=Integer,Description=""Reference fragments. Count of fragments supporting the reference allele and not the variant allele."">"
"##INFO=<ID=RP,Number=1,Type=Integer,Description=""Count of read pairs supporting breakpoint"">"
"##INFO=<ID=RPQ,Number=1,Type=Float,Description=""Quality score of read pairs supporting breakpoint"">"
"##INFO=<ID=RSI,Number=.,Type=Integer,Description=""Support interval offsets of partner breakend."">"
"##INFO=<ID=SB,Number=1,Type=Float,Description=""Strand bias of the reads supporting the variant. 1 indicates that reads would be aligned to the positive strand if the reference was changed to the variant allele. 0 indicates that reads bases would be aligned to the negative strand if the reference was changed to the variant allele. Strand bias is calculated purely from supporting reads and exclude read pair support since these are 100% strand bias. Note that reads both directly supporting the variant, and supporting via assembly will be double-counted. Both breakpoint and breakend supporting reads are included."">"
"##INFO=<ID=SC,Number=1,Type=String,Description=""CIGAR for displaying anchoring alignment of any contributing evidence and microhomologies. Local assemblies are excluded due to https://github.com/PapenfussLab/gridss/issues/213"">"
"##INFO=<ID=SELF,Number=0,Type=Flag,Description=""Indicates a breakpoint is self-intersecting"">"
"##INFO=<ID=SI,Number=.,Type=Integer,Description=""Support interval offsets from breakend position in which at least one supporting read/read pair/assembly is mapped."">"
"##INFO=<ID=SR,Number=1,Type=Integer,Description=""Count of split reads supporting breakpoint"">"
"##INFO=<ID=SRQ,Number=1,Type=Float,Description=""Quality score of split reads supporting breakpoint"">"
"##INFO=<ID=SVTYPE,Number=1,Type=String,Description=""Type of structural variant"">"
"##INFO=<ID=VF,Number=1,Type=Integer,Description=""Count of fragments supporting the variant breakpoint allele and not the reference allele."">"
"##contig=
Hi!
I am analyzing results from virusbreakend and noticed that the number of integrations in the summary output differs from the number in the VCF. For example, the summary reported 57 integrations for my sample, but the VCF shows 92 (looking only at human chromosomes). Why is that?