PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Questions about using virusbreakend #657

Closed joaohl216 closed 6 months ago

joaohl216 commented 7 months ago

Hello.

I am working on HPV analysis using human cancer whole genome sequencing data (30X depth, aligned to hg38), and I'm using virusbreakend (gridss v2.13.2) to detect HPV integration sites.

All samples ran without any problems. HPV was confirmed in the Sample.virusbreakend.vcf.kraken2.report.viral.extracted.txt file of 37 samples, but for 34 of those samples HPV did not appear in the Sample.virusbreakend.vcf file.

Of the 37 samples, only 3 had integrations in the VCF (7 integrations in total). When I checked the number of virus reads, HPV was missing from the VCF file even in samples with more than 10,000 virus reads.

I am curious about the difference in the result files. Are there any conditions for a virus to be added to a VCF file?

I would be very grateful for any assistance!

d-cameron commented 7 months ago

The VCF file only contains the detected integration sites. If you have HPV infection in cells that haven't been clonally expanded (i.e. there are many different integration sites in many different cells), then those sites are unlikely to reach the calling threshold and won't be included in the VCF. For example, if the patient has an active HPV infection but HPV wasn't integrated into the progenitor cancer cell(s), then it would be unsurprising to find HPV viral sequences but no integration sites.

Alternative explanations include the tool simply missing the callable integration site(s), cross-contamination, integration into difficult-to-call viral regions (e.g. low complexity/repetitive sequence), using a host reference that includes viral decoy sequence (e.g. the decoy sequence includes EBV; this is an unlikely explanation for your data given that you've found integration sites in other samples), poor quality sequencing (e.g. problems with read length, fragment size, or a high duplication rate), and so on.

TL;DR: .txt = viral presence, .vcf = clonal viral integration
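To see this distinction across a cohort, one option is to count the data records in each Sample.virusbreakend.vcf and list the samples whose VCF is empty. The following is only a minimal sketch: the function names `count_vcf_records` and `split_by_integration` are hypothetical helpers, not part of GRIDSS, and it relies only on the standard VCF convention that header lines start with `#`. Note that it counts VCF records, which is not necessarily the same as the number of distinct integration events.

```python
import gzip
from pathlib import Path


def count_vcf_records(vcf_path):
    """Count data records (non-header, non-blank lines) in a plain or gzipped VCF."""
    path = Path(vcf_path)
    # Assumption: gzipped files end in ".gz"; anything else is read as plain text.
    opener = gzip.open if path.suffix == ".gz" else open
    n_records = 0
    with opener(path, "rt") as handle:
        for line in handle:
            # VCF header lines start with '#'; everything else is a record.
            if line.strip() and not line.startswith("#"):
                n_records += 1
    return n_records


def split_by_integration(vcf_paths):
    """Split VCF paths into (with records, without records)."""
    with_calls, without_calls = [], []
    for vcf_path in vcf_paths:
        if count_vcf_records(vcf_path) > 0:
            with_calls.append(str(vcf_path))
        else:
            without_calls.append(str(vcf_path))
    return with_calls, without_calls
```

Calling `split_by_integration(sorted(Path("results").glob("*.virusbreakend.vcf")))` (the `results` directory is a placeholder) would return the samples with at least one called integration breakend and those with none. A sample in the second list that still shows HPV in its .txt report matches the "viral presence without a called integration" case described above.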
