Closed yashcrux closed 1 year ago
Hi there, thanks for the bug report. This is not expected behaviour.
Are there only insertions in the sv_breakpoints.vcf
file as well? This VCF contains all the breakpoints without any filters applied, so should be able to inform whether the problem is happening at the filtering step or earlier.
I just searched all the sv_breakpoints.vcf files. and no other SNV's can be found
Interesting. I haven't seen this behaviour before. Which mapper was used to align your BAM files? Do you have an estimate of the sequencing depth of the samples?
Additionally, would you be able to share the command that you used to run SAVANA?
Thanks!
I am using minimap2 with the command minimap2 -t 8 -ax map-pb {input.reference} {input.fastq} > {output}
The sequencing depth is around 20 - 30 fold for both tumour and normal samples. Maybe some cases its lower but I am not sure
As for the SAVANA command : savana --tumour {input.Tumour_bam} --normal {input.Normal_bam} --outdir {output} --ref {input.ref}
Are you able to share an example subsetted BAM file?
Additionally would you be able to run the following on one of your sv_breakpoints.vcf
files to see how many insertions are being reported?
grep -v "^#" {sv_breakpoints.vcf} | awk '{print $8}' | awk -F';' '{print $1}' | sort | uniq -c
Hi
The BAM file is actually not possible to share because of its size
83G Jun 14 13:15 ../../../../BAMS/m54274Ue_220414_145454.final.bam 76G Jun 13 22:23 ../../../../BAMS/m54274Ue_221001_133159.final.bam
Meanwhile the output from the command for the same pair as previous
107 GT 226151 SVTYPE=BND 791861 SVTYPE=INS
From that output it looks like there are quite a few non-insertion variants. BND and INS are the only two types of variants in the SAVANA output VCF by design. Since it's not possible to definitively determine an SVTYPE without copy number information, we've elected to report non-insertion variants as type BND (see a more detailed explanation about this from the GRIDSS SV caller here: https://github.com/PapenfussLab/gridss#why-are-all-calls-bnd).
Instead, we report the breakend orientation using brackets in the ALT field as described in section 5.4 of VCFv4.2. We also report this in a "BP_NOTATION" field which can be roughly converted to different nomenclatures as follows:
Nomenclature | Deletion-like | Duplication-like | Head-to-head Inversion | Tail-to-tail Inversion |
---|---|---|---|---|
BP_NOTATION | +- | -+ | ++ | -- |
Brackets (VCF) | N[chr:pos[ / ]chr:pos]N | ]chr:pos]N / N[chr:pos[ | N]chr:pos] / N]chr:pos] | [chr:pos[N / [chr:pos[N |
5' to 3' | 3to5 | 5to3 | 3to3 | 5to5 |
Sorry for late reply but thanks for the detailed explanation of annotations that you have included in the tool.
No worries, thank you for raising this issue. I've added the above information to the main README to prevent future confusion.
Hello
I have recently used this tool to process pairs of samples, each pair has a germline and a tumor sample. I am doing a somatic calling for SVs. The sample data is produced by pacbio and is CLR
SAVANA provided the vcf files (with the default option) for each pair without any errors but I can see only the INSERTIONS being called. Other SNV's cannot be seen.
Is this expected?