Is your feature request related to a problem? Please describe.
I'm not sure whether this is a relevant issue, but I thought I would bring it up for discussion.
Context: At a GMS-BT meeting, for case chiefgull (run with 11.2.0; a re-analysis of masterflea, which was run with 9.0.1), we saw that the number of PASS variants in the final SV VCF uploaded to Scout had increased from 197 to 8404.
This raised the question of why the number had increased so much, and I learned that 8032 of the variants unique to this re-analysis came from TIDDIT, which was added to the WGS workflow in version 10.0.0 (https://github.com/Clinical-Genomics/BALSAMIC/pull/947).
To see whether this was just an outlier, I checked a few other cases from before and after the addition of TIDDIT. The table below summarises the number of variants in the final SV VCF with filter PASS (column 1) and the number of those PASS variants called by TIDDIT (column 2, PASS + TIDDIT), for a few cases in versions 9.0.1, 10.0.5 and 11.2.0 (currently the latest version).
In summary, in most of the cases checked from 10.0.5 onwards, TIDDIT accounts for the majority of the PASS SVs.
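| Version | Case | PASS | PASS + TIDDIT |
| --- | --- | --- | --- |
| 9.0.1 (no TIDDIT) | fleetearwig | 616 | 0 |
| 9.0.1 (no TIDDIT) | betterbeagle | 662 | 0 |
| 9.0.1 (no TIDDIT) | exactmole | 1059 | 0 |
| 9.0.1 (no TIDDIT) | fairant | 781 | 0 |
| 9.0.1 (no TIDDIT) | likedguinea | 222 | 0 |
| 9.0.1 (no TIDDIT) | notedstork | 1871 | 0 |
| 9.0.1 (no TIDDIT) | uphornet | 137 | 0 |
| 10.0.5 | firmraptor | 16832 | 13883 |
| 10.0.5 | frankmagpie | 14916 | 13497 |
| 10.0.5 | dearboa | 16385 | 15499 |
| 10.0.5 | jointmako | 14847 | 14597 |
| 10.0.5 | crackbaboon | 14473 | 14242 |
| 10.0.5 | quickgoat | 15489 | 15098 |
| 10.0.5 | novelbream | 19669 | 15212 |
| 11.2.0 (clinical SV VCF) | expertsatyr | 25508 | 1410 |
| 11.2.0 (clinical SV VCF) | amplewasp | 31941 | 1474 |
| 11.2.0 (clinical SV VCF) | ableant | 7153 | 7011 |
| 11.2.0 (clinical SV VCF) | topsdonkey | 8106 | 7959 |
| 11.2.0 (clinical SV VCF) | suiteddrake | 10958 | 8292 |
| 11.2.0 (clinical SV VCF) | hardyweevil | 8101 | 6739 |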
Each variant in the VCF has a value for how many files it was observed in, probably taken from the SVDB merge step. However, neither this value nor any other quality-based metric is available for filtering in Scout, so there is currently no way to reduce the number of variants to a manageable amount for interpretation.
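For reference, counts like the ones in the table, together with a breakdown by the "number of files" value, could be produced with a small script along these lines. This is only a sketch in plain Python, not how BALSAMIC or Scout does it. It assumes that the per-variant value is stored in an INFO key called `FOUNDBY` and that the contributing callers are listed in an INFO key called `svdb_origin` (e.g. `manta|tiddit`); both key names should be checked against the header of the actual merged VCF. The function names are made up for illustration.

```python
from collections import Counter


def parse_info(info_field):
    """Turn a VCF INFO string such as 'SVTYPE=DEL;FOUNDBY=2' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        if not entry or entry == ".":
            continue
        key, _, value = entry.partition("=")
        # Flags without a value are stored as True.
        info[key] = value if value else True
    return info


def summarise_pass_variants(vcf_lines, caller="tiddit"):
    """Count PASS records, PASS records attributed to `caller`,
    and PASS records per FOUNDBY value."""
    n_pass = 0
    n_caller = 0
    foundby_counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8 or fields[6] != "PASS":
            continue
        n_pass += 1
        info = parse_info(fields[7])
        # Assumption: svdb_origin lists the callers behind the merged record.
        if caller in str(info.get("svdb_origin", "")).lower():
            n_caller += 1
        # Assumption: FOUNDBY holds the number of files with the variant.
        foundby_counts[str(info.get("FOUNDBY", "NA"))] += 1
    return n_pass, n_caller, dict(foundby_counts)
```

Usage would be something like `summarise_pass_variants(open("case.svdb.vcf"))` on an uncompressed VCF (the file name is hypothetical). The per-`FOUNDBY` breakdown gives a first idea of how many variants a "seen by at least N callers" filter would remove.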
Describe the solution you'd like
Either more filtering of the SV variants before upload to Scout, or more options for manual filtration in Scout, in which case we need to identify good parameters to filter by.
I spoke to Jesper about TIDDIT, and there were two main conclusions, each with a fairly simple implementation that would probably reduce the number of variants significantly:
1. We are calling SVs on both the normal and the tumor sample, but we are not filtering the tumor SVs on their presence in the normal sample. In effect we are just adding the normal variants to the tumor variants, when the point of calling the normal is to use its variants to filter for somatic ones.
2. For BNDs, TIDDIT calls two variants for each event, roughly the forward and the reverse version of the same variant. This means we could keep one variant per event and probably remove a couple of thousand additional variants before upload to Scout.
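The two ideas above can be sketched roughly as follows. This is a simplified illustration in plain Python under several assumptions, not a proposal for the actual BALSAMIC rule. Records are reduced to small tuples (a hypothetical representation). Breakend mates are paired by parsing the mate locus from the ALT column as described in the VCF specification, ignoring orientation and requiring exact positions. The normal filter uses an arbitrary 100 bp window on the start position only. All function names are made up.

```python
import re

# Matches the mate locus inside a breakend ALT such as "N[2:200[" or "]1:100]N".
BND_ALT = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def breakend_key(chrom, pos, alt):
    """Return an order-independent key for a breakend pair,
    or None if ALT is not in breakend notation."""
    match = BND_ALT.search(alt)
    if match is None:
        return None
    mate = (match.group(1), int(match.group(2)))
    return tuple(sorted([(chrom, int(pos)), mate]))


def collapse_bnd_mates(records):
    """Keep the first record of each breakend pair; pass other records through.

    `records` is an iterable of (chrom, pos, alt) tuples.
    """
    seen = set()
    kept = []
    for chrom, pos, alt in records:
        key = breakend_key(chrom, pos, alt)
        if key is not None:
            if key in seen:
                continue  # the mate of this breakend was already kept
            seen.add(key)
        kept.append((chrom, pos, alt))
    return kept


def drop_seen_in_normal(tumor, normal, window=100):
    """Drop tumor calls that have a normal call of the same type
    on the same chromosome within `window` bp.

    Both inputs are iterables of (chrom, pos, svtype) tuples.
    """
    normal_sites = {}
    for chrom, pos, svtype in normal:
        normal_sites.setdefault((chrom, svtype), []).append(int(pos))
    somatic = []
    for chrom, pos, svtype in tumor:
        nearby = normal_sites.get((chrom, svtype), [])
        if any(abs(int(pos) - n) <= window for n in nearby):
            continue  # also present in the normal, so likely germline
        somatic.append((chrom, pos, svtype))
    return somatic
```

A real implementation would also need to compare end positions, allow some tolerance when pairing breakends, and decide which of the two mates to keep, so this only illustrates the size of the change.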
SOMATICSCORE, which we're planning to introduce to Scout (https://github.com/Clinical-Genomics/BALSAMIC/issues/1107), is only available for variants called with Manta, so it would not enable us to filter TIDDIT variants.
Describe alternatives you've considered
Is TIDDIT necessary? Why was it introduced?
Current BALSAMIC version
balsamic --version
11.2.0