Closed keyvanelhami closed 10 months ago
@ashwini06 The TMB calc in hydra looks very messy. Do you have any info regarding how it differs compared to our calculation?
What I can see from rule tmb_calculation is that we are performing the following (@ashwini06 correct me if I am wrong):
- `awk`, with the result put in variable `region_size`
- `bcftools annotate`
- keeping `snps,indels` only, with `bcftools view --types`
- `bcftools filter`
- `filter_vep --filter`
- `awk`, dividing the number of variants in the filtered VCF (number of fields = `$NF`) by `region_size`
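The last step above (variant count divided by region size) can be sketched in plain Python. This is only an illustration, not the code in the rule: the function names are made up for this example, the BED intervals are assumed to be non-overlapping, and expressing the result per megabase is an assumption.

```python
def region_size_bp(bed_lines):
    """Sum interval lengths (end - start) over BED lines, skipping header lines."""
    total = 0
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        _chrom, start, end = line.split()[:3]
        total += int(end) - int(start)
    return total


def count_variants(vcf_lines):
    """Count records (non-header lines) in an already filtered VCF."""
    return sum(1 for line in vcf_lines if line.strip() and not line.startswith("#"))


def tmb_per_mb(n_variants, region_bp):
    """Variants per megabase of target region (per-Mb scaling is an assumption)."""
    return n_variants / (region_bp / 1e6)


# Toy inputs, not real data: two 500 kb intervals and three variant records.
bed = ["chr1\t0\t500000", "chr2\t100\t500100"]
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t10\t.\tA\tT",
    "chr2\t200\t.\tG\tC",
    "chr2\t300\t.\tC\tT",
]
tmb = tmb_per_mb(count_variants(vcf), region_size_bp(bed))
print(tmb)
```

Whatever filtering is done upstream decides which records are counted, so the count is only as good as the filters before it.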
Re: from talking to Keyvan on Slack
I'd recommend against using filter_vep unless you know exactly what you'd like to filter and run time is not an issue:
- There are two COSMIC annotations: one done by Balsamic and one by VEP. filter_vep always prioritizes the VEP one, because VEP's annotation is in the CSQ field.
- "Existing variation" is extremely vague. It will also match any variant that has an rsID or dbSNP membership.
- filter_vep is very slow and does not support multithreading. cyvcf2 or an equivalent can be faster.
- There are also two gnomAD annotations in Balsamic. Lots of pipes are needed to build a multi-layered filter that takes the correct value of the two.

Although the hydragenetics script is messy, it at least gives full control by extracting exactly the information that is needed. I'd suggest using the hydragenetics code base or a similar one that is well tested and validated.
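As a rough sketch of what "full control" over duplicated annotations could look like, the snippet below parses one INFO string in plain Python and reads a population frequency from both a top-level INFO key and the VEP CSQ sub-fields. The key names `GNOMAD_AF` and `gnomAD_AF`, the CSQ field order, and the choice to keep the larger of the two values are all assumptions made for illustration; the real field names and the "correct" value depend on how the VCF was annotated.

```python
def parse_info(info_str):
    """Split a VCF INFO string into a dict; flag-only keys map to ''."""
    out = {}
    for item in info_str.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def csq_records(info, csq_fields):
    """Turn the CSQ value into one dict per transcript consequence."""
    raw = info.get("CSQ", "")
    return [dict(zip(csq_fields, entry.split("|"))) for entry in raw.split(",") if entry]


def to_float(value):
    """Return a float, or None for missing or empty values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def max_population_af(info_str, csq_fields, info_key="GNOMAD_AF", csq_key="gnomAD_AF"):
    """Largest frequency found in either annotation (hypothetical key names)."""
    info = parse_info(info_str)
    values = [to_float(info.get(info_key))]
    values += [to_float(rec.get(csq_key)) for rec in csq_records(info, csq_fields)]
    values = [v for v in values if v is not None]
    return max(values) if values else None


# Toy record: the CSQ field order would normally come from the VCF header.
csq_fields = ["Allele", "Consequence", "gnomAD_AF"]
info_str = "DP=250;GNOMAD_AF=0.00002;CSQ=T|missense_variant|0.0003,T|synonymous_variant|"
print(max_population_af(info_str, csq_fields))
```

The same logic could be written with cyvcf2 for speed; plain string handling is used here only to keep the example self-contained.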
As previously discussed at the cancer meeting, I will place an order for the SeraCare TMB samples that we already have in the lab: gDNA TMB Mix Scores 7, 9, 13, 20, 26. More info is available here: https://atlas.scilifelab.se/production/lab/sample_handling/reference_samples/
What app tag should we use?
@annagellerbring WGS for these will be expensive, so let's limit it a little bit. WGSPCFS120 for samples TMB 7, 13, and 26.
PANKTTR040 for all five samples. Baitset: GMCKsolid
@keyvanelhami what delivery should I select? "Analysis"?
@annagellerbring Yes I think analysis will be good
@ashwini06 and @hassanfa, can you provide some feedback on how easy or hard it would be to implement the hydragenetics TMB script in Balsamic?
The hydragenetics developers are better suited to answer that question. I have not used it beyond just checking the code base. I have my own scripts to handle/calculate TMB.
@keyvanelhami It is a Python script. If we want to implement it in BALSAMIC, I think it would be good to first run that script on its own to understand its complexity and behavior. The code looks like it requires some input files (artifact and background files), so maybe Jonas from Uppsala can help with more details about the script.
On the other hand, aren't we sequencing the TMB validation samples soon? It would also be good to run our existing TMB script to check how the calculated values compare with those samples' TMB scores.
@keyvanelhami Looking at the script, below is a summary of the TMB calculation:
A. We can generate the allele frequency and FFPE artifact databases; the rest is straightforward.
B. We can fine-tune the threshold values according to the sequencing type.
Comment: The statistical part in the script needs correction.
Below are the threshold values:
- FFPE SNV observations = 1
- DP = 200
- VD = 10
- AF = 0.05-0.45
- gnomAD = 0.0001
- db1000g = 0.0001
- background sd = 5
- Non-synonymous correction factor = 0.78
- Non-synonymous and synonymous correction factor = 0.57
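To make the thresholds concrete, here is a minimal sketch of a per-variant filter using the values listed above. It is not the hydragenetics implementation: the dictionary keys, the `Thresholds` class, and the direction of each comparison (minimum depth, maximum population frequency, maximum FFPE observations) are assumptions. The background sd and the correction factors are left out because the thread does not say how they are applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Threshold values from the comment above; comparison directions are assumed."""
    min_depth: int = 200               # DP
    min_variant_depth: int = 10        # VD
    min_af: float = 0.05               # lower AF bound
    max_af: float = 0.45               # upper AF bound
    max_gnomad_af: float = 0.0001      # gnomAD
    max_1000g_af: float = 0.0001       # db1000g
    max_ffpe_observations: int = 1     # FFPE SNV observations


def passes(variant, t=Thresholds()):
    """Return True if a variant (a plain dict, hypothetical keys) passes every threshold."""
    return (
        variant["DP"] >= t.min_depth
        and variant["VD"] >= t.min_variant_depth
        and t.min_af <= variant["AF"] <= t.max_af
        and variant.get("gnomAD", 0.0) <= t.max_gnomad_af
        and variant.get("db1000g", 0.0) <= t.max_1000g_af
        and variant.get("ffpe_obs", 0) <= t.max_ffpe_observations
    )


# Toy variants, not real data.
variants = [
    {"DP": 450, "VD": 60, "AF": 0.13},                   # passes
    {"DP": 150, "VD": 30, "AF": 0.20},                   # fails: depth below 200
    {"DP": 500, "VD": 250, "AF": 0.50},                  # fails: AF above 0.45
    {"DP": 400, "VD": 40, "AF": 0.10, "gnomAD": 0.01},   # fails: common in gnomAD
]
kept = [v for v in variants if passes(v)]
print(len(kept))
```

Keeping the thresholds in one object makes it easy to swap in different values per sequencing type, as suggested in point B.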
Now ticket #807109 is ready for analysis according to Henning.
- amplewasp (TMB-7-WGS): Analysis completed
- expertsatyr (TMB-13-WGS): Analysis ongoing
- eagerroughy (TMB-26-WGS): In queue for sequencing
Method description from SeraCare for the TMB calculation.
All cases are sequenced and analyzed now according to HO.
Summary of TMB analysis in Balsamic. The TMB score from Balsamic is taken from file *tnscope.balsamic_stat
Case name | TMB score SeraCare | TMB score Balsamic |
---|---|---|
amplewasp (TMB-7-WGS) | 7 | 36 |
expertsatyr (TMB-13-WGS) | 13 | 33 |
eagerroughy (TMB-26-WGS) | 26 | 74 |
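The gap in the table can be quantified with a few lines of Python. The scores are copied from the table; the ratio and difference are just one simple way to summarize the discrepancy, and the variable names are only for this example.

```python
# (SeraCare score, Balsamic score) per case, copied from the table above.
scores = {
    "amplewasp (TMB-7-WGS)": (7, 36),
    "expertsatyr (TMB-13-WGS)": (13, 33),
    "eagerroughy (TMB-26-WGS)": (26, 74),
}

# Ratio of the Balsamic score to the SeraCare reference score.
ratios = {case: balsamic / seracare for case, (seracare, balsamic) in scores.items()}

for case, (seracare, balsamic) in scores.items():
    print(f"{case}: ratio = {ratios[case]:.2f}, difference = {balsamic - seracare}")
```

The ratio is not constant across the three cases, so a single scaling factor would not bring the Balsamic scores in line with the SeraCare values.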
@khurrammaqbool will take the lead on this issue
Part of #1108
This issue is summarised in https://github.com/Clinical-Genomics/BALSAMIC/issues/1108, so closing it.
TMB calculation today in BALSAMIC
The current definition is based on https://github.com/Clinical-Genomics/BALSAMIC/issues/51 and the related Snakemake rule: https://github.com/Clinical-Genomics/BALSAMIC/blob/8244f837388404d236611349d9eac4cb094290d1/BALSAMIC/snakemake_rules/annotation/vep.rule#L60
How to improve the calculation
The attached article compares TMB results between labs that use different lab methods, bioinformatics methods, and panel sizes. Its conclusion for improving the TMB calculation, compared to WES data, is:
Suggested changes in current calculation for increasing TMB accuracy
To be discussed
TMB_article copy.pdf
Documentation
[ ] For customers, include a description of how TMB is calculated, with related references, in the Balsamic readthedocs