Clinical-Genomics / BALSAMIC

Bioinformatic Analysis pipeLine for SomAtic Mutations In Cancer
https://balsamic.readthedocs.io/
MIT License

Tumor mutation burden #1108

Open pbiology opened 1 year ago

pbiology commented 1 year ago

Need

TMB calculation today in BALSAMIC

The current definition is based on:

  1. TMB was defined as the number of somatic, coding, base substitution, and indel mutations per megabase of genome examined. https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-017-0424-2
     1.1 Non-coding alterations were not counted.
     1.2 Alterations listed as known somatic alterations in COSMIC and truncations in tumor suppressor genes were not counted.
     1.3 Alterations predicted to be germline by the somatic-germline-zygosity algorithm were not counted.
     1.4 Alterations that were recurrently predicted to be germline in our cohort of clinical specimens were not counted.
     1.5 Known germline alterations in dbSNP were not counted.
     1.6 To calculate the TMB per megabase, the total number of mutations counted is divided by the size of the coding region of the targeted territory.
     1.7 Select table 1 from the paper above as a ref for comparison.

  2. Tumor mutation burden (TMB), fraction of copy number–altered genome, and gene alterations were compared among patients with DCB and no durable benefit (NDB). http://ascopubs.org/doi/full/10.1200/JCO.2017.75.3384
     2.1 In addition to the above, copy number alterations were also counted.

Summary of TMB computation method in BALSAMIC:

  1. Region Size for WGS = 3101.78817
  2. Region Size for TGA = Sum(End - Start) / 1000000
  3. From somatic.*.research.vcf.gz:
    • Filter AF_TUMOR>=0.05
    • Remove "Existing_variation"
    • Remove "COSMIC"
    • Remove "non_coding_transcript_exon_variant"
    • Remove "non_coding_transcript_variant"
    • Remove "feature_truncation"
  4. TMB = Variants Count / Region Size
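
The arithmetic in steps 1, 2 and 4 can be sketched as below. This is a minimal illustration, not the BALSAMIC implementation: the function names, the example intervals and the variant count are hypothetical, and the filtering in step 3 is assumed to have already been applied to obtain the variant count.

```python
from typing import Iterable, Tuple

# Region size for WGS in megabases, as quoted in the summary above.
WGS_REGION_SIZE_MB = 3101.78817


def region_size_mb(intervals: Iterable[Tuple[int, int]]) -> float:
    """Region size for TGA: Sum(End - Start) / 1000000 over BED-style intervals."""
    return sum(end - start for start, end in intervals) / 1_000_000


def tmb(variant_count: int, region_mb: float) -> float:
    """TMB = Variants Count / Region Size (mutations per megabase)."""
    if region_mb <= 0:
        raise ValueError("region size must be positive")
    return variant_count / region_mb


# Hypothetical target intervals (start, end) and a hypothetical filtered variant count.
panel_intervals = [(1_000, 501_000), (2_000_000, 2_500_000)]
panel_mb = region_size_mb(panel_intervals)
print(f"panel size: {panel_mb} Mb, TMB: {tmb(12, panel_mb)}")
```

Because the region size is the denominator, any mismatch between the BED file used here and the regions actually sequenced scales the TMB score directly.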

Summary of TMB computation method in Hydra pipeline:

https://github.com/hydra-genetics/biomarker/blob/develop/workflow/scripts/tmb.py

(May be updated)

  1. The variants are filtered against DP, VD, AF, gnomAD and db1000 with some thresholds.
  2. FFPE_SNV_artifacts is a database of recurrent SNVs for the sequencing/sample type; the filter can be set to the minimum number of observations required for a particular SNV to count as a recurrently observed artifact.
  3. The background panel contains standard deviation and median values for allele frequency, which are used to filter out variants against a threshold.
  4. Non-synonymous variants are corrected against the correction factor.
  5. Non-synonymous sites include missense, stop gained and stop loss variants.
  6. Total TMB is sum of all filtered synonymous and non-synonymous variants corrected against the correction factor

This may require:

  A. Generating the allele frequency and FFPE artefact databases.
  B. Fine-tuning the values for the thresholds according to the sequencing type.

The thresholds are given below:

    • FFPE SNV observations = 1
    • DP = 200
    • VD = 10
    • AF = 0.05-0.45
    • gnomAD = 0.0001
    • db1000g = 0.0001
    • background sd = 5
    • Non-synonymous correction factor = 0.78
    • Non-synonymous and synonymous correction factor = 0.57
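
A rough sketch of how the filtering thresholds might be applied to a single variant is given below. It is not taken from the Hydra `tmb.py` script: the `Variant` fields, the `passes_filters` helper and, in particular, the direction of each comparison (minimum vs. maximum) are assumptions made for illustration. The correction factors are not applied here.

```python
from dataclasses import dataclass, replace


@dataclass
class Variant:
    """Hypothetical per-variant record; field names are illustrative only."""
    dp: int                   # total read depth
    vd: int                   # reads supporting the variant
    af: float                 # allele frequency in the sample
    gnomad_af: float          # population frequency in gnomAD
    db1000g_af: float         # population frequency in 1000 Genomes
    ffpe_observations: int    # times seen in the FFPE artifact database
    background_median: float  # background panel median AF
    background_sd: float      # background panel AF standard deviation


# Threshold values as listed above; the key names are hypothetical.
THRESHOLDS = {
    "ffpe_snv_observations": 1,
    "min_dp": 200,
    "min_vd": 10,
    "min_af": 0.05,
    "max_af": 0.45,
    "max_gnomad": 0.0001,
    "max_db1000g": 0.0001,
    "background_sd": 5,
}


def passes_filters(v: Variant, t: dict = THRESHOLDS) -> bool:
    """Return True if the variant survives every filter (assumed directions)."""
    return (
        v.dp >= t["min_dp"]
        and v.vd >= t["min_vd"]
        and t["min_af"] <= v.af <= t["max_af"]
        and v.gnomad_af <= t["max_gnomad"]
        and v.db1000g_af <= t["max_db1000g"]
        # Assumed: reaching the observation count marks a recurrent artifact.
        and v.ffpe_observations < t["ffpe_snv_observations"]
        # Assumed: AF must exceed the background median by N standard deviations.
        and v.af > v.background_median + t["background_sd"] * v.background_sd
    )


example = Variant(dp=350, vd=40, af=0.12, gnomad_af=0.0, db1000g_af=0.0,
                  ffpe_observations=0, background_median=0.01, background_sd=0.005)
print(passes_filters(example))
print(passes_filters(replace(example, dp=150)))
```

Keeping the thresholds in one dictionary makes it easier to tune them per sequencing type, as point B above suggests.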

Samples and Analysis

Seracare TMB samples: gDNA TMB Mix Scores 7, 9, 13, 20, 26.

TMB MEASUREMENTS

Method description from SeraCare for the TMB calculation.

TMB analysis from BALSAMIC

Summary of TMB analysis in BALSAMIC. The TMB score from BALSAMIC is taken from the file *tnscope.balsamic_stat.

WGS - 2023 (incorrect results /ALY)

| Case name | TMB score SeraCare | TMB score BALSAMIC v? tumor-only (hg19) | TMB score BALSAMIC v11 tumor-only (hg19) |
|---|---|---|---|
| amplewasp (TMB-7-WGS) | 7 | 36 | 3.49057 |
| expertsatyr (TMB-13-WGS) | 13 | 33 | 1.6081 |
| eagerroughy (TMB-26-WGS) | 26 | 74 | 36.8758 |

TGA - 2023 (incorrect results /ALY)

| Case name | TMB score SeraCare | TMB score BALSAMIC v? tumor-only (hg19) | TMB score BALSAMIC v11 tumor-only (hg19) |
|---|---|---|---|
| fondpython (TMB-7-PAN) | 7 | - | 25.8042 |
| readyslug (TMB-9-PAN) | 9 | - | 9.9698 |
| cutepug (TMB-13-PAN) | 13 | - | 24.0448 |
| likedegret (TMB-20-PAN) | 20 | - | 77.999 |
| calmibex (TMB-26-PAN) | 26 | - | 9.9698 |
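
To make the size of the discrepancy easier to read, the BALSAMIC v11 scores can be expressed as a fold difference relative to the SeraCare reference score. The values below are copied from the WGS and TGA tables above; the `fold_difference` helper is a hypothetical convenience and not part of BALSAMIC.

```python
# (SeraCare expected score, BALSAMIC v11 tumor-only score), copied from the tables above.
scores = {
    "amplewasp (TMB-7-WGS)": (7, 3.49057),
    "expertsatyr (TMB-13-WGS)": (13, 1.6081),
    "eagerroughy (TMB-26-WGS)": (26, 36.8758),
    "fondpython (TMB-7-PAN)": (7, 25.8042),
    "readyslug (TMB-9-PAN)": (9, 9.9698),
    "cutepug (TMB-13-PAN)": (13, 24.0448),
    "likedegret (TMB-20-PAN)": (20, 77.999),
    "calmibex (TMB-26-PAN)": (26, 9.9698),
}


def fold_difference(expected: float, observed: float) -> float:
    """Observed score divided by the expected reference score."""
    return observed / expected


for case, (expected, observed) in scores.items():
    fold = fold_difference(expected, observed)
    print(f"{case}: expected {expected}, observed {observed}, fold {fold:.2f}")
```

A fold difference near 1 means the computed score matches the reference; values far from 1 in either direction flag cases worth investigating.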

New SeraSeq TMB reference samples

    • Seraseq gDNA TMB Mix Score 7
    • Seraseq gDNA TMB Mix Score 13
    • Seraseq gDNA TMB Mix Score 26

New reference samples were purchased in 2024; RC information, sample calculations and the order plan are available here.

Planned workflows:

TGA (KAPA + Twist) - 250 ng input amount (one library that splits into two different target enrichment steps):

    • Exome: 200 M r-p
    • GMS myeloid: 40 M r-p

WGS (Watchmaker Genomics DNA PCR free library prep) - 150 ng input amount:

    • Normal samples >30X (WGSPCFR400)
    • Tumor samples >120X (WGSWPFS140)

Information/Articles on TMB

How to improve the calculation

TMB standardization by alignment to reference standards: Phase II of the Friends of Cancer Research TMB Harmonization Project.


Documentation

[ ] For customers, include a description of how TMB is calculated, with related references, in the BALSAMIC readthedocs

Suggested approach

A few sentences about the intended solution

Considered alternatives

Can be closed when

Link the issues needed to be closed for this to be implemented

Blockers

Anything preventing this from happening?

pbiology commented 1 year ago

We need samples with high and low TMB. It should be possible to get these from Teresita.

mathiasbio commented 1 year ago

Just adding this article, which Saliendra from Lund shared during the last GMS-BT meeting (which focused on TMB): https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-022-01348-z

pbiology commented 1 year ago

Is this feature still blocked? And if so, by what?

khurrammaqbool commented 2 months ago

As seen in the tables above, the computed scores differ from the expected ones. An investigation revealed a sample mix-up due to unclear labelling, so a new set of standards was ordered and is now in the sequencing stage.