SATS is a novel method developed for the accurate identification of mutational signatures in tumors sequenced using targeted panels. Unlike tools developed for whole-exome or whole-genome sequencing, SATS is specifically designed to address the unique challenges of targeted sequenced tumors. It encompasses the detection of de novo signatures, mapping these to reference TMB signatures, estimating signature activities, and calculating signature burdens.
For more information please refer to the user guide.
To install SATS directly from GitHub:
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
devtools::install_github("binzhulab/SATS/source")
Alternatively, download the package and follow the steps below. Download SATS_0.0.8.tar.gz (for Unix) or SATS_0.0.8.zip (for Windows, R version >= 4.1). To install SATS on Unix, enter the command from a Unix prompt:
R CMD INSTALL SATS_0.0.8.tar.gz -l path_to_install_package
Alternatively, after downloading SATS_0.0.8.tar.gz (for Unix) or SATS_0.0.8.zip (for Windows, R version >= 4.1) from the GitHub page, one may install the package from within R using the corresponding command below:
install.packages("./SATS_0.0.8.tar.gz", repos = NULL, type = "source")
install.packages("./SATS_0.0.8.zip", repos = NULL, type = "win.binary")
Once the installation is successful, the package can be loaded in R by calling
library(SATS)
a. The workflow starts with summarizing somatic mutations identified through targeted sequencing, including single base substitutions (SBS), into a mutation type matrix $\mathbf{V}$.
In addition, SATS requires a panel context matrix $\mathbf{L}$ that specifies the number of trinucleotide contexts for individual panels.
SATS is based on a Poisson Nonnegative-Matrix Factorization (pNMF) model, approximating $\mathbf{V}$ by $\mathbf{L} \circ \mathbf{W} \times \mathbf{H}$ (i.e., $\mathbf{V} \approx \mathbf{L} \circ \mathbf{W} \times \mathbf{H}$), where $\circ$ denotes the element-wise product and $\times$ represents the matrix multiplication operator.
b. The analysis procedure of SATS involves signature detection for a patient cohort and signature refitting for individual patients.
In this illustrative example, SATS initially identifies de novo tumor mutation burden (TMB) signatures 1 and 2 for a patient cohort, and then maps them to reference TMB signatures 1, 2/13 and 5.
Subsequently, SATS carries out signature refitting for 6 patients (Pt.1, Pt.2, …, Pt.6), estimating the activities of the mapped reference TMB signatures and the expected number of mutations attributed to each signature, namely the signature burden.
For instance, the activities of SBS1, SBS2/13 and SBS5 for patient 3 (Pt.3) are 0.27, 0.84 and 0.18, respectively.
Additionally, we estimate 0.67, 1.16 and 3.17 SBS attributed to signatures SBS1, SBS2/13 and SBS5, respectively.
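The model in part a can be made concrete with a toy calculation. The sketch below assumes that the element-wise product is applied to the product $\mathbf{W} \times \mathbf{H}$, so that $\mathbf{L}$ has the same dimensions as $\mathbf{V}$. All matrix values are made up for illustration and are not taken from SATS.

```r
# Toy pNMF mean: 3 mutation types, 2 signatures, 2 tumors (hypothetical values).
W <- matrix(c(0.2, 0.5, 0.3,    # signature 1 profile
              0.6, 0.1, 0.3),   # signature 2 profile
            nrow = 3)           # mutation types x signatures
H <- matrix(c(10, 0,            # activities in tumor 1
              5, 20),           # activities in tumor 2
            nrow = 2)           # signatures x tumors
L <- matrix(c(1.0, 0.5, 2.0,    # panel context for tumor 1
              0.8, 0.4, 1.5),   # panel context for tumor 2
            nrow = 3)           # mutation types x tumors

# Expected mutation counts: element-wise product of L with the matrix product W %*% H.
expected_V <- L * (W %*% H)
print(expected_V)
```

Under the Poisson assumption, an observed count matrix would scatter around `expected_V`, which has one row per mutation type and one column per tumor.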
The package includes a simulated dataset, SimData, which contains the mutation type matrix $\mathbf{V}$ (SimData$V) and the panel context matrix $\mathbf{L}$ (SimData$L), as follows.
data(SimData, package = "SATS")
SimData$V[1:6, 1:6]
SimData$L[1:6, 1:6]
The $\mathbf{L}$ matrix can be generated with the L_matrix_generation() function, and two example datasets are provided (Panel_Info_1_assay.txt for a single panel and Panel_Info_2_assays.txt for multiple panels).
The BSgenome.Hsapiens.UCSC.hg19 package will be loaded in the L_matrix_Generation.R script. For the hg38 reference genome, the BSgenome.Hsapiens.UCSC.hg38 package may be installed and loaded.
L_matrix_generation(genomic_information, Types)
where genomic_information contains Chromosome, Start_Position, End_Position and SEQ_ASSAY_ID, as below:
> Panel_1
Chromosome Start_Position End_Position SEQ_ASSAY_ID Hugo_Symbol
1 9 133738302 133738491 UHN-48-V1 ABL1
2 9 133747476 133747664 UHN-48-V1 ABL1
3 9 133748157 133748327 UHN-48-V1 ABL1
...
Chromosome contains the chromosome number, and the Start_Position and End_Position columns are the start and end positions of the targeted panel.
SEQ_ASSAY_ID distinguishes the different panels that make up the resulting $\mathbf{L}$ matrix (they become the column names of the result).
The input must contain the columns Chromosome, Start_Position, End_Position and SEQ_ASSAY_ID as in the above example (Hugo_Symbol is optional and is not required by the L_matrix_generation() function).
The Types argument of L_matrix_generation() specifies the mutation type order as either "COSMIC" or "signeR", where "COSMIC" corresponds to the order from the COSMIC database v3.2 and "signeR" corresponds to the order from the signeR package.
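As a hedged illustration of the input format described above, the following sketch builds a small panel table by hand from the three rows shown in the Panel_1 example and checks that the required columns are present. The object name `panel` and the check itself are assumptions added for illustration; they are not part of SATS.

```r
# Hypothetical hand-built panel table, using the three rows printed for Panel_1.
panel <- data.frame(
  Chromosome     = c("9", "9", "9"),
  Start_Position = c(133738302, 133747476, 133748157),
  End_Position   = c(133738491, 133747664, 133748327),
  SEQ_ASSAY_ID   = "UHN-48-V1",
  Hugo_Symbol    = "ABL1",          # optional column
  stringsAsFactors = FALSE
)

# Columns named in the text as required by L_matrix_generation().
required <- c("Chromosome", "Start_Position", "End_Position", "SEQ_ASSAY_ID")
missing_cols <- setdiff(required, colnames(panel))

# Basic sanity checks before passing the table on.
stopifnot(length(missing_cols) == 0)
stopifnot(all(panel$End_Position >= panel$Start_Position))
```

A table that passes these checks has the shape expected for the `genomic_information` argument; whether further validation is performed inside the package is not covered here.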
> L_matrix_generation(Panel_1) #not working
> L_matrix_generation(Panel_2, Types = "COSMIC")
GRCC-CP1 UHN-48-V1
A[C>A]A 0.000883 0.001487
A[C>A]C 0.000656 0.001120
A[C>A]G 0.000278 0.000426
...
> L_matrix_generation(Panel_2, Types = "signeR")
GRCC-CP1 UHN-48-V1
C>A:ACA 0.000883 0.001487
C>A:ACC 0.000656 0.001120
C>A:ACG 0.000278 0.000426
...
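The two outputs above label the same mutation types differently (for example, A[C>A]A versus C>A:ACA). A minimal sketch of converting one label style to the other, and of checking that two matrices list their mutation types in the same order, is given below. The helper name `cosmic_to_signeR` is hypothetical, and the sketch assumes that the mutation types are stored as row names.

```r
# Convert COSMIC-style labels such as "A[C>A]A" to signeR-style labels such as "C>A:ACA".
# (Hypothetical helper, not part of SATS or signeR.)
cosmic_to_signeR <- function(x) {
  # Groups: 1 = 5' base, 2 = reference base, 3 = alternate base, 4 = 3' base.
  sub("^([ACGT])\\[([ACGT])>([ACGT])\\]([ACGT])$", "\\2>\\3:\\1\\2\\4", x)
}

# Toy L matrices built from the first three rows printed above.
vals <- c(0.000883, 0.000656, 0.000278, 0.001487, 0.001120, 0.000426)
L_cosmic <- matrix(vals, nrow = 3,
                   dimnames = list(c("A[C>A]A", "A[C>A]C", "A[C>A]G"),
                                   c("GRCC-CP1", "UHN-48-V1")))
L_signeR <- matrix(vals, nrow = 3,
                   dimnames = list(c("C>A:ACA", "C>A:ACC", "C>A:ACG"),
                                   c("GRCC-CP1", "UHN-48-V1")))

# TRUE when both matrices list the mutation types in the same order.
same_order <- identical(cosmic_to_signeR(rownames(L_cosmic)), rownames(L_signeR))
print(same_order)
```

The same comparison can be applied to the row names of a mutation type matrix and a panel context matrix before running an analysis.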
The resulting $\mathbf{L}$ matrix can be used with the signeR() function, and its mutation type order should be the same as that of the input matrix (the mutation catalog matrix $\mathbf{V}$; see Section 3). We therefore highly recommend confirming that both the $\mathbf{V}$ and $\mathbf{L}$ matrices follow the same mutation type order, corresponding to either the COSMIC database v3.2 or the signeR package (both have the same order but label the mutation types differently), so that the analysis is consistent.
library(signeR)
signeR_re <- signeR(M=V_sum, Opport=L_sum, nlim=c(1,5))
signeR_re$Phat
V_sum and L_sum are used as inputs for the signeR() function.
See the user guide for details.
Once signeR() is done, the optimal signature profiles are provided in signeR_re$Phat, which may be used for the next mapping step.
The mapping is performed with the MappingSignature() function.
W_hat <- signeR_re$Phat
MappedSig <- MappingSignature(W_hat = W_hat, W_ref = RefTMB$SBS_W)
MappedSig
W_hat contains the de novo TMB signatures from signeR (signeR_re$Phat) or any other signature analysis tool.
W_ref contains the reference TMB signature profiles to which the de novo signatures will be mapped. In this example code, we used the COSMIC reference TMB signature profiles (stored in RefTMB$SBS_W).
MappedSig contains the mapped reference TMB signatures, e.g., COSMIC SBS1, SBS2/13, SBS4, SBS5, SBS40, and SBS89 signatures (MappedSig$Reference), with the frequencies (MappedSig$freq) of coefficients greater than 0.1 out of 100 cross-validated repetitions.
Signature activities are then estimated with the EstimateSigActivity() function.
W_star <- as.matrix(RefTMB$SBS_W[,SBS.list])
H_hat <- EstimateSigActivity(V = SimData$V, L = SimData$L, W = W_star)
H_hat$H
V is the mutation type matrix $\mathbf{V}$, L is the panel context matrix $\mathbf{L}$ and W is the matrix of mapped reference TMB signatures $\mathbf{W}^*$.
H_hat$H is the estimated activity matrix of size $K \times N$, where $K$ is the number of signatures given in W and $N$ is the number of tumors in V.
Signature burdens are then calculated with the CalculateSigExpectancy() function.
SigBdn <- CalculateSigExpectancy(L = SimData$L, W = W_star, H = H_hat$H)
L is the panel context matrix $\mathbf{L}$, W is the matrix of mapped reference TMB signatures $\mathbf{W}^*$ and H is the estimated activity matrix $\widehat{\mathbf{H}}$.
> round(SigBdn[, 1:5], 2)
Sample1 Sample2 Sample3 Sample4 Sample5
SBS1 4.79 0.00 0 0.00 0.00
SBS2_13 0.00 0.00 1 1.08 0.82
SBS4 7.04 0.00 0 0.00 0.77
SBS5 0.21 0.00 0 2.93 0.07
SBS40 1.95 3.87 0 0.99 3.34
SBS89 0.00 2.13 0 0.00 0.00
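To make the refitting and burden steps more tangible, the sketch below simulates counts from the pNMF model, refits the activities with a standard multiplicative expectation-maximization update for a Poisson model with fixed signatures, and then computes the expected number of mutations per signature as $H_{kn} \sum_p L_{pn} W_{pk}$. This is an illustration under stated assumptions (the element-wise product applies to $\mathbf{W} \times \mathbf{H}$; all values are simulated; `refit_activity` is a hypothetical helper) and is not necessarily how EstimateSigActivity() or CalculateSigExpectancy() are implemented.

```r
set.seed(7)
P <- 8; K <- 2; N <- 5                                # mutation types, signatures, tumors
W      <- matrix(runif(P * K, 0.1, 1), P, K)          # fixed signature profiles (made up)
L      <- matrix(runif(P * N, 0.5, 2), P, N)          # panel context per tumor (made up)
H_true <- matrix(runif(K * N, 0.5, 4), K, N)          # activities used only for simulation
V      <- matrix(rpois(P * N, as.vector(L * (W %*% H_true))), P, N)

# Hypothetical helper: EM-style multiplicative update of H with W and L held fixed.
refit_activity <- function(V, L, W, n_iter = 500) {
  H     <- matrix(1, ncol(W), ncol(V))                # positive starting values
  denom <- crossprod(W, L)                            # K x N: sum_p W[p, k] * L[p, n]
  for (i in seq_len(n_iter)) {
    mu    <- L * (W %*% H)                            # current expected counts
    ratio <- ifelse(mu > 0, V / mu, 0)                # guard against division by zero
    H     <- H * crossprod(W, L * ratio) / denom
  }
  H
}

H_hat  <- refit_activity(V, L, W)
burden <- crossprod(W, L) * H_hat                     # K x N expected mutations per signature

# Per tumor, the burdens add up to the observed number of mutations.
print(round(rbind(total_burden = colSums(burden), observed = colSums(V)), 2))
```

A useful property of this update is that, for each tumor, the signature burdens sum to the observed mutation count, which gives a quick sanity check on any refit of this kind.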
RefTMB$SBS_W and RefTMB$SBS_refSigs contain the COSMIC SBS TMB signature profiles and the list of cancer-specific SBS signature names, respectively.
Similarly, RefTMB$DBS_W and RefTMB$DBS_refSigs contain the COSMIC DBS TMB signature profiles and the list of cancer-specific DBS signature names.
A single-tumor example is stored in SimData$SingleTumorEx (its two columns contain the mutation counts, singleV, and the sequencing context, respectively).
SBS.list <- RefTMB$SBS_refSigs[RefTMB$SBS_refSigs$cancerType == "Skin Cancer or Melanoma", "COSMIC"]
W_star <- as.matrix(SimData$W_TMB[,SBS.list])
## Estimate activity
V1 <- SimData$SingleTumorEx[, 1, drop = FALSE]
L1 <- SimData$SingleTumorEx[, 2, drop = FALSE]
H_hat <- EstimateSigActivity(V = V1, L = L1, W = W_star)
## Estimate burden
SigBdn <- CalculateSigExpectancy(L = L1, W = W_star, H = H_hat$H)
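The `drop = FALSE` argument in the code above keeps each extracted column as a one-column matrix rather than a plain vector. The short base-R sketch below shows the difference on a made-up matrix; the row names and the second column name are hypothetical.

```r
# Made-up two-column matrix standing in for a single-tumor table.
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("type1", "type2", "type3"), c("singleV", "context")))

as_vector <- m[, 1]                 # dimensions are dropped: a named vector
as_matrix <- m[, 1, drop = FALSE]   # dimensions are kept: a 3 x 1 matrix

print(dim(as_vector))               # NULL
print(dim(as_matrix))               # 3 1
```

Keeping the matrix shape means a single tumor can be handled in the same way as a cohort with one column.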
EstimateSigActivity() and CalculateSigExpectancy() also work on a single tumor to estimate signature activities and signature burdens.
SATS provides a comprehensive approach for analyzing mutational signatures in targeted sequenced tumors, addressing the limitations of existing tools and providing detailed steps for analysis in various scenarios.
This work is under the license of CC BY-NC 4.0.