I noticed that when sig_tally() is run with the Wang method, the row names of the resulting NMF matrix (tally_wang$nmf_matrix) are NULL. I expected them to be the sample IDs, as they are with the S (Steele) method. Reprex below:
# Load the sigminer library
library(sigminer)
#> Registered S3 method overwritten by 'sigminer':
#> method from
#> print.bytes Rcpp
#> sigminer version 2.3.2
#> - Star me at https://github.com/ShixiangWang/sigminer
#> - Run hello() to see usage and citation.
# Load a toy segmentation table included with the sigminer package
load(system.file("extdata", "toy_segTab.RData",
  package = "sigminer", mustWork = TRUE
))
# Set a seed for reproducibility
set.seed(1234)
# Add a new column 'minor_cn' with random values of either 0 or 1
segTabs$minor_cn <- sample(c(0, 1), size = nrow(segTabs), replace = TRUE)
# Subset the segmentation table for a single sample
singleSampleSegTabs <- subset(segTabs, sample == "TCGA-A8-A07S-01A-11D-A036-01")
# Read the copy number data for the single sample
cn <- read_copynumber(singleSampleSegTabs,
  seg_cols = c("chromosome", "start", "end", "segVal"),
  genome_measure = "wg", complement = TRUE, add_loh = TRUE
)
#> ℹ [2024-09-05 14:58:13.990813]: Started.
#> ℹ [2024-09-05 14:58:13.995483]: Genome build : hg19.
#> ℹ [2024-09-05 14:58:13.996023]: Genome measure: wg.
#> ℹ [2024-09-05 14:58:13.996513]: When add_loh is TRUE, use_all is forced to TRUE.
#> Please drop columns you don't want to keep before reading.
#> ✔ [2024-09-05 14:58:14.002538]: Chromosome size database for build obtained.
#> ℹ [2024-09-05 14:58:14.003157]: Reading input.
#> ✔ [2024-09-05 14:58:14.003672]: A data frame as input detected.
#> ✔ [2024-09-05 14:58:14.004327]: Column names checked.
#> ✔ [2024-09-05 14:58:14.005043]: Column order set.
#> ✔ [2024-09-05 14:58:14.006148]: Chromosomes unified.
#> ✔ [2024-09-05 14:58:14.008141]: Value 2 (normal copy) filled to uncalled chromosomes.
#> ✔ [2024-09-05 14:58:14.009723]: Data imported.
#> ℹ [2024-09-05 14:58:14.010245]: Segments info:
#> ℹ [2024-09-05 14:58:14.010774]: Keep - 45
#> ℹ [2024-09-05 14:58:14.011272]: Drop - 0
#> ✔ [2024-09-05 14:58:14.011938]: Segments sorted.
#> ℹ [2024-09-05 14:58:14.012445]: Adding LOH labels...
#> ℹ [2024-09-05 14:58:14.013251]: Joining adjacent segments with same copy number value. Be patient...
#> ✔ [2024-09-05 14:58:14.018461]: 39 segments left after joining.
#> ✔ [2024-09-05 14:58:14.019082]: Segmental table cleaned.
#> ℹ [2024-09-05 14:58:14.019575]: Annotating.
#> ✔ [2024-09-05 14:58:14.024584]: Annotation done.
#> ℹ [2024-09-05 14:58:14.02513]: Summarizing per sample.
#> ✔ [2024-09-05 14:58:14.033565]: Summarized.
#> ℹ [2024-09-05 14:58:14.034175]: Generating CopyNumber object.
#> ✔ [2024-09-05 14:58:14.034964]: Generated.
#> ℹ [2024-09-05 14:58:14.03548]: Validating object.
#> ✔ [2024-09-05 14:58:14.036014]: Done.
#> ℹ [2024-09-05 14:58:14.036588]: 0.046 secs elapsed.
# Tally signatures using the Steele method
tally_steele <- sigminer::sig_tally(cn, method = "S", keep_only_matrix = FALSE)
#> ℹ [2024-09-05 14:58:14.253776]: Started.
#> ℹ [2024-09-05 14:58:14.25461]: When you use method 'S', please make sure you have set 'join_adj_seg' to FALSE and 'add_loh' to TRUE in 'read_copynumber() in the previous step!
#> ✔ [2024-09-05 14:58:14.260849]: Matrix generated.
#> ℹ [2024-09-05 14:58:14.261393]: 0.008 secs elapsed.
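# Tally signatures using the Wang method
# (assumed call: method = "W" with the same arguments as the Steele tally; output omitted)
tally_wang <- sigminer::sig_tally(cn, method = "W", keep_only_matrix = FALSE)
# ISSUE: The row names of the NMF matrix for the Wang tally are NULL,
# but they are expected to be the sample IDs.
rownames(tally_wang$nmf_matrix)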
#> NULL
# EXPECTED BEHAVIOR: The row names for the Steele tally include the sample names as expected.
rownames(tally_steele$nmf_matrix)
#> [1] "TCGA-A8-A07S-01A-11D-A036-01"
Hi @ShixiangWang,
Hope you're well.
Created on 2024-09-05 with reprex v2.1.0
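In the meantime, a possible workaround is to assign the sample IDs by hand. The sketch below uses base R only and a stand-in matrix rather than real sigminer output: the component names (comp1 to comp3) and the counts are hypothetical, and it assumes the matrix has one row per sample in the same order as the sample IDs, which the package may not guarantee.

```r
# Stand-in for tally_wang$nmf_matrix: one sample (row), no row names.
# Column names and counts are hypothetical placeholders.
nmf_matrix <- matrix(
  c(3, 0, 5),
  nrow = 1,
  dimnames = list(NULL, c("comp1", "comp2", "comp3"))
)

# Sample IDs in the order the rows are assumed to follow.
sample_ids <- "TCGA-A8-A07S-01A-11D-A036-01"

# Assign row names only when they are missing and the counts line up,
# so a mismatch is never silently papered over.
if (is.null(rownames(nmf_matrix)) && nrow(nmf_matrix) == length(sample_ids)) {
  rownames(nmf_matrix) <- sample_ids
}

rownames(nmf_matrix)
#> [1] "TCGA-A8-A07S-01A-11D-A036-01"
```

With real data, the same guard could be applied to tally_wang$nmf_matrix, taking the sample IDs from the input segmentation table.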