AlexandrovLab / SigProfilerAssignment

Assignment of known mutational signatures to individual samples and individual somatic mutations
BSD 2-Clause "Simplified" License

Incorrect reconstruction #15

Closed: hylkedonker closed this issue 2 years ago

hylkedonker commented 2 years ago

Hi,

I have a mutation spectrum X and, given COSMIC's mutational signatures H, I want to compute W such that X = WH. In other words, I want to compute W = X H^{-1}. If I understand correctly, that is the purpose of this package, right?

I ran a small experiment to check that everything works as I expected. I chose some W to see whether SigProfilerAssignment can correctly reconstruct it:

X = W_true H --- SigProfilerAssignment ---> W_reconstr

and checked whether W_true is close to W_reconstr. Unfortunately, it is not even close. I have probably misunderstood something about the program, e.g., which files to use (I assumed Assignment_Solution_Activities.txt contains W_reconstr). Could you help me pinpoint what went wrong?

Here is a minimal example that reproduces my problem:

from numpy import linalg, random
from pandas import DataFrame, read_csv
from SigProfilerAssignment import Analyzer as Analyze

# COSMIC reference signatures: rows are mutation contexts, columns are signatures.
signature_file = "COSMIC_v3_SBS_GRCh37_noSBS84-85.txt"
H = read_csv(signature_file, sep="\t", index_col=0)

# Random 0/1 activations for 3 samples (one row per sample, one column per signature).
activations = random.randint(0, 2, size=(3, H.shape[1]))
W_true = DataFrame(activations, columns=H.columns)

# Synthetic spectrum with shape contexts x samples.
X = H @ W_true.transpose()
X.to_csv("/tmp/spectrum.tsv", sep="\t")

# Fit the known signatures to the synthetic spectrum.
Analyze.cosmic_fit(
    samples="/tmp/spectrum.tsv",
    output="/tmp/activations/",
    signatures=None,
    signature_database=signature_file,
    genome_build="GRCh37",
    verbose=False,
)

# Load the activities reported by the fit.
W_reconstr = read_csv(
    "/tmp/activations/Assignment_Solution/Activities/Assignment_Solution_Activities.txt",
    sep="\t",
    index_col=0,
)

Thanks in advance,

Hylke

rvangara commented 2 years ago

Hi Hylke,

The signatures are W and the activities of those signatures are H (unless you have transposed the mutation spectrum X). In our SigProfiler tools, the mutational matrix X, or what you call a spectrum, has dimensions m x n, i.e., context types x samples. Our signatures W are context types x number of signatures k, and the activities H are number of signatures k x samples.
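To make the shape convention concrete, here is a small NumPy sketch. It uses made-up random signatures rather than the COSMIC file, and it recovers the activities with plain unconstrained least squares (`numpy.linalg.lstsq`). That solver is chosen only to illustrate the orientation of X, W and H; it is not the fitting procedure SigProfilerAssignment itself uses. The sizes and the names `H_true` and `H_fit` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: m context types, k signatures, n samples.
m, k, n = 96, 5, 3

# W: signatures, context types x signatures; each column sums to 1.
W = rng.random((m, k))
W /= W.sum(axis=0, keepdims=True)

# H: activities, signatures x samples (hypothetical counts).
H_true = rng.integers(0, 100, size=(k, n)).astype(float)

# X: mutational matrix, context types x samples.
X = W @ H_true

# Unconstrained least squares, an assumption made for this sketch only.
# It solves W @ H_fit = X for H_fit, column by column.
H_fit, *_ = np.linalg.lstsq(W, X, rcond=None)

print(X.shape, W.shape, H_fit.shape)
print(np.allclose(H_fit, H_true, atol=1e-6))
```

In this noise-free setting with linearly independent signatures, the recovered activities match the ones used to build X. With real COSMIC signatures and a constrained fit, agreement is not guaranteed to be exact.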

Please try again using the correct signatures and activities, and reopen the ticket if the issue persists.

Ravi.