getzlab / SignatureAnalyzer

Updated SignatureAnalyzer-GPU with mutational spectra & RNA expression compatibility.
MIT License

Forcing signatures #22

mishugeb closed this issue 4 years ago

mishugeb commented 4 years ago

Hello, can we force SignatureAnalyzer to extract a specific number of signatures? For example, can we extract exactly 5 signatures from a matrix?

Thanks

shankara-a commented 4 years ago

Hi,

You could force a specific number of signatures by running vanilla NMF, which requires a set K. I can recommend scikit-learn or http://nimfa.biolab.si for running the decomposition. SignatureAnalyzer is a Bayesian variant of NMF that identifies the appropriate number of signatures, which avoids the overfitting that vanilla NMF is prone to.
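A minimal sketch of fixed-K NMF with scikit-learn's `NMF` class, assuming the spectra are arranged as a channels-by-samples matrix (e.g. 96 SBS channels by samples). The random input, `K = 5`, and the choice of KL-divergence loss are illustrative assumptions, not SignatureAnalyzer settings; nimfa offers comparable fixed-rank factorizations.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical input: a 96 x n_samples spectra matrix (channels x samples),
# filled with random counts purely for illustration.
rng = np.random.default_rng(0)
X = rng.poisson(lam=5.0, size=(96, 40)).astype(float)

K = 5  # the number of signatures to force

# scikit-learn factorizes X ~ W @ H, with W of shape (96, K) and H of shape (K, n_samples).
# beta_loss="kullback-leibler" requires the multiplicative-update solver ("mu").
model = NMF(n_components=K, init="nndsvda", solver="mu",
            beta_loss="kullback-leibler", max_iter=1000, random_state=0)
W = model.fit_transform(X)   # (96, K): extracted signatures
H = model.components_        # (K, n_samples): exposures

# Normalize each signature to sum to 1 and rescale exposures so W @ H is unchanged.
scale = W.sum(axis=0)
scale = np.where(scale > 0, scale, 1.0)  # guard against an all-zero component
W_norm = W / scale
H_norm = H * scale[:, None]
```

Because the factorization is only defined up to a per-component scale, normalizing the signatures makes them directly comparable to reference signatures, which are usually stored as probability vectors.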

On a different branch, I have a supervised version of SignatureAnalyzer where you can provide a fixed W matrix and run the decomposition to learn the H matrix. This might be slightly closer to what you are looking for if you have a set of signatures you are interested in and a low number of samples.
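The supervised Bayesian method on that branch is not shown here. As a plain, non-Bayesian stand-in for the same idea (hold W fixed, learn H), each sample's exposures can be estimated by non-negative least squares with `scipy.optimize.nnls`. The function name `fit_exposures` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def fit_exposures(W, X):
    """Estimate exposures H >= 0 such that X ~ W @ H, holding signatures W fixed.

    W: (channels, K) fixed signature matrix; X: (channels, n_samples) spectra.
    This is ordinary NNLS per sample, not SignatureAnalyzer's supervised model.
    """
    K, n = W.shape[1], X.shape[1]
    H = np.zeros((K, n))
    for j in range(n):
        # Solve min ||W h - x_j||_2 subject to h >= 0 for sample j.
        H[:, j], _ = nnls(W, X[:, j])
    return H

# Synthetic example: 3 hypothetical signatures and 10 samples with exact mixtures.
rng = np.random.default_rng(1)
W = rng.random((96, 3))
W /= W.sum(axis=0)
H_true = rng.random((3, 10)) * 100
X = W @ H_true
H_hat = fit_exposures(W, X)
```

With noiseless mixtures and linearly independent signatures, NNLS recovers the true exposures; on real counts it returns the best non-negative fit in the least-squares sense.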

mishugeb commented 4 years ago

Thanks for the quick response! I am not looking to decompose with a known W matrix. I just want to extract a de novo W matrix with a fixed number of bases/signatures, that is, to influence the decision of how many signatures there are in the dataset. For example, SignatureAnalyzer may think there are 7 signatures, but I think there are 6 and I want to extract 6. Is this possible?

shankara-a commented 4 years ago

As stated above, you should not use a Bayesian method to fix the number of signatures. Instead, use an implementation of NMF that requires a fixed K.

You can set up your input X to the NMF method as the spectra extracted from your MAF (we have examples here), run the decomposition with scikit-learn or nimfa (linked above), and then map the results back to COSMIC signatures. For the last step, we have the signatures here, and you can compute the cosine similarity between the extracted signatures (the W matrix) and the COSMIC signatures to map them back.
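A minimal sketch of the mapping step: compute the cosine similarity between each extracted signature (a column of W) and each reference signature, then take the best match. The reference matrix `R` and the names `Ref1` to `Ref6` are hypothetical placeholders for the COSMIC signatures, and both matrices are assumed to use the same channel order.

```python
import numpy as np

def cosine_similarity_matrix(W, R):
    """Cosine similarity between columns of W (channels, K) and columns of R (channels, M)."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    Rn = R / np.linalg.norm(R, axis=0, keepdims=True)
    return Wn.T @ Rn  # (K, M)

def map_to_reference(W, R, names):
    """For each extracted signature, return (best reference name, cosine similarity)."""
    S = cosine_similarity_matrix(W, R)
    best = S.argmax(axis=1)
    return [(names[m], float(S[k, m])) for k, m in enumerate(best)]

# Hypothetical reference set of 6 signatures over 96 channels.
rng = np.random.default_rng(2)
R = rng.random((96, 6))
names = [f"Ref{i + 1}" for i in range(6)]

# Pretend NMF returned rescaled copies of Ref5 and Ref2.
W = R[:, [4, 1]] * np.array([2.0, 0.5])
matches = map_to_reference(W, R, names)
```

Taking the argmax per row lets two extracted signatures map to the same reference; if a one-to-one matching is wanted, `scipy.optimize.linear_sum_assignment` on the negated similarity matrix is one option.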