getzlab / SignatureAnalyzer

Updated SignatureAnalyzer-GPU with mutational spectra & RNA expression compatibility.
MIT License

De novo signature extraction #33

Closed andreyurch closed 2 years ago

andreyurch commented 3 years ago

Dear developers,

I apologise for what is probably a simple question, but I was unable to find an answer in the manual. I have a dataset of skin cancers and want to:

  1. Extract de novo signatures.
  2. Get the optimal number of signatures for my dataset.
  3. Get a table of the signatures at that optimal number.
  4. Get a table of the signature exposures for each sample.

I ran it on the 96-channel spectrum:

    signatureanalyzer -n 10 \
        -t spectra \
        --objective poisson \
        --prior_on_H L1 \
        --prior_on_W L1 \
        96_matrix_skin.txt

But how can I get the optimal number of signatures, the extracted signatures, and the exposures for my dataset from the output?

Best regards, Andrey

yoakiyama commented 3 years ago

Hi Andrey,

Sorry for the slow response. SignatureAnalyzer should produce a few outputs that give you this information, and all of it can be found in the nmf_output.h5 file.

  1. k_dist.pdf is a histogram that shows the distribution of the number of signatures across your 10 runs. If you want more information on each run, you can view the number of signatures (K) and other summary statistics using pd.read_hdf(<nmf_output.h5 path>, 'aggr').
  2. You can view the extracted signatures and their mutational landscapes in the signature_contributions.pdf barplot. You can load the W matrix in Python using pd.read_hdf(<nmf_output.h5 path>, 'W') for fractional contributions and pd.read_hdf(<nmf_output.h5 path>, 'Wraw') for raw contributions.
  3. You can view the exposures in the signature_stacked_barplot.pdf file. The actual H matrix can be loaded from nmf_output.h5 using pd.read_hdf(<nmf_output.h5 path>, 'H').
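
The reads listed above can be collected into one small helper. This is only a sketch under stated assumptions: the helper names load_nmf_tables and most_common_k are hypothetical and not part of SignatureAnalyzer, the 'aggr' table is assumed to have one row per run with a column named K, and taking the most frequent K is just one way to summarise the runs, not necessarily how SignatureAnalyzer itself picks its final solution.

```python
from collections import Counter

import pandas as pd

# Keys named in the list above; no other keys in the file are assumed.
KEYS = ("aggr", "W", "Wraw", "H")


def load_nmf_tables(h5_path, keys=KEYS, reader=pd.read_hdf):
    """Return {key: DataFrame} for each requested key in nmf_output.h5.

    `reader` defaults to pd.read_hdf (which needs PyTables installed);
    it is a parameter only so the helper can be exercised without a file.
    """
    return {key: reader(h5_path, key) for key in keys}


def most_common_k(aggr, k_column="K"):
    """Most frequent number of signatures across runs.

    Assumes `aggr` has one row per run and a column called `K`
    (the column name is an assumption; check aggr.columns first).
    """
    counts = Counter(int(k) for k in aggr[k_column])
    return counts.most_common(1)[0][0]


# Example usage (path is a placeholder for your own output directory):
# tables = load_nmf_tables("nmf_output.h5")
# print(tables["aggr"].columns)          # confirm the column names
# print(most_common_k(tables["aggr"]))   # K seen most often across runs
# tables["W"].to_csv("signatures.tsv", sep="\t")
# tables["H"].to_csv("exposures.tsv", sep="\t")
```

Writing W and H out with to_csv gives the signature and exposure tables as plain text files, which may be easier to inspect than the HDF5 file.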

Hope this helps. Please let me know if you have any issues.

Best regards, Yo