ShixiangWang / sigminer

🌲 An easy-to-use and scalable toolkit for genomic alteration signature (a.k.a. mutational signature) analysis and visualization in R https://shixiangwang.github.io/sigminer/reference/index.html
https://shixiangwang.github.io/sigminer/

Signature decomposition of SigProfilerExtractor vs. get_sig_similarity? #400

Closed amootta closed 2 years ago

amootta commented 2 years ago

Hi there. First of all, I wanted to say a huge thank you for creating this software! It is very useful and the documentation for it is amazing.

I just have a question about the SigProfilerExtractor function, which writes a decomposed solution to the output folder. I had a look at the decomposition plots of my de novo signatures, and the COSMIC signatures identified there are completely different from those reported as most similar when I use get_sig_similarity on the same dataset.

My question is: what is the difference between these two approaches (decomposition vs. similarity), and which of them could I 'trust' more when trying to assess these signatures? Your explanation would be much appreciated. Thanks very much!

ShixiangWang commented 2 years ago

@amootta Hi, get_sig_similarity only calculates the similarity between two matrices (here, one is your input signature matrix and the other is the COSMIC reference signature matrix). The decomposition result, however, is generated by SigProfiler itself.

I suspect this does not fully address your question. Could you provide more details (with screenshots) and code so I can properly understand and answer it?

amootta commented 2 years ago

Hi, thanks very much for your quick reply. Here is a snapshot of the decomposition plot made by SigProfiler:

[image: SigProfiler decomposition plot]

As shown, it decomposes the de novo SBS96A signature from my dataset into the COSMIC signatures SBS5 and SBS39.

In comparison, when I use get_sig_similarity on the imported signature solution by running:

    sigprofiler_import('./IGHV_SBS_new_final_norefit', order_by_expo = FALSE) -> IGHV_SBS
    simIGHV_SBS <- get_sig_similarity(IGHV_SBS$solution, sig_db = 'SBS', db_type = 'human-genome')

I get this plot:

[image: get_sig_similarity plot]

As you can see, Sig1 (SBS96A) is most similar to COSMIC signatures such as SBS5 and SBS40, while it is not at all similar to SBS39, even though SBS39 is one of the signatures SigProfiler decomposes it into. I am therefore confused about what each of these plots shows and means. What is the difference between getting the 'decomposition' signatures and the 'similar' signatures?

Thanks again

ShixiangWang commented 2 years ago

@amootta Hi, I understand.

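'Decomposition' means that A can be generated by mixing B and C. A component can therefore contribute to the mixture even if its own profile looks quite different from A.

'Similarity' here means comparing A with B and A with C separately, one pair at a time, and reporting how alike each pair is.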
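
The distinction can be illustrated with a small toy example in base R. This is only a sketch under stated assumptions: the six-channel profiles `B`, `C` and `D` are made-up numbers, not COSMIC signatures; `cosine()` is a hypothetical helper defined here; and the weights are fitted with ordinary least squares via `qr.solve()`, which is not SigProfiler's actual decomposition procedure.

```r
# Hypothetical helper: cosine similarity between two numeric vectors.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Made-up reference profiles over six channels (each sums to 1).
B <- c(0.50, 0.30, 0.10, 0.10, 0.00, 0.00)
C <- c(0.00, 0.00, 0.00, 0.10, 0.40, 0.50)
D <- c(0.38, 0.25, 0.09, 0.10, 0.08, 0.10)

# A is constructed as a mixture of B and C (80% B, 20% C).
A <- 0.8 * B + 0.2 * C

# Similarity: compare A with each reference separately.
sims <- c(B = cosine(A, B), C = cosine(A, C), D = cosine(A, D))
print(round(sims, 3))

# Decomposition: find weights w so that w[1] * B + w[2] * C approximates A.
# Plain least squares is an assumption made for brevity.
w <- qr.solve(cbind(B, C), A)
print(round(w, 3))
```

In this toy setup, `C` is a genuine component of `A` yet has a low pairwise similarity to it, while `D` resembles `A` closely without being part of the mixture. The same pattern may explain why a decomposition plot and a similarity heatmap highlight different reference signatures.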