Nik-Zainal-Group / signature.tools.lib

R package containing useful functions for mutational signature analysis

maxRareSigsPerSample parameter not reflected in outputs #45

Closed by timchu90 2 years ago

timchu90 commented 2 years ago

Hi! Thanks for your work on this tool. I'm currently running it on a set of bladder cancer patients who have undergone platinum chemotherapy, so I wanted both SBS31 and SBS35 to be accounted for in the set of rare signatures during fitting.
In the candidateRareSigs.tsv output, there are cases where the solution with multiple rare mutational signatures appears to be selected. In the PDF output, however, that solution is not selected and only SBS31 is in the solution. Any insight into whether this is expected, or how to use this parameter properly, would be much appreciated. Thanks!

image

andreadega commented 2 years ago

Hi there,

Before I label this as a bug, can I double-check that things are working properly on your end? My first question is: how are you plotting this? I assume you did something like this:

resObj <- FitMS(...)
plotFitMS(resObj,outfolder)

or:

resObj <- FitMS(...)
plotFitResults(resObj,outfolder)

In both cases, the outfolder should contain plots for all solutions, organised in folders like selectedSolution and otherSolutions. If you check the selectedSolution folder you will find the details of what happens in the selected solution SBS35:SBS31. My guess is that perhaps both SBS31 and SBS35 were used in the selected solution fit, yet because of the threshold in the post filter one of them was removed.
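To illustrate how a post filter could remove a signature that was used in the fit, here is a minimal sketch. It is not the FitMS implementation: the function name dropSmallExposures, the 5% threshold, and the exposure values are all assumptions made up for this example. It only assumes that the filter sets to zero any signature whose exposure falls below a given percentage of the sample's total.

```r
# Hypothetical sketch of an exposure post filter; not FitMS code.
# Signatures whose share of the total exposure is below thresholdPercent
# are set to zero (the 5% default is an arbitrary choice for this sketch).
dropSmallExposures <- function(exposures, thresholdPercent = 5) {
  share <- 100 * exposures / sum(exposures)
  exposures[share < thresholdPercent] <- 0
  exposures
}

# Made-up exposures (mutations assigned to each signature).
fitted <- c(SBS2 = 300, SBS13 = 250, SBS31 = 430, SBS35 = 20)
filtered <- dropSmallExposures(fitted)

# Signatures that survive the filter; SBS35 (2% of the total) is dropped.
remaining <- names(filtered)[filtered > 0]
print(remaining)
```

With numbers like these, SBS35 would be part of the fitted solution yet absent from the reported one, which is the situation guessed at above.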

If you can give me more info about what is in the selectedSolution folder perhaps I can provide more advice.

timchu90 commented 2 years ago

Thanks for the quick reply! I am running this using the shell script with the following command:

~/tools/signature.tools.lib-2.1.2/scripts/signatureFit -x vcf_input.txt -o ./output_maxRare/ -b -O Bladder -e hg38 -k 2

I've included the outputs in the selected solution in the image below:

image

andreadega commented 2 years ago

Thanks for sharing your results.

Perhaps I understand now what is going on. FitMS will add a rare signature if it significantly improves the fit w.r.t. using common signatures only. When you allow 2 rare signatures (parameter -k 2), FitMS will add the second signature only if the fit with two rare signatures significantly improves on the fit with only one rare signature. This can happen in two directions, either SBS31 -> SBS35 (solution SBS31:SBS35) or SBS35 -> SBS31 (solution SBS35:SBS31). Currently, we assume that the solution with 2 rare signatures is valid if just one of the two directions is found, though this example makes me wonder whether we should be stricter and require that all directions be found. What I think is happening here is that adding SBS35 improves the fit w.r.t. common signatures alone, and then adding SBS31 improves the fit once more but kicks out SBS35. You can see in your opening post that the cosine similarity of the reconstruction is the same for SBS31 alone and for SBS35:SBS31, so SBS35 is not necessary and is not likely present in this sample.
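The cosine similarity argument can be sketched in a few lines of R. This is an illustration only: cosineSimilarity is a helper written for this example rather than the package's own function, the four-channel vectors are toy numbers, and the tolerance is an arbitrary choice.

```r
# Illustrative helper, not the package's own implementation.
cosineSimilarity <- function(a, b) {
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Toy 4-channel catalogue and two reconstructions (made-up numbers).
catalogue    <- c(10, 20, 30, 40)
reconOneRare <- c(11, 19, 31, 39)  # common signatures + one rare signature
reconTwoRare <- c(11, 19, 31, 39)  # second rare signature contributes nothing

simOne <- cosineSimilarity(catalogue, reconOneRare)
simTwo <- cosineSimilarity(catalogue, reconTwoRare)

# Flag the larger solution when it reconstructs the catalogue no better
# than the smaller one (tolerance is an arbitrary value for this sketch).
tolerance <- 1e-3
redundantSecondSig <- (simTwo - simOne) < tolerance
print(redundantSecondSig)
```

When the two similarities are equal, as in the opening post, the second rare signature adds nothing to the reconstruction and the check flags it as redundant.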

This suggests we could improve FitMS by adding an option to require that all directions be found (in this case both the SBS31:SBS35 and SBS35:SBS31 solutions) before the combination is considered. Alternatively, we could give a warning when the combination solution has a cosine similarity very close to that of solutions with fewer rare signatures.
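The difference between the current rule and the stricter option can be sketched as follows. This is a hypothetical illustration, not code from signature.tools.lib: the function acceptRarePair and its requireAll argument do not exist in the package, and the significance test for each direction is abstracted into a single TRUE/FALSE value. The example values assume, as the explanation above suggests, that only the SBS35 -> SBS31 direction was found in this sample.

```r
# Hypothetical sketch, not part of signature.tools.lib.
# 'directions' is a named logical vector: TRUE means that adding the second
# rare signature significantly improved the fit obtained with the first alone.
acceptRarePair <- function(directions, requireAll = FALSE) {
  stopifnot(is.logical(directions), length(directions) > 0, !anyNA(directions))
  if (requireAll) all(directions) else any(directions)
}

# Assumed situation in this thread: only SBS35 -> SBS31 was found.
found <- c("SBS31:SBS35" = FALSE, "SBS35:SBS31" = TRUE)

currentRule <- acceptRarePair(found)                     # any direction: TRUE
strictRule  <- acceptRarePair(found, requireAll = TRUE)  # all directions: FALSE
print(c(current = currentRule, strict = strictRule))
```

Under the stricter rule the two-signature combination would be rejected for this sample, leaving the single-rare-signature solutions as candidates.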

In any case, the output provided by FitMS at the moment is quite clear: only three signatures are detectable in this sample, SBS31, SBS2 and SBS13.

Finally, since FitMS may sometimes choose the incorrect solution (for example with noisy data, or with rare signatures that are very similar), we provide a function called fitMerge, which can be used to replace the choice of solution (see FitMS manual). Unfortunately, I haven't yet written the command-line script version of it, though it is on my to-do list.

timchu90 commented 2 years ago

Thank you for such a detailed explanation!