ShixiangWang / sigminer

🌲 An easy-to-use and scalable toolkit for genomic alteration signature (a.k.a. mutational signature) analysis and visualization in R https://shixiangwang.github.io/sigminer/reference/index.html
https://shixiangwang.github.io/sigminer/

bp_extract_signatures tuning #338

Closed tangwei1129 closed 3 years ago

tangwei1129 commented 3 years ago

Hi Shixiang, when I use the best-practice pipeline with 1000 runs in total (bootstraps x n_nmf_run) for my WES mutational signature calling, I usually get far more signatures than SigProfiler does. Are there any parameters I should tune to get more stringent results? Thanks, Wei

ShixiangWang commented 3 years ago

Could you show your survey plot and the suggested signature number from sigprofiler? Maybe I can take a look.

tangwei1129 commented 3 years ago

[image] bp_extract_signatures suggested 10 signatures (see the plot above), but SigProfiler suggests 3. The data come from ~170 tumor-normal paired breast cancer WES samples.

bp_extract_signatures(
  mt_tally$SBS_96,
  range = 3:13,
  n_bootstrap = 20,
  n_nmf_run = 50,
  cores = 16,
  seed = 123456
)

sigprofiler_extract(
  mt_tally$SBS_96,
  'sigprofiler_SBS96',
  range = 3:13,
  nrun = 20L,
  refit = FALSE,
  refit_plot = FALSE,
  is_exome = F,
  cores = 16L,
  genome_build = "hg38",
  sigprofiler_version = "1.1.0"
)

ShixiangWang commented 3 years ago

It seems 10 signatures are okay here. Could you try sig_fit_bootstrap_batch (https://shixiangwang.github.io/sigminer/reference/sig_fit_bootstrap_batch.html)? It fits COSMIC signatures and reports SBS signature activity, so you can look at the distribution of signature activities to check the number of signatures.
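A minimal sketch of how the suggested bootstrap fitting could be called on the tally matrix from this thread. The wrapper name `fit_cosmic_bootstrap` is hypothetical, and the choices `sig_index = "ALL"` and `sig_db = "SBS"` are assumptions about which reference set to fit against, not settings confirmed in the issue. It needs the sigminer package to be installed.

```r
# Sketch only: needs the sigminer package; `tally_matrix` stands for a
# sample-by-component matrix such as mt_tally$SBS_96 from this thread.
fit_cosmic_bootstrap <- function(tally_matrix, n_bootstrap = 100L) {
  if (!requireNamespace("sigminer", quietly = TRUE)) {
    stop("Package 'sigminer' is required for this sketch.")
  }
  # The fitting functions expect components in rows and samples in columns,
  # so the tally matrix is transposed first.
  catalogue <- t(tally_matrix)
  sigminer::sig_fit_bootstrap_batch(
    catalogue,
    sig_index = "ALL", # assumption: fit against every signature in the database
    sig_db = "SBS",    # assumption: the COSMIC SBS catalogue shipped with sigminer
    n = n_bootstrap
  )
}

# Usage (not run here):
# bt <- fit_cosmic_bootstrap(mt_tally$SBS_96)
# str(bt, max.level = 1)
```

Signatures whose bootstrapped activity is consistently near zero across samples are candidates to drop, which gives a rough cross-check on the number of signatures.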

tangwei1129 commented 3 years ago

[image] When I expanded the range to 20, the number of signatures went up to 15, so I guess something needs to be tuned to balance the sensitivity.

Yes, I will try just fitting to SBS.

ShixiangWang commented 3 years ago

@tangwei1129 Thanks for your feedback. This integrated score has only been tested on several simulated datasets, and it seems not to work well on real data. I will try to tune and modify it in the future. :) In general, you should pay attention to the silhouette (the bigger, the better) and error (the smaller, the better) measures. I also suggest setting the range to start from 2 for de novo signature discovery.

If you are working on identifying known COSMIC signatures, signature fitting is always a good approach.
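As a rough illustration of reading the two measures together, the sketch below picks a signature number from a table of per-number statistics. The helper `pick_signature_number`, the column names, and the 0.8 silhouette cutoff are all assumptions made for this example and are not part of sigminer; the numbers in `toy_survey` are invented and are not results from this issue.

```r
# Hypothetical helper (not part of sigminer): keep only stable solutions
# (silhouette at or above a cutoff) and, among those, return the signature
# number with the smallest reconstruction error.
pick_signature_number <- function(survey, min_silhouette = 0.8) {
  stopifnot(all(c("signature_number", "silhouette", "error") %in% names(survey)))
  stable <- survey[survey$silhouette >= min_silhouette, , drop = FALSE]
  if (nrow(stable) == 0) {
    return(NA_integer_)
  }
  as.integer(stable$signature_number[which.min(stable$error)])
}

# Made-up numbers for illustration only; the range starts from 2 as suggested.
toy_survey <- data.frame(
  signature_number = 2:6,
  silhouette = c(0.99, 0.97, 0.91, 0.62, 0.55),
  error = c(0.30, 0.22, 0.18, 0.16, 0.15)
)

pick_signature_number(toy_survey)
```

The error keeps shrinking as signatures are added, so the silhouette cutoff is what stops the choice from drifting toward the top of the range. The right cutoff is a judgment call and should be checked against the survey plot.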

tangwei1129 commented 3 years ago

Thank you. As some papers indicate, before fitting to existing SBS signatures, we should do de novo extraction first, and then fit to the signatures found de novo instead of fitting to all signatures. That is what I am doing now: extracting and then fitting.
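The extract-then-fit idea can be sketched in base R: compare each de novo signature with a reference catalogue by cosine similarity and keep only the references that match well, so that later fitting is restricted to them. The helper names, the 0.9 cutoff, and the 4-channel toy profiles are assumptions for illustration only. In sigminer, `get_sig_similarity` performs this kind of comparison against the COSMIC catalogues.

```r
# Cosine similarity between every column of `a` and every column of `b`.
cosine_matrix <- function(a, b) {
  a <- sweep(a, 2, sqrt(colSums(a^2)), "/")
  b <- sweep(b, 2, sqrt(colSums(b^2)), "/")
  crossprod(a, b)
}

# Hypothetical helper: for each de novo signature, keep its best-matching
# reference signature if the cosine similarity reaches the cutoff.
match_reference <- function(denovo, reference, min_cosine = 0.9) {
  sim <- cosine_matrix(denovo, reference)
  best <- apply(sim, 1, which.max)
  keep <- sim[cbind(seq_len(nrow(sim)), best)] >= min_cosine
  unique(colnames(reference)[best[keep]])
}

# Toy 4-channel profiles (invented; real SBS profiles have 96 channels).
reference <- cbind(
  RefA = c(0.70, 0.10, 0.10, 0.10),
  RefB = c(0.10, 0.70, 0.10, 0.10),
  RefC = c(0.25, 0.25, 0.25, 0.25)
)
denovo <- cbind(
  Sig1 = c(0.68, 0.12, 0.10, 0.10),
  Sig2 = c(0.10, 0.10, 0.40, 0.40)
)

matched <- match_reference(denovo, reference)
matched
```

Here only `Sig1` has a close reference match. A de novo signature with no good match, like `Sig2`, may be novel or a mixture and deserves a manual look before it is dropped.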

(Sent by email in reply to Shixiang Wang's comment of Monday, February 15, 2021, 11:16 PM.)

ShixiangWang commented 3 years ago

I am closing this issue, feel free to reopen it.