ShixiangWang / sigminer

🌲 An easy-to-use and scalable toolkit for genomic alteration signature (a.k.a. mutational signature) analysis and visualization in R https://shixiangwang.github.io/sigminer/reference/index.html
https://shixiangwang.github.io/sigminer/

simulation analysis #275

Closed ShixiangWang closed 3 years ago

ShixiangWang commented 4 years ago

Build a set of functions for data simulation analysis.

https://github.com/ShixiangWang/sigflow/issues/16

ShixiangWang commented 4 years ago

Application of SigProfiler and SignatureAnalyzer to synthetic data

Our goal was to evaluate SignatureAnalyzer and SigProfiler on realistic synthetic data to identify any potential limitations of these two methods. SignatureAnalyzer and SigProfiler were tested on 11 sets of synthetic data, encompassing a total of 64,400 synthetic samples, in which known signature profiles were used to generate catalogues of synthetic mutational spectra. We operationally defined ‘realistic’ data as those based on the characteristics of either SignatureAnalyzer’s or SigProfiler’s analysis of the PCAWG genome data. SignatureAnalyzer’s reference signature profiles were based on COMPOSITE signatures, consisting of 1,536 types of strand-agnostic SBSs in pentanucleotide context, 78 types of DBSs and 83 types of small indels, for a total of 1,697 mutation types. SigProfiler’s reference analysis was based on strand-agnostic SBSs in the context of one 5′ and one 3′ base. For each test, we generated two sets of realistic data: SigProfiler-realistic (based on SigProfiler’s reference signatures and attributions) and SignatureAnalyzer-realistic (based on SignatureAnalyzer’s reference signatures and attributions), as well as two other types of data that involved using SignatureAnalyzer profiles with SigProfiler attributions and vice versa. A detailed description of each of the 11 sets of synthetic data and the results from applying SigProfiler and SignatureAnalyzer are provided in Supplementary Note 2.
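
The quoted passage says known signature profiles and attributions were used to generate synthetic spectra. One common way to do that is sketched below in Python, purely as an illustration: each mutation attributed to a signature is assigned a mutation type by sampling from that signature's profile (a multinomial draw). The function name `sim_catalogue`, the toy profiles, and the sampling scheme are assumptions made for this sketch; they are not taken from the paper's Supplementary Note or from sigminer.

```python
import random

def sim_catalogue(profiles, exposures, seed=None):
    """Sample one synthetic mutational spectrum (hypothetical helper).

    profiles  -- {signature name: list of probabilities over mutation types}
    exposures -- {signature name: number of mutations attributed to it}
    """
    rng = random.Random(seed)
    n_types = len(next(iter(profiles.values())))
    spectrum = [0] * n_types
    for name, count in exposures.items():
        # Multinomial draw: each mutation picks a type according to the profile.
        for idx in rng.choices(range(n_types), weights=profiles[name], k=count):
            spectrum[idx] += 1
    return spectrum

# Toy example with 4 mutation types; SBS profiles in trinucleotide context have 96.
profiles = {"SigA": [0.7, 0.1, 0.1, 0.1], "SigB": [0.0, 0.0, 0.5, 0.5]}
spectrum = sim_catalogue(profiles, {"SigA": 200, "SigB": 100}, seed=42)
```

Because the profiles and exposures used to build each spectrum are known, the output of an extraction tool can later be compared against them.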

ShixiangWang commented 4 years ago

https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-020-1943-3/MediaObjects/41586_2020_1943_MOESM4_ESM.pdf

ShixiangWang commented 4 years ago

For reference.

ShixiangWang commented 4 years ago

Found a few related packages that are currently under development.

ShixiangWang commented 4 years ago

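The PCAWG work considers three factors when generating synthetic data:

  1. The proportion of samples of cancer type t that carry signature s
  2. The mean of the (log10) contribution of signature s among the samples of type t that carry it
  3. The standard deviation of the quantity in 2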
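
The three factors above can be turned into a small sampler. The Python sketch below is only an illustration under stated assumptions: the function name `sim_exposures`, its parameters, and the example values are hypothetical, and it assumes that a sample carries the signature with probability equal to factor 1 and that a carrier's contribution is normally distributed on the log10 scale with the mean and standard deviation from factors 2 and 3.

```python
import random

def sim_exposures(n_samples, prevalence, log10_mean, log10_sd, seed=None):
    """Draw per-sample contributions of one signature in one cancer type.

    prevalence -- factor 1: fraction of samples carrying the signature
    log10_mean -- factor 2: mean of log10(contribution) among carriers
    log10_sd   -- factor 3: standard deviation of log10(contribution)
    """
    rng = random.Random(seed)
    exposures = []
    for _ in range(n_samples):
        if rng.random() < prevalence:
            # Carrier: normal on the log10 scale, converted back to a count.
            exposures.append(round(10 ** rng.gauss(log10_mean, log10_sd)))
        else:
            # Non-carrier: the signature contributes no mutations.
            exposures.append(0)
    return exposures

# Example values are made up for illustration.
exposures = sim_exposures(1000, prevalence=0.3, log10_mean=3.0, log10_sd=0.4, seed=1)
```

Running this once per (cancer type, signature) pair gives an attribution table from which synthetic spectra can then be built.
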
ShixiangWang commented 4 years ago

Start by building a few simple simulation functions, named with a sim_ prefix.

ShixiangWang commented 3 years ago

Added two data-generation functions; I will come back to this part later if needed. https://github.com/ShixiangWang/sigminer/commit/63d1bb21c4cef8988837d1d6d06187d518d1f7b8