signature and reference version problem

Aimee2018 commented 1 year ago

Hi, I ran into some problems while using sigminer.

1. How do you define the similarity between the sigminer signatures and the COSMIC signatures?
2. How do I choose the best signature number?
3. The hg38 reference genome does not seem to be available:

mats <- mt_tally <- sig_tally(laml, ref_genome = "BSgenome.Hsapiens.UCSC.hg38", useSyn = TRUE, mode = "ALL")

ShixiangWang commented 1 year ago

Hi @Aimee2018

1. The similarity is defined as the cosine similarity between two vectors (signatures), which is the convention in this field.
2. The number of signatures can be chosen through several approaches; these are described in https://shixiangwang.github.io/sigminer-book/basic-workflow.html#de-novo-signature-discovery. You can find the function documentation at https://shixiangwang.github.io/sigminer/reference/index.html.
3. To use BSgenome.Hsapiens.UCSC.hg38, please first install the package with BiocManager::install("BSgenome.Hsapiens.UCSC.hg38").
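
To make point 1 concrete, here is a minimal base R sketch of cosine similarity between two signature vectors. The function name `cosine_similarity` and the 6-channel toy profiles are made up for illustration; they are not sigminer code or real COSMIC signatures. Within sigminer itself, the comparison against reference signatures is handled by `get_sig_similarity()`.

```r
# Cosine similarity between two signature vectors of equal length.
# Hypothetical helper for illustration, not part of sigminer.
cosine_similarity <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Toy 6-channel profiles (made-up numbers, not real COSMIC signatures).
extracted <- c(0.30, 0.10, 0.05, 0.25, 0.20, 0.10)
reference <- c(0.28, 0.12, 0.05, 0.25, 0.18, 0.12)

# Similar profiles score close to 1; profiles with no shared channels score 0.
print(cosine_similarity(extracted, reference))
print(cosine_similarity(c(1, 0), c(0, 1)))
```

Because signature profiles are non-negative, the value always falls between 0 and 1, so it can be read directly as "how alike are these two profiles".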

ShixiangWang commented 1 year ago

@Aimee2018 Any feedback?

Aimee2018 commented 1 year ago

Hi, thanks a lot for your detailed reply! I still have some questions I would like to ask you.

1. I have read your paper ("Copy number signature analysis tool and its application in prostate cancer reveals distinct mutational processes and clinical outcomes") and have some questions: a) How do you transform 8 copy number features into 80 copy number components, and why 80? b) The paper makes a clear claim: each of the predefined 80 copy number components has a clear and fixed biological meaning. How did you find and define the fixed biological meaning of the 80 components? c) In the "Mutational processes underlying copy number signatures" part, you give a TDP score. What does a high or low score mean? How do you know an event is a tandem duplication based on CNV data alone?
2. Does sigminer support HRD detection?
3. What are the differences between sigminer and signature.tools.lib (https://github.com/Nik-Zainal-Group/signature.tools.lib)?

Thank you for your time!


ShixiangWang commented 1 year ago

@Aimee2018 Thanks for your questions. Please see my reply below.

1. Thanks for reading my published work and for your interest in it.

a. To understand the transformation, take copy number as an example. Copy number segments with values 1, 2, and 4 are generally more abundant than segments with values 8, 10, and 20, right? So we manually classify the copy number from a value in [0, Inf] into one of the classes {0, 1, 2, 3, 4, 4-8, 8+} or similar. This describes the copy number feature well with a limited number of classes (components). We did a similar operation for the other 7 features. The total number of copy number components is simply the sum of the numbers of components over all features. If you like, you can define your own copy number features; sigminer supports this. See https://github.com/ShixiangWang/sigminer/blob/master/data-raw/CN-features.R for how the 80 components were generated. You can provide your own setting by modifying the option at https://github.com/ShixiangWang/sigminer/blob/95958d14821f862c52e7141daba17c2ba20daf7d/R/sig_tally.R#L149

b. Given the components below, you can easily see that BP10MB[2] indicates 2 breakpoints per 10 MB region. This is a fixed meaning. The previous approach (Nat. Genetics, 2018) cannot provide this, because it uses a flexmix model to extract the components: if it labels a component as BP10MB[2], the label does not mean the same thing. Instead, it may represent a Poisson distribution with mean 1.5, and with different input data the parameters of this distribution may change, and even the number of components may change.

> sigminer::CN.features
    feature         component label  min max
 1:  BP10MB         BP10MB[0] point    0   0
 2:  BP10MB         BP10MB[1] point    1   1
 3:  BP10MB         BP10MB[2] point    2   2
 4:  BP10MB         BP10MB[3] point    3   3
 5:  BP10MB         BP10MB[4] point    4   4
 6:  BP10MB         BP10MB[5] point    5   5
 7:  BP10MB        BP10MB[>5] range    5 Inf
 8:   BPArm          BPArm[0] point    0   0
 9:   BPArm          BPArm[1] point    1   1
10:   BPArm          BPArm[2] point    2   2
11:   BPArm          BPArm[3] point    3   3
12:   BPArm          BPArm[4] point    4   4
13:   BPArm          BPArm[5] point    5   5

This was reflected in a supplementary figure (image not shown here).

c. For scoring-related questions, please check https://shixiangwang.github.io/sigminer/reference/scoring.html.

2. sigminer has no explicit function to detect HRD. In the mutational signature field, SBS3 is commonly used for HRD detection, and tools such as SigMA have been proposed. You can also use sigminer to get the activity of SBS3 and then infer HRD from it. If you want to use copy number signatures for HRD detection, you may need to explore that yourself.
3. signature.tools.lib is a tool for reference signature fitting, which sigminer also covers.
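
To illustrate the fixed-component idea from points 1a and 1b, here is a minimal base R sketch that assigns observed values to predefined classes. The `components` table only mirrors the BP10MB rows printed above, and `classify_value` is a hypothetical helper, not sigminer's internal code. The matching rule is an assumption read off the min/max columns: a "point" component matches a value equal to `min`, and a "range" component matches `min < value <= max`.

```r
# Hypothetical component table mirroring the BP10MB rows printed above.
components <- data.frame(
  component = c(paste0("BP10MB[", 0:5, "]"), "BP10MB[>5]"),
  label     = c(rep("point", 6), "range"),
  min       = c(0:5, 5),
  max       = c(0:5, Inf),
  stringsAsFactors = FALSE
)

# Assign one observed value to exactly one component.
# Assumption: "point" means value == min, "range" means min < value <= max.
classify_value <- function(value, components) {
  is_point <- components$label == "point" & value == components$min
  is_range <- components$label == "range" &
    value > components$min & value <= components$max
  components$component[is_point | is_range]
}

# Toy breakpoint counts per 10 MB window (made-up numbers).
breakpoints_per_10mb <- c(0, 2, 2, 5, 7, 12)
classes <- vapply(breakpoints_per_10mb, classify_value, character(1),
                  components = components)

# Counting values per component gives one slice of a sample's component vector.
counts <- table(factor(classes, levels = components$component))
print(counts)
```

Because the class boundaries are fixed in advance, BP10MB[2] means the same thing for every input dataset, which is the contrast with mixture-model components whose parameters are re-estimated from each dataset.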

ShixiangWang commented 1 year ago

I am closing it now. Feel free to reopen if you have further questions.

ShixiangWang / sigminer

signature and reference version problem #419