Account for sample gender to extract CN signatures

ShixiangWang / sigminer

🌲 An easy-to-use and scalable toolkit for genomic alteration signature (a.k.a. mutational signature) analysis and visualization in R https://shixiangwang.github.io/sigminer/reference/index.html

https://shixiangwang.github.io/sigminer/


Account for sample gender to extract CN signatures #284

Closed clersdom closed 4 years ago

clersdom commented 4 years ago

Hi, Many thanks for this tool!

I am using sigminer to identify copy number signatures from segmented data, and I would like to account for the gender of the samples when doing so. In this case I think I need to generate a data frame with 2 columns ("sample" and "sex"), since I have both males and females, but I am not sure what value I should use for the 'sigminer.copynumber.max' option. In your manual you used 20L. What does this stand for?

options(sigminer.sex = "male", sigminer.copynumber.max = 20L)

Thanks

ShixiangWang commented 4 years ago

@clersdom

The sigminer.copynumber.max option sets a maximum copy number threshold for the data, e.g. if your data contains a segment with copy number >100, then setting the option to 20 will reset that value to 20. This is used to avoid outliers. But if you just want to keep the copy number values as they are, you can set it to a large value.
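A minimal sketch of what such a threshold does, written in base R. This is not sigminer's internal code; the segment table and its column names are made up for illustration.

```r
# Hypothetical segment table; the column names are illustrative only
segs <- data.frame(
  sample     = c("s1", "s1", "s2"),
  chromosome = c("chr1", "chr8", "chrX"),
  segVal     = c(2, 120, 1)
)

# Plays the same role as sigminer.copynumber.max
cn_max <- 20

# Any copy number above the threshold is reset to the threshold
segs$segVal_capped <- pmin(segs$segVal, cn_max)
print(segs)
```

With a threshold of 20, the segment with copy number 120 is reset to 20 and the other segments are left unchanged.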

Note, for male samples, copy number values in X and Y will be multiplied by 2 to avoid creating fake deletion signals in the copy number value distribution.

These are the reasons why I created these two options.
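For a mixed cohort, the idea can be sketched as follows. This assumes, as in the question above, that the sigminer.sex option accepts a data frame with "sample" and "sex" columns. The doubling step is a base R illustration of the adjustment, not the package's actual code, and the sample names, segments and chromosome labels are hypothetical.

```r
# Assumed per-sample sex table with the two columns mentioned in the question
sex_df <- data.frame(
  sample = c("s1", "s2"),
  sex    = c("female", "male")
)

# Setting options is plain base R; sigminer is expected to read them later
options(sigminer.sex = sex_df, sigminer.copynumber.max = 20L)

# Hypothetical segments for the two samples
segs <- data.frame(
  sample     = c("s1", "s2", "s2"),
  chromosome = c("chrX", "chrX", "chr1"),
  segVal     = c(2, 1, 2)
)

# Look up the sex of each segment's sample, then double X/Y values in males
seg_sex    <- sex_df$sex[match(segs$sample, sex_df$sample)]
is_male_xy <- seg_sex == "male" & segs$chromosome %in% c("chrX", "chrY")
segs$segVal_adj <- ifelse(is_male_xy, segs$segVal * 2, segs$segVal)
print(segs)
```

After the adjustment, a normal single-copy X in the male sample reads as 2, the same as the two-copy X in the female sample, so it no longer looks like a deletion.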

clersdom commented 4 years ago

Right, makes sense. So only when I have a mixture of male and female samples will sigminer.copynumber.max help to avoid outliers. If I know that my samples do not harbour many copy number alterations, I guess I could be more lenient here (like setting 40L)?

As a separate issue, regarding the show_sig_profile normalize option, I understand that when I use "row" it is showing which of the 8 features contributes more to a signature, but when using the "feature" option, how is the normalization done then? If I see similar contributions of a feature to a signature when I scale by "row", could I consider decreasing the number of signatures as well?

Many thanks!

ShixiangWang commented 4 years ago

@clersdom yes, of course.

For the normalization question, when "feature" is selected, normalization is done separately for each feature within each signature, so that the components of each feature sum to 1.

Let me use the following signature profile in README for illustration. The sum of components in feature SS is 1, same for other features. It is okay to use 'row' normalization, but 'feature' normalization is recommended for copy number signatures. Imagine that you have many samples: most samples may have few breakpoints (CNV) in most chromosomes, which results in a very large count for the component with 0 breakpoints (i.e. the first bar in the following plot), and then you will see that many other components have very low bar heights in the plot. You can take a look at your data and try the two normalization methods to understand why I created this normalization option.
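To make the difference concrete, here is a small base R sketch of the two normalizations applied to a single signature. The values and the component-to-feature grouping are made up, and this only illustrates the arithmetic, not the code inside show_sig_profile.

```r
# Made-up component values for one signature, grouped into two features
sig     <- c(4, 4, 2, 80, 8, 2)
feature <- c("SS", "SS", "SS", "BP10MB", "BP10MB", "BP10MB")

# "row" style: all components of the signature together sum to 1
row_norm <- sig / sum(sig)

# "feature" style: components are rescaled within each feature,
# so every feature sums to 1 on its own
feature_norm <- ave(sig, feature, FUN = function(x) x / sum(x))

print(round(rbind(row_norm, feature_norm), 3))
```

In the "row" result the large 0-breakpoint component dominates and the SS bars shrink to a few percent, while in the "feature" result the SS components keep their relative shape (0.4, 0.4, 0.2).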

clersdom commented 4 years ago

Perfect, thanks a lot @ShixiangWang