@clersdom
The sigminer.copynumber.max option sets a maximum copy number threshold for the data. For example, if your data contains a segment with copy number >100 and the option is set to 20, that value will be reset to 20. This is used to avoid outliers. If you just want to keep the copy number values as they are, you can set the option to a large value.
Note that for male samples, copy numbers on X and Y are multiplied by 2 to avoid creating fake deletion signals in the copy number value distribution.
These are the reasons why I created these two options.
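To make the two adjustments concrete, here is a minimal base R sketch of the idea on toy data. It is not sigminer's internal code: the column names, the sample names, and the order of the two steps are assumptions made only for illustration.

```r
# Toy segment table; column names and values are hypothetical,
# not sigminer's actual input schema.
seg <- data.frame(
  sample     = c("S1", "S1", "S2", "S2"),
  chromosome = c("chr1", "chrX", "chr2", "chrX"),
  segVal     = c(120L, 1L, 3L, 2L),
  stringsAsFactors = FALSE
)
sex    <- c(S1 = "male", S2 = "female")  # assumed per-sample sex
cn_max <- 20L                            # plays the role of sigminer.copynumber.max

# Double the copy number on X/Y for male samples so that a single
# X or Y copy does not look like a deletion.
is_male_xy <- sex[seg$sample] == "male" & seg$chromosome %in% c("chrX", "chrY")
seg$segVal[is_male_xy] <- seg$segVal[is_male_xy] * 2L

# Cap outliers: anything above the threshold is reset to the threshold.
seg$segVal <- pmin(seg$segVal, cn_max)

print(seg)
```

With this toy input, the segment with copy number 120 is reset to 20 and the male chrX segment goes from 1 to 2, while the female sample is left unchanged.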
Right, makes sense. So only when I have a mixture of male and female samples will sigminer.copynumber.max allow me to avoid outliers.
In case I know that my samples do not harbour many copy number alterations, I guess I could be more lenient here (like setting 40L)?
As a separate issue, regarding the show_sig_profile normalize option: I understand that when I use "row" it shows which of the 8 features contributes more to a signature, but when using the "feature" option, how is the normalization done?
If I see similar contributions of a feature to a signature when I scale by "row", could I consider decreasing the number of signatures as well?
Many thanks!
@clersdom yes, of course.
For the normalization question: when feature is selected, row normalization is done for each feature in each signature.
Let me use the following signature profile in the README for illustration. The sum of components in feature SS is 1, and the same holds for the other features. It is fine to use 'row' normalization, but 'feature' normalization is recommended for copy number signatures. Imagine that you have many samples: most samples may have few breakpoints (CNV) in most chromosomes, which results in a very large count for the component with 0 breakpoints (i.e. the first bar in the following plot), and then you will see many components with very low bar heights in the plot. You can take a look at your data and try the two normalization methods to understand why I created this normalization option.
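The difference between the two normalizations can be sketched in base R on a toy matrix. This is an illustration under assumptions, not the code behind show_sig_profile: the component names, the values, and the layout (components in rows, signatures in columns) are made up for the example.

```r
# Toy signature matrix: rows are components, columns are signatures.
# Names and values are hypothetical.
sig <- matrix(
  c(8, 2, 30, 10,   # Sig1
    1, 9,  5, 35),  # Sig2
  ncol = 2,
  dimnames = list(
    c("SS[1]", "SS[2]", "BP10MB[0]", "BP10MB[1]"),
    c("Sig1", "Sig2")
  )
)

# Feature label of each component, e.g. "SS[1]" -> "SS".
feature <- sub("\\[.*$", "", rownames(sig))

# "row"-style normalization: all components of a signature sum to 1.
row_norm <- sweep(sig, 2, colSums(sig), "/")

# "feature"-style normalization: within each signature, the components
# of each feature sum to 1.
feature_norm <- sig
for (f in unique(feature)) {
  idx <- feature == f
  block <- sig[idx, , drop = FALSE]
  feature_norm[idx, ] <- sweep(block, 2, colSums(block), "/")
}

print(round(row_norm, 2))
print(round(feature_norm, 2))
```

In this toy case SS[1] contributes 0.16 to Sig1 under the whole-signature normalization but 0.8 under the per-feature one, which is why a feature with large raw counts (such as breakpoint counts dominated by zeros) no longer flattens the bars of the other features.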
Perfect, thanks a lot @ShixiangWang
Hi, Many thanks for this tool!
I am using sigminer to identify copy number signatures from segmented data, and I would like to account for the gender of the samples to do so. In this case I think I need to generate a data frame with 2 columns ("sample" and "sex"), since I have both males and females, but I am not sure what value I should use for the sigminer.copynumber.max option. In your manual you used 20L; what does this stand for?
options(sigminer.sex = "male", sigminer.copynumber.max = 20L)
Thanks
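For a mixed cohort, the setup described in the question might look like the sketch below. It assumes, as the question suggests, that sigminer.sex can hold a data frame with "sample" and "sex" columns; the sample names are hypothetical. In R, the L suffix in 20L simply marks an integer literal, so 20L is the integer 20 rather than the double 20.

```r
# Hypothetical per-sample sex annotation with the two columns
# mentioned in the question.
sex_df <- data.frame(
  sample = c("sample_A", "sample_B"),
  sex    = c("female", "male"),
  stringsAsFactors = FALSE
)

# Set both options; options() is base R, so this runs without sigminer
# loaded. Whether sigminer accepts this exact data frame layout is an
# assumption taken from the question.
options(sigminer.sex = sex_df, sigminer.copynumber.max = 20L)

# The L suffix yields an integer, not a double.
print(is.integer(20L))
print(is.integer(20))
print(getOption("sigminer.copynumber.max"))
```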