ShixiangWang / sigminer

🌲 An easy-to-use and scalable toolkit for genomic alteration signature (a.k.a. mutational signature) analysis and visualization in R https://shixiangwang.github.io/sigminer/reference/index.html
https://shixiangwang.github.io/sigminer/

Issue recording from email #441

ShixiangWang closed this issue 10 months ago

ShixiangWang commented 10 months ago

Hi Yuan,

Thanks for your interest and questions. I am happy to answer them one by one.

  1. I faked the "minor_cn" column because the dataset used for illustration, "segTabs", does not contain such data, so I mutated one column to generate a minor_cn column. The minor_cn column matters for allele-specific analyses, e.g., generating copy number signatures or calculating summary measures such as pLOH (with get_pLOH_score()); otherwise, it is not needed. If an analysis in sigminer requires this column and your data lack it, the package will tell you.

Here is an example without the minor_cn column:

    > colnames(segTabs)
    [1] "sample" "chromosome" "start" "end" "segVal"
    > get_pLOH_score(segTabs)
    Error in get_pLOH_score(segTabs) :
      Invalid input, it must contain columns: chromosome start end segVal minor_cn sample
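For readers who want to see the same kind of check without the package, the sketch below uses a made-up toy table with the same columns as segTabs. The helper check_allele_specific() is hypothetical: it only mimics the column check reported above and is not part of sigminer. The simulated minor_cn is one possible way to fake the column for demonstration, not necessarily how the example data was prepared; real values should come from an allele-specific caller.

```r
# Toy segment table with the same columns as the example `segTabs`
# (values are made up for illustration only).
seg <- data.frame(
  sample     = c("S1", "S1", "S2"),
  chromosome = c("chr1", "chr2", "chr1"),
  start      = c(1e6, 5e6, 2e6),
  end        = c(4e6, 9e6, 8e6),
  segVal     = c(2, 3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the column check that get_pLOH_score()
# reports; it is NOT a sigminer function.
check_allele_specific <- function(df) {
  required <- c("chromosome", "start", "end", "segVal", "minor_cn", "sample")
  missing <- setdiff(required, colnames(df))
  if (length(missing) > 0) {
    stop("Invalid input, missing columns: ", paste(missing, collapse = " "))
  }
  invisible(TRUE)
}

# Without minor_cn the check fails; capture the message instead of stopping.
res <- tryCatch(check_allele_specific(seg),
                error = function(e) conditionMessage(e))

# Simulate minor_cn for demonstration only (0 or 1), capped so it never
# exceeds the total copy number in segVal.
set.seed(1234)
seg$minor_cn <- pmin(sample(0:1, nrow(seg), replace = TRUE), seg$segVal)
check_allele_specific(seg)
```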

Furthermore, you are right about your FACETS data. You can also refer to https://github.com/XSLiuLab/PC_CNA_signature/blob/11acd714ee6eb3d0702cb2e05933b774c13b02b0/analysis/src/99-functions.R#L30 for reading your FACETS results and converting them into a format sigminer accepts.
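A minimal sketch of such a conversion is shown below. It assumes FACETS segment output with the columns chrom, start, end, tcn.em and lcn.em (as in the cncf table returned by emcncf()), with chromosome X coded as 23. facets_to_sigminer() is a hypothetical helper, not the function in the linked script, so please check the column names against your own FACETS output.

```r
# Hypothetical converter: FACETS segments -> sigminer-style columns.
# ASSUMPTION: input has FACETS columns chrom, start, end, tcn.em, lcn.em.
facets_to_sigminer <- function(cncf, sample_id) {
  out <- data.frame(
    chromosome = paste0("chr", cncf$chrom),
    start      = cncf$start,
    end        = cncf$end,
    segVal     = cncf$tcn.em,   # total copy number
    minor_cn   = cncf$lcn.em,   # minor (lesser) allele copy number
    sample     = sample_id,
    stringsAsFactors = FALSE
  )
  # ASSUMPTION: FACETS codes chromosome X as 23; relabel it as "chrX".
  out$chromosome[out$chromosome == "chr23"] <- "chrX"
  # lcn.em can be NA when the minor copy number is not estimable;
  # dropping those rows is one simple choice, not the only one.
  out[!is.na(out$minor_cn), ]
}

# Toy FACETS-like input (made-up values).
toy <- data.frame(
  chrom  = c(1, 2, 23),
  start  = c(1, 100, 200),
  end    = c(50, 150, 300),
  tcn.em = c(2, 3, 1),
  lcn.em = c(1, NA, 0)
)
converted <- facets_to_sigminer(toy, "S1")
```

The resulting table can then be read with read_copynumber(); see ?read_copynumber for how to point its seg_cols and samp_col arguments at these columns.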

  2. For this point, I suggest reading the documentation of this function (run ?read_copynumber in your R console), which says the following about the genome_measure option:
    Default is 'called'; can be 'wg' or 'called'. Setting 'called' uses the total size of the called segments to compute the total size for CNA burden calculation, which is useful for WES and targeted sequencing. Setting 'wg' uses the autosome size from the genome build, which is useful for WGS, SNP arrays, etc.
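To illustrate why the denominator matters, here is a simplified sketch (not sigminer's implementation) that treats a segment as altered when its copy number differs from 2 and assumes the table holds autosomal segments only. With WES or targeted data the called segments may cover only part of the genome, so dividing by the whole-genome autosome size could understate the burden. cna_burden() and its autosome_size argument are hypothetical.

```r
# Simplified CNA burden: altered length / total length.
# ASSUMPTIONS: copy number 2 is normal; `seg` contains autosomes only.
cna_burden <- function(seg, genome_measure = c("called", "wg"),
                       autosome_size = NULL) {
  genome_measure <- match.arg(genome_measure)
  len <- seg$end - seg$start + 1
  altered <- sum(len[seg$segVal != 2])
  total <- if (genome_measure == "called") {
    sum(len)                      # size of the called segments
  } else {
    if (is.null(autosome_size)) stop("autosome_size is required for 'wg'")
    autosome_size                 # autosome size of the genome build
  }
  altered / total
}

# Toy segments: 100 bp gained, 300 bp normal (made-up values).
seg <- data.frame(start = c(1, 101), end = c(100, 400), segVal = c(3, 2))
cna_burden(seg, "called")                      # 100 / 400
cna_burden(seg, "wg", autosome_size = 1000)    # 100 / 1000
```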

  3. Yes, that's a good point. We collected the 19 reference signatures directly from the SigProfilerExtractor tool, where only 19 reference signatures were available.

You can view them with:

    > ?get_sig_db
    > s11 <- get_sig_db("CNS_TCGA")
    > s11

I know that COSMIC keeps updating its signature list (we have added handling for this in https://github.com/ShixiangWang/sigminer/commit/76c4579fd3cfef037c8a4378622166dd603488b3). You can easily obtain the latest data from COSMIC with the following commands:

    > sx = get_sig_db("latest_CN_GRCh37")
    The data is not available in local, obtain it from COSMIC: https://cancer.sanger.ac.uk/signatures/downloads/
    Downloaded 8767 bytes...
    Transforming and saving to /Users/wsx/Library/R/sigminer/extdata/COSMIC_v3.3_CN_GRCh37.rds
    > sx

When using functions that support the sig_db option, such as sig_fit(), you can set sig_db = "latest_CN_GRCh37" to use this reference signature list.

I hope the information above addresses all your questions. If you have further questions, I recommend opening a GitHub issue at https://github.com/ShixiangWang/sigminer/issues so that I can track them and people with similar questions can benefit from the discussion.

Best, Shixiang