Hi @onebeingmay,
Each sample is a row in the activities matrix. Not all signatures will be present in each sample, so this is not unexpected. You can learn more about the output at the wiki page for SigProfilerExtractor.
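To see how sparse the activities table actually is, you can count the samples in which each signature has nonzero activity. This is only a minimal sketch, not part of SigProfilerExtractor. It assumes the activities file is tab-delimited, with sample IDs in the first column and one column per signature. The function name `count_active_samples` is hypothetical, and the toy values are made up for illustration.

```python
import csv
import io


def count_active_samples(tsv_text):
    """Return {signature: number of samples with nonzero activity}.

    Assumes a tab-delimited table whose first column holds sample IDs
    and whose remaining columns hold per-signature activities.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    signatures = header[1:]  # skip the sample-ID column
    counts = {name: 0 for name in signatures}
    for row in reader:
        if not row:  # ignore blank lines
            continue
        for name, value in zip(signatures, row[1:]):
            if float(value) > 0:
                counts[name] += 1
    return counts


# Toy table with invented values: three samples, three signatures.
toy = (
    "Samples\tSBS1\tSBS5\tSBS13\n"
    "S1\t12\t0\t0\n"
    "S2\t3\t40\t0\n"
    "S3\t0\t25\t7\n"
)
counts = count_active_samples(toy)
for name, n in counts.items():
    print(f"{name}: active in {n} sample(s)")
```

For a real run, read the file contents first (for example `open(path).read()`) and pass the text to the function. A signature that is active in only a handful of samples is not necessarily an error.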
Best, Mark
Thanks for the information, Mark @mdbarnesUCSD! I reviewed my pipeline and still suspect I may have done something wrong:
sig.sigProfilerExtractor('vcf', 'result', 'data', reference_genome='GRCh38', cpu=24, minimum_signatures=5, maximum_signatures=20, exome=True)
This time I got fewer zeros.
        k=10       k=13       k=15       k=17
k=10    1.0000000  0.4861054  0.3489011  0.5399221
k=13    0.4861054  1.0000000  0.3201084  0.3716092
k=15    0.3489011  0.3201084  1.0000000  0.5588446
k=17    0.5399221  0.3716092  0.5588446  1.0000000
My questions are:
1. Is "exome=True" suitable for whole-exome sequencing?
2. Why is the correlation between different values of k poor?
Thank you,
Wenbin
Hello SigProfilerExtractor team, Thank you for developing such a nice tool! I am trying to extract COSMIC signatures from whole-exome sequencing data (VCFs downloaded from TCGA), but in the resulting activity table "COSMIC_SBS96_Activities_refit.txt" some of the signature activities are detected in only a few samples. Here are the first several rows and columns (each row is a sample):
Some samples show no activity at all for certain signatures (the value is 0). While this may be real, I suspect I made a mistake somewhere. Here is my code for running the program:
Any ideas would be greatly appreciated! Wenbin