Closed VictorZheng1010 closed 2 years ago
Hi @VictorZheng1010,
Please check whether any of the files have formatting issues that are causing them (or the remaining files) not to be processed. For now, please generate matrices for subsets of samples and then combine the columns at the end to create an 8000+ column matrix.
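The column-combining step could be sketched roughly as below. This is only an illustration, not part of SigProfilerMatrixGenerator: `combine_matrices` and `read_matrix` are hypothetical helpers, and the sketch assumes each per-subset matrix is a tab-separated text file whose first column holds the mutation type and whose remaining columns hold one sample each, with the same set of mutation types in every file.

```python
import csv


def read_matrix(path):
    """Read one tab-separated matrix: first column = row label, others = samples."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    header, body = rows[0], rows[1:]
    counts = {row[0]: row[1:] for row in body}
    return header[0], header[1:], counts


def combine_matrices(paths, out_path):
    """Join the sample columns of several matrices that share the same rows."""
    if not paths:
        raise ValueError("no input matrices given")
    label, all_samples, combined = None, [], {}
    for path in paths:
        row_label, samples, counts = read_matrix(path)
        if label is None:
            # The first file fixes the row label and the row order.
            label = row_label
            combined = {key: [] for key in counts}
        elif set(counts) != set(combined):
            raise ValueError(f"{path} has a different set of mutation types")
        duplicates = set(samples) & set(all_samples)
        if duplicates:
            raise ValueError(f"{path} repeats samples: {sorted(duplicates)}")
        all_samples.extend(samples)
        for key in combined:
            # Look rows up by label so a different row order is harmless.
            combined[key].extend(counts[key])
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow([label] + all_samples)
        for key, values in combined.items():
            writer.writerow([key] + values)
    return all_samples
```

The helper would be called once per matrix type (for example, once for all SBS matrices of the same context), since matrices with different row sets cannot be joined column-wise.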
Thanks!
Hello,
I've downloaded 8000+ TCGA samples to analyze their mutational signatures. I converted the MAF files to the ICGC simple somatic mutation format and merged them into one file:
Project  Sample           ID  Genome  mut_type  chrom  pos_start  pos_end   ref  alt  Type
TCGA     TCGA-G3-AAV3-01  .   GRCh37  INS       10     32740800   32740801  -    A    SOMATIC
TCGA     TCGA-G3-AAV3-01  .   GRCh37  SNP       10     43292088   43292088  G    A    SOMATIC
TCGA     TCGA-G3-AAV3-01  .   GRCh37  SNP       10     48370493   48370493  G    A    SOMATIC
TCGA     TCGA-G3-AAV3-01  .   GRCh37  SNP       10     6504265    6504265   C    A    SOMATIC
When I run SigProfilerMatrixGenerator, it reports: "The given input files do not appear to be in the correct simple text format. Skipping this file: ......". After the run finished, SigProfilerMatrixGenerator had generated SBS/DBS/ID matrices for only about 3400+ samples; the other 4000+ samples were omitted. When I then used only those remaining 4000+ samples as input, it again generated matrices for only about 3400+ of them.
I don't know what is causing this problem. I hope you can look into it.
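One quick way to look for formatting problems in the input is to count the fields on every line. The sketch below is only a rough check under stated assumptions: it assumes the file is tab-delimited and that every line should have the same 11 columns as the header shown above. `find_malformed_lines` is a hypothetical helper and does not reproduce SigProfilerMatrixGenerator's own validation, which may apply different criteria.

```python
EXPECTED_COLUMNS = 11  # assumption: matches the 11-column header shown in this post


def find_malformed_lines(path, expected=EXPECTED_COLUMNS, delimiter="\t"):
    """Return (line_number, field_count) for every line with an unexpected width."""
    problems = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            # Strip only the line ending so empty trailing fields are still counted.
            fields = line.rstrip("\r\n").split(delimiter)
            if len(fields) != expected:
                problems.append((number, len(fields)))
    return problems
```

If the first reported line number falls near the point where samples stop being processed, that line would be a natural place to start inspecting the file.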
BR, WSZ