PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License

Session Aborted & Auto-killed while using "trinucleotideMatrix" #841

Closed. serendipitYang closed this issue 2 years ago

serendipitYang commented 2 years ago

trinucleotideMatrix does not run properly and the session shuts itself down

Hi, while running a mutational signature analysis, I extracted a subset of a MAF file to compute the trinucleotide matrix (TNM) with the "trinucleotideMatrix" function. However, when I ran it on my local PC, the session shut down like this:

[Screenshot: Screen Shot 2022-06-16 at 12 39 39 PM]

At first I thought this was due to insufficient storage on my machine, so I ran it again on a Linux server, separately through RStudio Server and the "R CMD" command. It still shut down like this:

[Screenshot: Screen Shot 2022-06-16 at 12 41 44 PM]

and

[Ty TCGA-HNSC]$ R CMD BATCH write_tnm_p.R 
/opt/R/4.2.0/lib/R/bin/BATCH: line 60:  3932 Killed                  ${R_HOME}/bin/R -f ${in} ${opts} ${R_BATCH_OPTIONS} > ${out} 2>&1

I don't know what to do or how to fix it. FYI, the last command I ran was:

hnsc_positive_maf.tnm = trinucleotideMatrix(maf = hnsc_positive_maf, prefix = 'chr', add = TRUE, ref_genome = "BSgenome.Hsapiens.UCSC.hg19")

and its output below:

-Found following BSgenome installtions. Using first entry
                       pkgname organism provider genome masked
1: BSgenome.Hsapiens.UCSC.hg19 Hsapiens     UCSC   hg19  FALSE
Warning in trinucleotideMatrix(maf = hnsc_positive_maf, prefix = "chr",  :
  Chromosome names in MAF must match chromosome names in reference genome.
Ignorinig 5 single nucleotide variants from missing chromosomes chrGL000219.1, chrGL000205.1
-Extracting 5' and 3' adjacent bases

Then it shut down without any warning.
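
The warning in the output above concerns chromosome names that are absent from the reference genome. A minimal sketch of how such a mismatch could be listed ahead of time, using made-up vectors in place of real data: `maf_chroms` and `ref_chroms` are hypothetical names, and in practice the reference names would presumably come from something like `seqnames()` on the loaded BSgenome object.

```r
# Hypothetical stand-ins: maf_chroms mimics the MAF's Chromosome column,
# ref_chroms mimics the sequence names of the reference genome.
maf_chroms <- c("1", "2", "X", "GL000219.1", "GL000205.1")
ref_chroms <- c("chr1", "chr2", "chrX", "chrY")

# Apply the same prefix that was passed as prefix = 'chr', then list the
# names that have no counterpart in the reference.
prefixed <- paste0("chr", maf_chroms)
missing_chroms <- setdiff(prefixed, ref_chroms)
print(missing_chroms)
```

Variants on the listed chromosomes are the ones the function reports as ignored, so this check only explains the warning, not the crash itself.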

serendipitYang commented 2 years ago

And here is what my MAF file looks like:

> hnsc_positive_maf
An object of class  MAF 
                        ID                summary    Mean Median
 1:             NCBI_Build                hg19;37      NA     NA
 2:                 Center bcgsc.ca;broad.mit.edu      NA     NA
 3:                Samples                     42      NA     NA
 4:                 nGenes                   3326      NA     NA
 5:        Frame_Shift_Del                    167   3.976    4.0
 6:        Frame_Shift_Ins                     80   1.905    1.5
 7:           In_Frame_Del                     82   1.952    1.5
 8:           In_Frame_Ins                      8   0.190    0.0
 9:      Missense_Mutation                   4886 116.333   85.5
10:      Nonsense_Mutation                    169   4.024    3.0
11:       Nonstop_Mutation                      4   0.095    0.0
12:            Splice_Site                    150   3.571    3.0
13: Translation_Start_Site                     27   0.643    0.0
14:                  total                   5573 132.690  102.0

And head:

[Screenshot: Screen Shot 2022-06-16 at 12 50 56 PM]

ShixiangWang commented 2 years ago

Could you provide a minimal example dataset that reproduces the error, along with the maftools version you used?

serendipitYang commented 2 years ago

Sure, attached here: hnsc_positive_maf.zip

ShixiangWang commented 2 years ago

@serendipitYang Hi, you should check your MAF file: I found that many variants have no position data. Removing the problematic rows works.

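library(maftools)
# Read the raw MAF as a data.table so it can be inspected before read.maf()
maf = data.table::fread("~/../Downloads/hnsc_positive_maf/hnsc_positive_maf_maftools.maf")
# Count variants with no start position
sum(is.na(maf$Start_Position))
# Drop those rows, then build the MAF object from the cleaned table
maf = read.maf(maf[!is.na(maf$Start_Position)])

# Recompute the trinucleotide matrix on the cleaned MAF
hnsc_positive_maf.tnm = trinucleotideMatrix(maf = maf, 
                                            prefix = 'chr', 
                                            add = TRUE, 
                                            ref_genome = "BSgenome.Hsapiens.UCSC.hg19")
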
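
The same check can be extended to every positional column at once. The sketch below uses a toy table with made-up values and base R only; `required_cols` is an assumed list of columns for illustration, not something taken from maftools.

```r
# Toy MAF-like table (made-up values); the second row has no position data.
maf_df <- data.frame(
  Hugo_Symbol = c("TP53", "PIK3CA", "NOTCH1"),
  Chromosome = c("17", "3", "9"),
  Start_Position = c(7577121, NA, 139390649),
  End_Position = c(7577121, NA, 139390649),
  stringsAsFactors = FALSE
)

# Columns assumed to be needed for looking up flanking bases (hypothetical list).
required_cols <- c("Chromosome", "Start_Position", "End_Position")

# Count missing values per column, then keep only fully populated rows.
na_counts <- colSums(is.na(maf_df[, required_cols]))
clean_df <- maf_df[complete.cases(maf_df[, required_cols]), ]

print(na_counts)
print(nrow(clean_df))
```

A non-zero entry in `na_counts` points to the column that needs attention before the table is passed on to `read.maf()`.
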
ShixiangWang commented 2 years ago

@serendipitYang Does this fix it for you?

PoisonAlien commented 2 years ago

Hello @ShixiangWang, that was indeed the case. I have addressed the issue, and the function should now handle NA values. Thanks for tracking it down :)

serendipitYang commented 2 years ago

Thanks @ShixiangWang! I checked my data, and the problem was indeed caused by the "NA" issue.