PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License

Mutational Signatures for mouse mutation data? #197

Closed SNRNS closed 5 years ago

SNRNS commented 5 years ago

Hi,

I have MAF files from mice tumours, and I tried to extract mutational signatures from that.

The trinucleotideMatrix function does not recognise the BSgenome.Mmusculus.UCSC.mm10 genome.

Is it possible to compare the mutational spectrum of mouse tumours with the mutational signatures derived from human tumours? After all, mouse tumour models are supposed to resemble human tumours, at least to a certain extent.

Any idea about this?

Thanks, Alejandro

PoisonAlien commented 5 years ago

Hi, yes, you can compare them. The signatures are just mutational probabilities for each conversion type. Could you post your command and the error message?
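To illustrate why a mouse spectrum can be compared with human-derived signatures, here is a minimal sketch in base R. It treats each signature as a probability vector over mutation channels and scores the match with cosine similarity, which is one common choice and not necessarily what maftools computes internally. The channel set is collapsed to six substitution types for brevity (real signatures use 96 trinucleotide channels), and every number, the signature names `sig_uv_like` and `sig_flat`, and the helper `cosine_similarity` are hypothetical.

```r
# Toy example: compare one observed mutation spectrum against reference signatures.
# All numbers are made up; real inputs would have 96 trinucleotide channels.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

channels <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")

# Hypothetical reference signatures: each row is a probability vector over channels.
reference <- rbind(
  sig_uv_like = c(0.02, 0.02, 0.85, 0.03, 0.05, 0.03),
  sig_flat    = rep(1 / 6, 6)
)
colnames(reference) <- channels

# Hypothetical mutation counts from a single tumour, normalised to proportions.
counts <- c(4, 3, 170, 6, 12, 5)
spectrum <- counts / sum(counts)

# Score the spectrum against every reference signature and pick the best one.
similarity <- apply(reference, 1, cosine_similarity, b = spectrum)
best_match <- names(which.max(similarity))

print(round(similarity, 3))
print(best_match)
```

Nothing in the comparison depends on the species: once both vectors are expressed over the same channels, only the reference genome used to derive the trinucleotide context differs.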

SNRNS commented 5 years ago

Thanks for the quick reply!

Command:

parental.tnm = trinucleotideMatrix(maf = parental, prefix = 'chr', add = TRUE, ref_genome = 'BSgenome.Mmusculus.UCSC.mm10')

Error:

Could not find BSgenome BSgenome.Mmusculus.UCSC.mm10
Found following BSgenome installtions. Correct ref_genome argument if necessary.
                       pkgname organism provider provider_version masked
1: BSgenome.Hsapiens.UCSC.hg19 Hsapiens     UCSC             hg19  FALSE
Error in trinucleotideMatrix(maf = parental, prefix = "chr", add = TRUE, :

However, I do have BSgenome.Mmusculus.UCSC.mm10 installed:

Command: BSgenome::installed.genomes()

Output: [1] "BSgenome.Hsapiens.UCSC.hg19" "BSgenome.Mmusculus.UCSC.mm10"

Thanks! A

PoisonAlien commented 5 years ago

Hi, thanks for reporting. The script was filtering out any non-human BSgenome objects. I have pushed a commit which should work for you. Could you install it from GitHub and let me know?

library("devtools")
install_github(repo = "PoisonAlien/maftools")

SNRNS commented 5 years ago

Not sure why it is not installing:

> install_github(repo = "PoisonAlien/maftools")
Downloading GitHub repo PoisonAlien/maftools@master
tar: Failed to set default locale
tar: Failed to set default locale
Skipping 18 packages ahead of CRAN: BSgenome, Biobase, BiocGenerics, BiocParallel, Biostrings, ComplexHeatmap, DelayedArray, GenomeInfoDb, GenomeInfoDbData, GenomicAlignments, GenomicRanges, IRanges, Rsamtools, S4Vectors, SummarizedExperiment, XVector, rtracklayer, zlibbioc
   During startup - Warning messages:
   1: Setting LC_CTYPE failed, using "C" 
   2: Setting LC_TIME failed, using "C" 
   3: Setting LC_MESSAGES failed, using "C" 
   4: Setting LC_MONETARY failed, using "C" 
v  checking for file '/private/var/folders/0x/_9tqxlf13mq9dnj3mvt0kr94vsllz0/T/Rtmp02mXVB/remotes16cf3498f2a5d/PoisonAlien-maftools-52c8434/DESCRIPTION' ...
-  preparing 'maftools':
v  checking DESCRIPTION meta-information ...
-  checking for LF line-endings in source and make files and shell scripts
-  checking for empty or unneeded directories
-  building 'maftools_1.7.51.tar.gz'
   Warning: invalid uid value replaced by that for user 'nobody'
   Warning: invalid gid value replaced by that for user 'nobody'

Error: (converted from warning) Setting LC_CTYPE failed, using "C"
Execution halted
Error in i.p(...) : 
  (converted from warning) installation of package '/var/folders/0x/_9tqxlf13mq9dnj3mvt0kr94vsllz0/T//Rtmp02mXVB/file16cf36ae1529e/maftools_1.7.51.tar.gz' had non-zero exit status

PoisonAlien commented 5 years ago

Could you restart your R session and try again?

SNRNS commented 5 years ago

I restarted the R session and the same error appears.

PoisonAlien commented 5 years ago

It seems to be a known issue with macOS. If you are on a Mac, could you try the following:

system('defaults write org.R-project.R force.LANG en_US.UTF-8')

then restart R.

Sources:
https://stackoverflow.com/questions/3907719/how-to-fix-tar-failed-to-set-default-locale-error
http://vadimkyssa.com/2016/12/how-to-solve-tar-failed-to-set-default-locale-error-on-mac-while-installing-rjava/

SNRNS commented 5 years ago

Thanks for the link, I had also found it. The GitHub version is now installed and it recognises the mouse genome, BUT I'm getting another error:

> parental.tnm = trinucleotideMatrix(maf = parental, prefix = 'chr', add = TRUE, 
+                                    ref_genome = 'BSgenome.Mmusculus.UCSC.mm10')
Error in query$End_Position + 20 : 
  non-numeric argument to binary operator

Even after doing:

> parental@data$End_Position <- as.numeric(as.character(parental@data$End_Position))
> 
> head(parental@data$End_Position)
[1]  4349647 12939079 18134095 20565826 22398608 22428528

I get the error:

Error in query$End_Position + 20 : 
  non-numeric argument to binary operator

:?

PoisonAlien commented 5 years ago

I am having a hard time figuring this out. Can you check whether your data has missing values in the start/end position columns? Also, is it possible to share your data so that I can track down the issue? Again, sorry for these errors :| I have never run this tool on mouse data (although it shouldn't matter).

SNRNS commented 5 years ago

On the contrary, thanks so much for your help! I know how busy we all are, so I enormously appreciate your willingness to help. :)

The column apparently has no missing data:

> sum(is.na(parental@data$End_Position))
[1] 0

Yes, I am happy to share the data: B2905.Parental.maf.vaf.txt.gz

PoisonAlien commented 5 years ago

I have fixed the issue. It was a simple character-to-numeric conversion: somehow data.table parsed these columns as character. Also, please note that you effectively have only one sample (the 3 other samples have just a single mutation each). This is a problem in itself, since this tool is designed to work with cohorts, so you might face some issues down the line.

For now, the commands below work fine for me.

x = maftools::read.maf(maf = "~/Downloads/B2905.Parental.maf.vaf.txt")
xx = maftools::trinucleotideMatrix(maf = x, ref_genome = "BSgenome.Mmusculus.UCSC.mm10",prefix = "chr", add = TRUE)

library(NMF)
xxx = maftools::extractSignatures(mat = xx, n = 2) #manually specifying n=2 since you don't have many samples.
maftools::plotSignatures(nmfRes = xxx)

You will have to reinstall for these changes to take effect. Please let me know if you have any follow-up questions.
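The character-versus-numeric problem described above can be reproduced and checked in base R without maftools. The sketch below is only an illustration: the data frame `query` is a hypothetical stand-in for a MAF table whose position columns were read as character, and it does not show the actual change made inside maftools.

```r
# Hypothetical stand-in for a MAF table whose position columns were read as character.
query <- data.frame(
  Chromosome = c("1", "1", "2"),
  Start_Position = c("4349647", "12939079", "18134095"),
  End_Position = c("4349647", "12939079", "18134095"),
  stringsAsFactors = FALSE
)

# Arithmetic on a character column fails; capture the message instead of stopping.
result <- tryCatch(query$End_Position + 20, error = function(e) conditionMessage(e))
print(result)

# Convert the position columns explicitly, then confirm the type and
# that no value was turned into NA by the conversion.
position_cols <- c("Start_Position", "End_Position")
query[position_cols] <- lapply(query[position_cols], as.numeric)
stopifnot(is.numeric(query$End_Position), !anyNA(query$End_Position))

# The same arithmetic now succeeds.
extended_end <- query$End_Position + 20
print(extended_end)
```

Checking `class()` or `is.numeric()` on the column is more informative than `head()`, because a character column of digits can print much like a numeric one.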

SNRNS commented 5 years ago

Thanks so much!

It does indeed work. Now, as you noticed, that MAF file has only 1 sample. We want to know which mutational signatures it has. Is this feasible and appropriate to do?

As a quick check, I duplicated the same sample 10 times in the same MAF file and ran the analysis. It does find some mutational signatures, but not the expected one, which is UVB. So I'm wondering whether there is a way to do this, or whether a cohort of samples is actually needed.

Any suggestions/orientation would be appreciated.

Thanks A
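One way to see why duplicating a sample does not help: the resulting matrix still contains a single distinct column, so it has rank 1 and a factorisation into two signatures is not uniquely determined. The base R sketch below uses made-up counts over six channels (real matrices have 96), and the variable names are hypothetical.

```r
# Made-up counts for one sample across six mutation channels.
single_sample <- c(4, 3, 170, 6, 12, 5)

# Duplicate the sample ten times, one column per copy.
duplicated_matrix <- matrix(rep(single_sample, times = 10), ncol = 10)

# Every column is the same vector, so the duplicates add no new information.
matrix_rank <- qr(duplicated_matrix)$rank
print(matrix_rank)

# Adding a second, genuinely different (also made-up) sample raises the rank.
other_sample <- c(40, 35, 20, 30, 25, 22)
cohort_matrix <- cbind(duplicated_matrix, other_sample)
cohort_rank <- qr(cohort_matrix)$rank
print(cohort_rank)
```

De novo extraction relies on variation between samples to separate signatures, which is why a cohort matters more than the number of columns in the matrix.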

PoisonAlien commented 5 years ago

When you use plotSignatures, you can see the best possible match based on a correlation value. It seems to be Signature 5, which according to COSMIC has an unknown aetiology and is found in all cancers.

Since you have only one sample, I would suggest trying deconstructSigs, which estimates the presence and exposure of all known signatures in a sample. I think this would be the most suitable approach for your case.

Hope this helps. Let me know if you have any follow up questions.
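The idea of estimating exposures to known signatures in a single sample can be sketched in base R as a non-negative least-squares fit. This is not the algorithm deconstructSigs uses and it calls no deconstructSigs functions; it is only a toy illustration with two hypothetical signatures (`sig_a`, `sig_b`) over six channels and a spectrum simulated from known exposures.

```r
# Hypothetical known signatures: each column is a probability vector over channels.
signatures <- cbind(
  sig_a = c(0.02, 0.02, 0.85, 0.03, 0.05, 0.03),
  sig_b = c(0.30, 0.10, 0.10, 0.20, 0.20, 0.10)
)

# Simulate a single-sample spectrum as a known mixture of the two signatures.
true_exposure <- c(0.7, 0.3)
spectrum <- as.vector(signatures %*% true_exposure)

# Squared reconstruction error for a candidate vector of exposures.
objective <- function(w) sum((spectrum - signatures %*% w)^2)

# Minimise the error with exposures constrained to be non-negative.
fit <- optim(par = c(0.5, 0.5), fn = objective, method = "L-BFGS-B", lower = 0)

# Normalise so the estimated exposures sum to one.
exposure <- fit$par / sum(fit$par)
names(exposure) <- colnames(signatures)
print(round(exposure, 3))
```

Because the signatures are fixed in advance, this kind of fit needs only one sample, unlike de novo extraction, which has to learn the signatures from the cohort.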

fanglingcloud commented 5 years ago

I have run into the same problem, "Error in query$End_Position + 20 : non-numeric argument to binary operator", and I also get it when using the data above. Could you tell me how you fixed it?

ShixiangWang commented 5 years ago

@fanglingcloud Could you install the development version of maftools from GitHub and test whether it works?

fanglingcloud commented 5 years ago

@ShixiangWang Thank you so much. It works.