PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License
450 stars 222 forks source link

MATH scores calculation #605

Closed DrZhaoJie closed 4 years ago

DrZhaoJie commented 4 years ago

Dear Prof Thanks for your contribution to maftools R package. When I calculated the MATH scores of all the melanoma samples from TCGA database, it kept processing up to 7 days without output results. My R code was as follows. Could you please help me solve the problem.

R code: SKCMDNA <- GDCquery_Maf("SKCM", pipelines = "muse") SKCMmaf <- SKCMDNA %>% read.maf UVMDNA <- GDCquery_Maf("UVM", pipelines = "muse") UVMmaf <- UVMDNA %>% read.maf melanomamaf = maftools:::merge_mafs(maf = c(SKCMmaf, UVMmaf), verbose = TRUE) melanomamaf@data$t_vaf <- melanomamaf@data$t_alt_count/melanomamaf@data$t_depth melanomaresMATH = inferHeterogeneity(maf = melanomamaf, tsb = melanomamaf@data$Tumor_Sample_Barcode, vafCol = 't_vaf')

It ended with the follwing words: Registered S3 methods overwritten by 'lme4': method from cooks.distance.influence.merMod car influence.merMod car dfbeta.influence.merMod car dfbetas.influence.merMod car Registered S3 method overwritten by 'jmvcore': method from
as.data.frame.Array DelayedArray

Thank you very much!

PoisonAlien commented 4 years ago

7 days!!! It should take less than 7 mins. There is definitely something fishy going on. Let me have a look and I will get back to you tomorrow..

PoisonAlien commented 4 years ago

Hi, If your aim is to get the MATH score you could directly use math.score function which is much simpler.

> math.score(maf = melanomamaf)
t_vaf field is missing, but found t_ref_count & t_alt_count columns. Estimating vaf..
Not enough mutations in TCGA-D3-A51H-06A-12D-A25O-08 . Skipping..
Not enough mutations in TCGA-D3-A8GE-06A-11D-A372-08 . Skipping..
Not enough mutations in TCGA-EE-A2ME-06A-11D-A197-08 . Skipping..
Not enough mutations in TCGA-V3-A9ZX-01A-11D-A39W-08 . Skipping..
Not enough mutations in TCGA-V4-A9ED-01A-11D-A39W-08 . Skipping..
Not enough mutations in TCGA-V4-A9F2-01A-11D-A39W-08 . Skipping..
Not enough mutations in TCGA-VD-A8K7-01B-11D-A39W-08 . Skipping..
Not enough mutations in TCGA-VD-A8KF-01A-11D-A39W-08 . Skipping..
Not enough mutations in TCGA-YZ-A980-01A-11D-A39W-08 . Skipping..
             Tumor_Sample_Barcode MedianAbsoluteDeviation     MATH
  1: TCGA-3N-A9WB-06A-11D-A38G-08                6.618352 23.35870
  2: TCGA-3N-A9WC-06A-11D-A38G-08                5.833333 31.44909
  3: TCGA-3N-A9WD-06A-11D-A38G-08                5.306122 35.04327
  4: TCGA-BF-A1PU-01A-11D-A19A-08                6.140835 20.30131
  5: TCGA-BF-A1PV-01A-11D-A19A-08                6.658596 23.29800
 ---                                                              
532: TCGA-YZ-A984-01A-11D-A39W-08                7.446809 22.08128
533: TCGA-YZ-A985-01A-11D-A39W-08               10.599361 41.20721
534: TCGA-Z2-A8RT-06A-11D-A372-08                4.861551 18.67459
535: TCGA-Z2-AA3S-06A-11D-A397-08                7.223114 25.03452
536: TCGA-Z2-AA3V-06A-11D-A397-08                4.607046 30.73683

inferHeterogeneity is to identify subclones in each tumor. In your command you are passing the wrong values for tsb argument. Below command should work fine and should finish within ten mins.

melanomaresMATH = inferHeterogeneity(maf = melanomamaf, tsb = melanomamaf@variants.per.sample$Tumor_Sample_Barcode)
DrZhaoJie commented 4 years ago

Wow! Thank you very much! I tried your method and code, and it processed less than 10 minutes! Besides, I found that the math.score calculated from "math.score(maf = melanomamaf)" was similar to that calculated from "melanomaresMATH = inferHeterogeneity(maf = melanomamaf……“. Both the results were ok and may be published in the paper, right?

DrZhaoJie commented 4 years ago

Dear Prof Thanks for your comment! I've figured out and calculated the MATH sore ignoring variants in copy number altered regions. The results were as follows. I want to know how the "CN-altered" in the "cluster" column was defined according to "CN" column? beyond[-0.3, +0.3]? image image Thank you very much! Wating for your comment sincerely!

PoisonAlien commented 4 years ago

Hello @DrZhaoJie ,

  1. Small discrepancy is probably due to vaf cut off used in the function inferHeterogeneity. See ?inferHeterogeneity for more details.

  2. Variants located on segments with CN 1.5 < CN < 2.5 are retained. Rest would be classified as CN altered.

You might also be interested in (sciclone)[https://github.com/genome/sciclone] package which offers advanced options for clonality.

DrZhaoJie commented 3 years ago

Hi, The MATH calculation in your documentation was based on maf file including nonSynonymous mutations, right? First, were the nonSynonymous mutations you defined based on which literature? Second, was it reasonable to calculate MATH based on maf file including all SNV mutation types like nonSynonymous and Synonymous mutations? Thank you very much!

PoisonAlien commented 3 years ago

Hi,

Please read the documentation for ?read.maf. By default following, variant classifications with High/Moderate variant consequences are considered as non-synonymous.

"Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site", "Translation_Start_Site","Nonsense_Mutation", "Nonstop_Mutation", "In_Frame_Del","In_Frame_Ins", "Missense_Mutation"

Only non-syn variants are considered for MATH score estimation.

DrZhaoJie commented 3 years ago

OK, I got it! You really made it very clear! Thank you for your patience!