PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License
446 stars, 219 forks

How to Use maftools for Other Cancer Types? #106

Closed: dschslava closed this issue 6 years ago

dschslava commented 6 years ago

In https://www.bioconductor.org/packages/3.7/bioc/vignettes/maftools/inst/doc/maftools.html, the cancer type LAML was used as an example. How can I use maftools for other cancer types such as Breast Cancer, Esophageal Squamous Cell Carcinoma, Pancreatic Cancer, and Glioblastoma? How do I get the clinical data associated with a MAF?

It seems to me that maftools does not work for all MAF files. When I try to run maftools on the following MAF, ucsc.edu_ESCA.IlluminaGA_DNASeq_automated.Level_2.1.0.0.somatic, I get the following error:

esca.maf = system.file('extdata', 'ucsc.edu_ESCA.IlluminaGA_DNASeq_automated.Level_2.1.0.0.somatic', package = 'maftools')
esca = read.maf(maf = esca.maf)
reading maf..
Error in data.table::fread(input = maf, sep = "\t", stringsAsFactors = FALSE, :
  Input is either empty or fully whitespace after the skip or autostart. Run again with verbose=TRUE.
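A likely cause of this error, offered as a hedged note: system.file() only looks inside the installed maftools package directory and returns an empty string when the requested file is not there, so fread() is handed empty input. A minimal sketch of checking the path before calling read.maf() follows; the file name ucsc_esca.maf is a hypothetical placeholder for wherever the downloaded MAF is saved.

```r
# system.file() only searches inside an installed package's directory and
# returns "" (an empty string) when the requested file is not there.
missing_path <- system.file("extdata", "no_such_file.maf", package = "stats")
print(nzchar(missing_path))  # FALSE: nothing was found

# A MAF downloaded to your own computer should be referenced by its own path.
# "ucsc_esca.maf" is a hypothetical placeholder; replace it with your file.
maf_file <- "ucsc_esca.maf"
if (file.exists(maf_file)) {
  esca <- maftools::read.maf(maf = maf_file)
} else {
  message("MAF file not found: ", normalizePath(maf_file, mustWork = FALSE))
}
```

Pointing read.maf() directly at the downloaded file avoids having to copy it into the package library first.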

dschslava commented 6 years ago

Dear PoisonAlien,

I am a high school student in San Diego doing medical research at UCSD. I want to run maftools for the following cancer types but I am stuck: BRCA, ESCA, GBM, and PAAD. Please let me know how I can get the clinical data and GISTIC data.

Thanks in advance!

dschslava commented 6 years ago

Dear PoisonAlien,

When I change the command to

esca.maf = system.file('extdata', 'ucsc.edu_ESCA.IlluminaGA_DNASeq_automated.Level_2.1.0.0.somatic.maf', package = 'maftools')
esca = read.maf(maf = esca.maf)

I got the following output:

reading maf..
NOTE: Removed 939 duplicated variants
silent variants: 6826
        ID    N
1: Samples  184
2:  Silent 6826
Summarizing..
                  ID  summary    Mean Median
1:        NCBI_Build   GRCh37      NA     NA
2:            Center ucsc.edu      NA     NA
3:           Samples      184      NA     NA
4:            nGenes     9817      NA     NA
5: Missense_Mutation    19356 105.196   84.0
6: Nonsense_Mutation     1158   6.293    5.0
7:  Nonstop_Mutation       38   0.207    0.0
8:       Splice_Site      403   2.190    2.0
9:             total    20955 113.886   90.5
Gene Summary..
       Hugo_Symbol Missense_Mutation Nonsense_Mutation Nonstop_Mutation Splice_Site total MutatedSamples AlteredSamples
   1:         TP53               100                25                0          13   138            126            126
   2:          TTN               102                 7                0           0   109             70             70
   3:        MUC16                46                 1                0           0    47             36             36
   4:        SYNE1                34                 0                0           1    35             29             29
   5:        CSMD3                29                 3                0           0    32             28             28
  ---
9813:         ZXDC                 1                 0                0           0     1              1              1
9814:       ZYG11B                 1                 0                0           0     1              1              1
9815:          ZYX                 1                 0                0           0     1              1              1
9816:         ZZZ3                 1                 0                0           0     1              1              1
9817: hsa-mir-4763                 1                 0                0           0     1              1              1
NOTE: Possible FLAGS among top ten genes:
[1] "TTN" "MUC16" "SYNE1" "HMCN1" "FLG"
Checking clinical data..
NOTE: Missing clinical data! It is strongly recommended to provide clinical data associated with samples if available.
Done !

How do I get clinical data associated with a sample?

PoisonAlien commented 6 years ago

Hello,

It's great that you're starting this early. However, I would suggest you take some introductory classes on the R programming language. Regardless, here are the main points:

  1. You can use any MAF file as long as it contains the required columns.

  2. Maftools doesn't provide clinical data for every MAF; the one included is only for demonstration purposes. You can get clinical data for your cohort (assuming you're dealing with TCGA) from the GDC.

  3. Alternatively, there is an R data package containing all pre-compiled TCGA cohorts as MAF objects, along with their clinical data, which you can install from GitHub:

devtools::install_github(repo = "PoisonAlien/TCGAmutations")
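For point 2, a minimal sketch of attaching clinical data downloaded from the GDC, under stated assumptions. read.maf() accepts a clinicalData argument and matches its rows to samples through a Tumor_Sample_Barcode column. The helper names to_patient_id() and read_maf_with_clinical() are hypothetical, as is the assumption that the clinical table keys its rows by 12-character TCGA patient IDs in a column called submitter_id; check your download and change id_col accordingly.

```r
# Shorten TCGA barcodes to the 12-character patient (case) ID,
# e.g. "TCGA-AB-1234-01A-11D-A123-08" becomes "TCGA-AB-1234".
to_patient_id <- function(barcode) substr(barcode, 1, 12)

# Hypothetical helper: read a MAF plus a clinical table downloaded from the GDC,
# give both the same Tumor_Sample_Barcode key, and build a maftools MAF object.
read_maf_with_clinical <- function(maf_file, clin_file, id_col = "submitter_id") {
  # skip = "Hugo_Symbol" starts reading at the header row, past any '#' lines
  maf_dt <- data.table::fread(maf_file, skip = "Hugo_Symbol")
  maf_dt$Tumor_Sample_Barcode <- to_patient_id(maf_dt$Tumor_Sample_Barcode)

  clin <- data.table::fread(clin_file)
  # Assumption: id_col holds TCGA patient IDs; adjust it to your file's column
  clin$Tumor_Sample_Barcode <- to_patient_id(clin[[id_col]])
  # Keep one row per patient, since a clinical export may repeat a case
  clin <- clin[!duplicated(clin$Tumor_Sample_Barcode), ]

  # read.maf() accepts data frames for both the MAF and the clinical data
  maftools::read.maf(maf = maf_dt, clinicalData = clin)
}
```

For example, with hypothetical file names: esca <- read_maf_with_clinical("esca.maf", "clinical.tsv"). Shortening barcodes to patient IDs merges samples if a patient has more than one tumor sample in the MAF, so check for that before relying on it.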

Hope this helps.

dschslava commented 6 years ago

Dear PoisonAlien,

Thank you for your prompt reply!

I got the MAF files from the Broad Institute (MIT-Harvard) site, and it seems that none of them have amino acid information. The MAF files are about 10 times as big as the sample MAF file you use in the example.

I'll try the GDC site and also the pre-compiled TCGA cohorts tomorrow.

Really appreciate your clear answers.

PoisonAlien commented 6 years ago

No problem. You can also paste the column names from the MAF files you have here; maybe I can help you with that information.
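As an example of what pasting the column names could involve, here is a minimal base-R sketch that prints a MAF's column names and reports which of the mandatory read.maf() columns are missing. The required column list is taken from the maftools vignette; maf_columns(), missing_required() and aa_columns() are hypothetical helpers, and the amino-acid column pattern is a guess because MAF producers name that column differently.

```r
# Mandatory columns for read.maf(), as listed in the maftools vignette.
required_cols <- c("Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
                   "Reference_Allele", "Tumor_Seq_Allele2", "Variant_Classification",
                   "Variant_Type", "Tumor_Sample_Barcode")

# Return the column names of a MAF: the first line that is not a '#' comment.
maf_columns <- function(maf_file) {
  lines <- readLines(maf_file, n = 200)  # assumption: header is within 200 lines
  header <- lines[!startsWith(lines, "#")][1]
  strsplit(header, "\t", fixed = TRUE)[[1]]
}

# Required columns that the file lacks (character(0) means none are missing).
missing_required <- function(maf_file) setdiff(required_cols, maf_columns(maf_file))

# Columns that might hold amino-acid changes; names differ between MAF sources,
# so this pattern is only a guess.
aa_columns <- function(maf_file) {
  grep("HGVSp|Protein_Change|AAChange", maf_columns(maf_file), value = TRUE)
}
```

Running print(maf_columns("your_file.maf")) on one of your files gives the list of names to paste into the issue; an empty result from aa_columns() would be consistent with the missing amino acid information mentioned above.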

dschslava commented 6 years ago

After I issue the command devtools::install_github(repo = "PoisonAlien/TCGAmutations"), is it possible I can get precompiled TCGA cohorts for other cancer types such as BRCA and ESCA?

What do you mean by pasting column names from the MAF files into R? Could you give me an example?

I'm a high school student doing research at UCSD Medical Center. Both my professors and I are new to the analysis and visualization of MAF files. We tried several tools but found that your maftools is the best.

PoisonAlien commented 6 years ago

Once you install the TCGAmutations package, run the following; the vignette shows a use case of how to load a TCGA dataset. Once it's loaded, just run maftools functions on the loaded object.

library("TCGAmutations")
vignette(package = "TCGAmutations", topic = "Introduction")
# For BRCA
tcga_load(study = "BRCA")
# Console output:
#   Loading objects:
#     tcga_brca
#   Successfully loaded TCGA BRCA!
#   See MAF object tcga_brca

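Running maftools on the loaded object could look like the sketch below. summarise_cohort() is a hypothetical wrapper; getSampleSummary(), getGeneSummary(), getClinicalData(), plotmafSummary() and oncoplot() are maftools functions, but check their arguments against the version you have installed.

```r
# Hypothetical wrapper: print and plot a few standard maftools summaries
# for any MAF object, e.g. tcga_brca created by tcga_load(study = "BRCA").
summarise_cohort <- function(maf, top = 10) {
  print(head(maftools::getSampleSummary(maf)))  # variant counts per sample
  print(head(maftools::getGeneSummary(maf)))    # variant counts per gene
  print(head(maftools::getClinicalData(maf)))   # clinical data bundled with the cohort
  maftools::plotmafSummary(maf = maf)           # cohort overview plot
  maftools::oncoplot(maf = maf, top = top)      # most frequently mutated genes
  invisible(maf)
}
```

For example, summarise_cohort(tcga_brca). Other cohorts should load the same way, e.g. tcga_load(study = "ESCA"); the object name tcga_esca is an assumption based on the BRCA naming above.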
dschslava commented 6 years ago

Awesome!

Thanks so much!!