Closed dschslava closed 6 years ago
Dear PoisonAlien,
I am a high school student in San Diego doing medical research at UCSD. I want to run maftools for the following cancer types, but I am stuck: BRCA, ESCA, GBM, and PAAD. Please let me know how I can get clinical data and GISTIC data for them.
Thanks in advance!
Dear PoisonAlien,
When I change the command to
esca.maf = system.file('extdata', 'ucsc.edu_ESCA.IlluminaGA_DNASeq_automated.Level_2.1.0.0.somatic.maf', package = 'maftools')
esca = read.maf(maf = esca.maf)
I got the following output:
reading maf..
NOTE: Removed 939 duplicated variants
silent variants: 6826
ID N
1: Samples 184
2: Silent 6826
Summarizing..
ID summary Mean Median
1: NCBI_Build GRCh37 NA NA
2: Center ucsc.edu NA NA
3: Samples 184 NA NA
4: nGenes 9817 NA NA
5: Missense_Mutation 19356 105.196 84.0
6: Nonsense_Mutation 1158 6.293 5.0
7: Nonstop_Mutation 38 0.207 0.0
8: Splice_Site 403 2.190 2.0
9: total 20955 113.886 90.5
Gene Summary..
Hugo_Symbol Missense_Mutation Nonsense_Mutation Nonstop_Mutation Splice_Site total MutatedSamples AlteredSamples
1: TP53 100 25 0 13 138 126 126
2: TTN 102 7 0 0 109 70 70
3: MUC16 46 1 0 0 47 36 36
4: SYNE1 34 0 0 1 35 29 29
5: CSMD3 29 3 0 0 32 28 28
9813: ZXDC 1 0 0 0 1 1 1
9814: ZYG11B 1 0 0 0 1 1 1
9815: ZYX 1 0 0 0 1 1 1
9816: ZZZ3 1 0 0 0 1 1 1
9817: hsa-mir-4763 1 0 0 0 1 1 1
NOTE: Possible FLAGS among top ten genes:
[1] "TTN" "MUC16" "SYNE1" "HMCN1" "FLG"
Checking clinical data..
NOTE: Missing clinical data! It is strongly recommended to provide clinical data associated with samples if available.
Done !
How do I get clinical data associated with a sample?
Hello,
It's great that you're starting this early. However, I would suggest you take some introductory classes on the R programming language. Regardless, here is the thing:
You can use any MAF file as long as it contains the required columns.
Maftools doesn't provide clinical data for every MAF; the one included is only for demonstration purposes. You can get clinical data for your cohort (assuming you're dealing with TCGA) from GDC.
Alternatively, there is an R data package containing all pre-compiled TCGA cohorts as MAF objects along with the clinical data that you can install from GitHub.
devtools::install_github(repo = "PoisonAlien/TCGAmutations")
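For the first route, here is a minimal sketch of attaching your own clinical table when reading a MAF. It assumes a tab-delimited clinical file with a Tumor_Sample_Barcode column whose values match the barcodes in the MAF. The helper name and the file names in the usage comment are hypothetical; read.maf takes the table through its clinicalData argument.

```r
# Hypothetical helper: read a MAF together with a clinical annotation table.
read_maf_with_clinical <- function(maf_file, clin_file) {
  # Clinical table: one row per sample, tab-delimited.
  clin <- read.delim(clin_file, stringsAsFactors = FALSE, check.names = FALSE)

  # maftools links clinical rows to samples through this column.
  if (!"Tumor_Sample_Barcode" %in% colnames(clin)) {
    stop("clinical table needs a Tumor_Sample_Barcode column")
  }

  maftools::read.maf(maf = maf_file, clinicalData = clin)
}

# Usage (file names are placeholders for your own downloads):
# esca <- read_maf_with_clinical("ESCA.somatic.maf", "ESCA_clinical.tsv")
# maftools::getClinicalData(esca)
```

If the barcodes in the clinical table do not match those in the MAF, the annotations will not line up with the samples, so it is worth comparing the two before plotting.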
Hope this helps.
Dear PoisonAlien,
Thank you for your prompt reply!
I got the MAF files from the Broad (MIT-Harvard) site, and it seems that none of them have amino acid information. The MAF files are about 10 times as big as the sample MAF file you have in the example.
I'll try out the GDC site and also the precompiled TCGA cohorts tomorrow.
Really appreciate your clear answers.
No problem. You can also paste the column names from the MAF files you have; maybe I can help you with the information.
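One way to get those column names, and to check them against the fields the maftools documentation lists as mandatory, might look like the sketch below. The function names and the path in the usage comment are hypothetical, and it assumes a tab-delimited MAF whose comment lines, if any, start with #.

```r
# Mandatory MAF fields as listed in the maftools documentation.
required_cols <- c("Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
                   "Reference_Allele", "Tumor_Seq_Allele2",
                   "Variant_Classification", "Variant_Type",
                   "Tumor_Sample_Barcode")

# Read only the header (plus at most one data row) of a tab-delimited MAF.
# Lines starting with '#' are skipped as comments.
maf_columns <- function(path) {
  header <- read.delim(path, nrows = 1, comment.char = "#",
                       check.names = FALSE, stringsAsFactors = FALSE)
  colnames(header)
}

# Report which mandatory fields are absent from the file.
missing_maf_columns <- function(path) {
  setdiff(required_cols, maf_columns(path))
}

# Usage (replace the path with your own file):
# maf_columns("path/to/your.maf")          # column names to paste here
# missing_maf_columns("path/to/your.maf")  # character(0) if nothing is missing
```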
After I issue the command devtools::install_github(repo = "PoisonAlien/TCGAmutations"), is it possible I can get precompiled TCGA cohorts for other cancer types such as BRCA and ESCA?
What do you mean by pasting column names from the MAF files? Could you give me an example?
I'm a high school student doing research at UCSD Medical Center. Both my professors and I are new to the analysis and visualization of MAF files. We tried several tools but found out your maftools is the best.
Once you install the TCGAmutations package, do this, and it should give you a use case of how to load a TCGA dataset. Once it's loaded, just run maftools functions on the loaded object.
library("TCGAmutations")
vignette(package = "TCGAmutations", topic = "Introduction")
#For BRCA
> tcga_load(study = "BRCA")
Loading objects:
tcga_brca
Successfully loaded TCGA BRCA!
See MAF object tcga_brca
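As a rough sketch of running maftools on the loaded object: the wrapper name below is hypothetical, and it assumes tcga_load has placed tcga_brca in the workspace as in the output above. The calls inside it are ordinary maftools functions.

```r
# Hypothetical wrapper: a first look at any MAF object with maftools.
explore_cohort <- function(maf_obj, top = 10) {
  # Per-sample mutation counts and the clinical annotations stored in the object.
  print(maftools::getSampleSummary(maf_obj))
  print(maftools::getClinicalData(maf_obj))

  # Summary plot of the cohort, then an oncoplot of the most mutated genes.
  maftools::plotmafSummary(maf = maf_obj)
  maftools::oncoplot(maf = maf_obj, top = top)

  invisible(maf_obj)
}

# Usage, after tcga_load(study = "BRCA"):
# explore_cohort(tcga_brca)
```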
Awesome!
Thanks so much!!
In https://www.bioconductor.org/packages/3.7/bioc/vignettes/maftools/inst/doc/maftools.html, the cancer type LAML was used as an example. How can I use maftools for other cancer types such as Breast Cancer, Esophageal Squamous Carcinoma, Pancreatic Cancer, and Glioblastoma? How do I get clinical data associated with a MAF?
It seems to me that maftools does not work for all MAF files. When I try to run maftools on the following MAF: ucsc.edu_ESCA.IlluminaGA_DNASeq_automated.Level_2.1.0.0.somatic, I got the following error: