Closed anishchakka closed 7 years ago
Hi, I see that recent version of annovar has been changed in the way it handles extra columns from the input file. This seems to be the problem. I will fix it, meanwhile if its not much of asking, can you share your annovar file (or a sample of it)? You can change essential information like barcodes if its of concern.
Hi,
I am recently also want to convert annovar annotated mutect outputs to MAF format.
unfortunately, it was used knowngene as annotation, and I checked your function annovarToMaf
, it only accepts refGene and ensGene.
the essential columns are as I see from the code:
essential.col = c("Chr", "Start", "End", "Ref", "Alt", "Func.refGene",
"Gene.refGene", "GeneDetail.refGene", "ExonicFunc.refGene",
"AAChange.refGene")
Maybe you can write a more general function to convert a dataframe to MAF format. as opened here https://github.com/PoisonAlien/maftools/issues/21
I am not quite familiar with MAF format, but please let me know if I can help.
Thanks, Tommy
Hi Tommy,
Sorry for late reply. Do you still have this issue ? Only reason I didn't include known gene because of weird UCSC annotations. I will include this in next update. Maybe you can just run annovarToMaf
in default, it should convert, but you won't able to convert ucsc ids to hgnc symbols.
No problem at all, guess you are on holidays :)
I actually managed to recreate a vcf file from the mutect output and then convert the vcf to maf using vcf2maf
.
Thanks!
Hi ,
I converted my Mutect Output VCF file to MAF using VCF2MAF.pl tools and now while running lollipopPlot(maf = laml, gene = 'TP53', AACol = 'Protein_Change', labelPos = 'all') Error in eval(expr, envir, enclos) : object 'AAChange' not found
laml.sig = oncodrive(maf = laml, AACol = 'Protein_Change', minMut = 5, pvalMethod = 'zscore') Estimating background scores from synonymous variants.. Error in eval(expr, envir, enclos) : object 'AAChange' not found
Any idea whats wrong here ? Kindly suggest !!
Hi,
Does Protein_Change column contains amino acid change info? If you have used vcf2maf, I think protein conversions will be stored under column HGVSp_Short
. Can you check this ? Also post your sessionInfo()
@PoisonAlien Hi, I had the same issue here.
> cat /path/to/my.maf | awk -F'\t' '{print $37}' | sort -u -r
...
p.*47=
p.*440Yext*55
p.*40=
p.*376=
p.*341=
p.*2Kext*?
p.*281=
p.*279Gext*?
p.*176=
p.*175Kext*?
p.*143Eext*?
p.*137=
p.*119Cext*56
p.*118=
p.*1047=
HGVSp_Short
I have the protein conversions at column HGVSp_Short
, and my sessionInfo()
is as follows:
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.9 (Final)
Matrix products: default
BLAS: /path/to/lib/R/lib/libRblas.so
LAPACK: /path/to/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] doParallel_1.0.11 iterators_1.0.8 foreach_1.4.3 NMF_0.20.6 cluster_2.0.6 rngtools_1.2.4
[7] pkgmaker_0.22 registry_0.5 maftools_1.4.05 Biobase_2.38.0 BiocGenerics_0.24.0
loaded via a namespace (and not attached):
[1] RMySQL_0.10.11 bit64_0.9-7 splines_3.4.1 assertthat_0.2.0
[5] stats4_3.4.1 blob_1.1.0 BSgenome_1.46.0 GenomeInfoDbData_0.99.1
[9] Rsamtools_1.30.0 slam_0.1-40 progress_1.1.2 ggrepel_0.7.0
[13] RSQLite_2.0 lattice_0.20-35 digest_0.6.12 GenomicRanges_1.30.0
[17] RColorBrewer_1.1-2 XVector_0.18.0 colorspace_1.3-2 cowplot_0.9.1
[21] Matrix_1.2-10 plyr_1.8.4 XML_3.98-1.9 GetoptLong_0.1.6
[25] biomaRt_2.34.0 zlibbioc_1.24.0 xtable_1.8-2 scales_0.5.0
[29] BiocParallel_1.12.0 tibble_1.3.4 IRanges_2.12.0 ggplot2_2.2.1
[33] SummarizedExperiment_1.8.0 GenomicFeatures_1.30.0 lazyeval_0.2.1 mclust_5.4
[37] survival_2.40-1 magrittr_1.5 memoise_1.1.0 changepoint_2.2.2
[41] tools_3.4.1 data.table_1.10.4-3 prettyunits_1.0.2 GlobalOptions_0.0.12
[45] matrixStats_0.52.2 gridBase_0.4-7 ComplexHeatmap_1.17.1 stringr_1.2.0
[49] S4Vectors_0.16.0 munsell_0.4.3 DelayedArray_0.4.1 AnnotationDbi_1.40.0
[53] Biostrings_2.46.0 compiler_3.4.1 GenomeInfoDb_1.14.0 rlang_0.1.4
[57] grid_3.4.1 RCurl_1.95-4.8 VariantAnnotation_1.24.2 rjson_0.2.15
[61] circlize_0.4.2 bitops_1.0-6 gtable_0.2.0 codetools_0.2-15
[65] DBI_0.7 reshape2_1.4.3 R6_2.2.2 gridExtra_2.3
[69] zoo_1.8-0 GenomicAlignments_1.14.1 rtracklayer_1.38.2 bit_1.1-12
[73] shape_1.4.3 stringi_1.1.6 Rcpp_0.12.14 wordcloud_2.5
To clarify that I do have information at column HGVSp_Short
for the gene I want to plot lollipop for:
> grep '^MUC6' /path/to/my.maf | awk -F'\t' '{print $37}' | head
p.M2064I
p.H2039=
p.S2035=
p.M2032V
p.H2029=
p.H2029R
p.T2028P
p.T2023=
p.T2023K
p.H2008Y
Thanks!
Hi, Can you post your exact command and the error that you're getting ?
I am getting the same error. I created a dataframe from my MAF file with the following columns:
> colnames(var.annovar.maf2)
[1] "Hugo_Symbol" "Chromosome" "Start_Position"
[4] "End_Position" "Reference_Allele" "Tumor_Seq_Allele2"
[7] "Variant_Classification" "Variant_Type" "Tumor_Sample_Barcode"
#Convert the data frame as a maf object
> laml = read.maf(maf = var.annovar.maf2,verbose = T)
NOTE: Removed 26 duplicated variants
NOTE: Found 894 variants with no Gene Symbols.
Annotating them as 'UnknownGene' for convenience
NOTE: Non MAF specific values in Variant_Classification column:
[1] NA
silent variants: 20129
Summarizing..
Gene Summary..
NOTE: Possible FLAGS among top ten genes:
[1] "AHNAK2" "FLG"
Checking clinical data..
NOTE: Missing clinical data! It is strongly recommended to provide clinical data associated with samples if available.
Error in Tumor_Sample_Barcode %in% maf.tsbs :
object 'Tumor_Sample_Barcode' not found
Not sure why I'm getting this error. Pls help
Hi,
What version are you using ? Can you post your sessionInfo ?
I believe I installed the version in Bioconductor. Should I try the version from github using devtools ?
Session info:
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenVisR_1.11.0 devtools_1.13.5 bindrcpp_0.2 tidyr_0.8.0
[5] data.table_1.10.4-3 maftools_1.4.25 Biobase_2.38.0 BiocGenerics_0.24.0
[9] BiocInstaller_1.28.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 matrixStats_0.53.1 bit64_0.9-7
[4] doParallel_1.0.11 RColorBrewer_1.1-2 progress_1.1.2
[7] httr_1.3.1 GenomeInfoDb_1.14.0 tools_3.4.3
[10] R6_2.2.2 DBI_0.8 lazyeval_0.2.1
[13] colorspace_1.3-2 GetoptLong_0.1.6 withr_2.1.1
[16] tidyselect_0.2.4 gridExtra_2.3 prettyunits_1.0.2
[19] RMySQL_0.10.14 curl_3.1 git2r_0.21.0
[22] bit_1.1-12 compiler_3.4.3 DelayedArray_0.4.1
[25] pkgmaker_0.22 labeling_0.3 rtracklayer_1.38.3
[28] slam_0.1-42 scales_0.5.0 NMF_0.21.0
[31] stringr_1.3.0 digest_0.6.15 Rsamtools_1.30.0
[34] XVector_0.18.0 pkgconfig_2.0.1 changepoint_2.2.2
[37] BSgenome_1.46.0 rlang_0.2.0 GlobalOptions_0.0.12
[40] RSQLite_2.0 bindr_0.1.1 shape_1.4.4
[43] zoo_1.8-1 mclust_5.4 BiocParallel_1.12.0
[46] gtools_3.5.0 dplyr_0.7.4 VariantAnnotation_1.24.5
[49] RCurl_1.95-4.10 magrittr_1.5 GenomeInfoDbData_1.0.0
[52] wordcloud_2.5 Matrix_1.2-12 Rcpp_0.12.16
[55] munsell_0.4.3 S4Vectors_0.16.0 viridis_0.5.0
[58] stringi_1.1.7 yaml_2.1.18 SummarizedExperiment_1.8.1
[61] zlibbioc_1.24.0 plyr_1.8.4 FField_0.1.0
[64] grid_3.4.3 blob_1.1.0 ggrepel_0.7.0
[67] lattice_0.20-35 Biostrings_2.46.0 cowplot_0.9.2
[70] splines_3.4.3 GenomicFeatures_1.30.3 circlize_0.4.3
[73] knitr_1.20 ComplexHeatmap_1.17.1 pillar_1.2.1
[76] GenomicRanges_1.30.3 rjson_0.2.15 rngtools_1.2.4
[79] reshape2_1.4.3 codetools_0.2-15 biomaRt_2.34.2
[82] stats4_3.4.3 glue_1.2.0 XML_3.98-1.10
[85] foreach_1.4.4 purrr_0.2.4 gtable_0.2.0
[88] assertthat_0.2.0 ggplot2_2.2.1 gridBase_0.4-7
[91] xtable_1.8-2 viridisLite_0.3.0 survival_2.41-3
[94] tibble_1.4.2 iterators_1.0.9 GenomicAlignments_1.14.1
[97] AnnotationDbi_1.40.0 registry_0.5 memoise_1.1.0
[100] IRanges_2.12.0 cluster_2.0.6
I just installed maftools from github and tried again. Still getting the error:
> var.annovar.maf = maftools::annovarToMaf(annovar = "ann.file2.txt", Center = 'GU',
+ refBuild = 'hg19',
+ tsbCol = 'Tumor_Sample_Barcode',
+ table = 'ensGene')
Read 389386 rows and 61 (of 61) columns from 0.571 GB file in 00:00:04
Error in `[.data.table`(ann, , ann.mand, with = FALSE) :
column(s) not found: Tumor_Sample_Barcode
>
Hi, Sorry for the error. Can you try installing from GitHub again and see if it works ? I have pushed a fix to address missing barcode column, which is now automatically taken care of by generating one. Also you can pass multiple files generated from Annovar. Please let me know if it works for you.
Hopefully, this can help somebody looking for a resolve.
Check exactly which column of your maf file is the AA change information. Make sure the column name is exactly the same as you pass to argument AACol
of lollipopPlot
function.
In my case above (https://github.com/PoisonAlien/maftools/issues/16#issuecomment-358386813), my column is HGVSp_Short
, but I passed AAChange
to it.
Hint: if your column is one of c("HGVSp_Short", "Protein_Change", "AAChange")
, you can keep the argument as default.
@PoisonAlien @pwwang hello, Alien and pwwang, I have the same question. lollipopPlot(maf = laml, gene = 'DNMT3A', AACol = 'HGVSp_Short', showMutationRate = TRUE) Error in eval(jsub, SDenv, parent.frame()) : object 'AAChange' not found
View(laml) colnames(laml@maf.silent) [1] "Hugo_Symbol" "Entrez_Gene_Id" "Center"
[4] "NCBI_Build" "Chromosome" "Start_Position"
[7] "End_Position" "Strand" "Variant_Classification" [10] "Variant_Type" "Reference_Allele" "Tumor_Seq_Allele1"
[13] "Tumor_Seq_Allele2" "Tumor_Sample_Barcode" "Protein_Change"
[16] "i_TumorVAF_WU" "i_transcript_name"but did not work. sessionInfo() R version 3.5.3 (2019-03-11) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.6
Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale: [1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
attached base packages: [1] parallel stats graphics grDevices utils datasets methods base
other attached packages: [1] maftools_1.8.10 bigmemory_4.5.33 Biobase_2.42.0 BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] splines_3.5.3 foreach_1.4.4 assertthat_0.2.1
[4] stats4_3.5.3 BSgenome_1.50.0 GenomeInfoDbData_1.2.0
[7] Rsamtools_1.34.1 yaml_2.2.0 ggrepel_0.8.1
[10] pillar_1.4.1 lattice_0.20-38 glue_1.3.1
[13] digest_0.6.19 GenomicRanges_1.34.0 RColorBrewer_1.1-2
[16] XVector_0.22.0 colorspace_1.4-1 cowplot_0.9.4
[19] Matrix_1.2-17 plyr_1.8.4 XML_3.98-1.20
[22] pkgconfig_2.0.2 bibtex_0.4.2 GetoptLong_0.1.7
[25] zlibbioc_1.28.0 purrr_0.3.2 xtable_1.8-4
[28] scales_1.0.0 BiocParallel_1.16.6 tibble_2.1.3
[31] pkgmaker_0.27 IRanges_2.16.0 ggplot2_3.1.1
[34] withr_2.1.2 SummarizedExperiment_1.12.0 lazyeval_0.2.2
[37] mclust_5.4.3 survival_2.44-1.1 magrittr_1.5
[40] crayon_1.3.4 doParallel_1.0.14 NMF_0.21.0
[43] tools_3.5.3 registry_0.5-1 data.table_1.12.2
[46] GlobalOptions_0.1.0 matrixStats_0.54.0 gridBase_0.4-7
[49] ComplexHeatmap_1.20.0 stringr_1.4.0 S4Vectors_0.20.1
[52] munsell_0.5.0 cluster_2.0.9 rngtools_1.3.1
[55] DelayedArray_0.8.0 Biostrings_2.50.2 compiler_3.5.3
[58] GenomeInfoDb_1.18.2 rlang_0.3.4 grid_3.5.3
[61] RCurl_1.95-4.12 iterators_1.0.10 rstudioapi_0.10
[64] rjson_0.2.20 bigmemory.sri_0.1.3 circlize_0.4.6
[67] bitops_1.0-6 gtable_0.3.0 codetools_0.2-16
[70] reshape2_1.4.3 R6_2.4.0 GenomicAlignments_1.18.1
[73] gridExtra_2.3 dplyr_0.8.1 rtracklayer_1.42.2
[76] shape_1.4.4 stringi_1.4.3 Rcpp_1.0.1
[79] wordcloud_2.6 tidyselect_0.2.5
You do not have to use AACol
argument here. Check getFields(laml)
to see what columns are present in your MAF object. Pass the one which contains amino acid change information if necessary. In your case it would be Protein_Change
.
Most of the times, you do not have to use this argument as the function automatically predicts it.
You do not have to use
AACol
argument here. CheckgetFields(laml)
to see what columns are present in your MAF object. Pass the one which contains amino acid change information if necessary. In your case it would beProtein_Change
. Most of the times, you do not have to use this argument as the function automatically predicts it.
it work, thanks.
hello @PoisonAlien, I have another question. mafSurvival(maf = laml, genes = 'EGFR', time = 'OS.time', Status = 'OS', isTCGA = TRUE)
but I cannot get the individual data, and calculate the HR and 95%ci. I CHECK THE CODE, function (maf, genes = NULL, samples = NULL, clinicalData = NULL, time = "Time", Status = "Status", groupNames = c("Mutant", "WT"), showConfInt = TRUE, addInfo = TRUE, col = c("maroon", "royalblue"), isTCGA = FALSE, textSize = 12, fn = NULL, width = 6, height = 6) { if (is.null(genes) & is.null(samples)) { stop("Either provide Gene names or Sample names to group by.") } if (!is.null(genes) & !is.null(samples)) { stop("Either provide Gene names or Sample names to group by. Not both!") } if (is.null(clinicalData)) { message("Looking for clinical data in annoatation slot of MAF..") clinicalData = getClinicalData(x = maf) clinicalData = data.table::setDT(clinicalData) } else { clinicalData = data.table::setDT(clinicalData) } if (!"Tumor_Sample_Barcode" %in% colnames(clinicalData)) { print(colnames(clinicalData)) stop("Column Tumo_Sample_Barcode not found in clinical data. Check column names and rename it to Tumo_Sample_Barcode if necessary.") } if (isTCGA) { clinicalData$Tumor_Sample_Barcode = substr(x = clinicalData$Tumor_Sample_Barcode, start = 1, stop = 12) } if (length(colnames(clinicalData)[colnames(clinicalData) %in% time]) == 0) { print(colnames(clinicalData)) stop(paste0(time, " not found in clinicalData. Use argument time to povide column name containing time to event.")) } else { colnames(clinicalData)[colnames(clinicalData) %in% time] = "Time" } if (length(colnames(clinicalData)[colnames(clinicalData) %in% Status]) == 0) { print(colnames(clinicalData)) stop(paste0(Status, " not found in clinicalData. Use argument Status to povide column name containing events (Dead or Alive).")) } else { colnames(clinicalData)[colnames(clinicalData) %in% Status] = "Status" } if (!is.null(genes)) { genesTSB = genesToBarcodes(maf = maf, genes = genes, justNames = TRUE) genesTSB = genesTSB[sapply(genesTSB, FUN = function(x) length(x) != 0)] message("Number of mutated samples for given genes: ") print(sapply(genesTSB, FUN = length)) genesMissing = genes[!genes %in% names(genesTSB)] if (length(genesMissing) > 0) { genes = genes[!genes %in% genesMissing] genesMissing = paste(genesMissing, collapse = ", ") message(paste0("genes ", genesMissing, " does not seeem to be mutated. Removing them.")) } if (length(genes) == 0) { stop("None of the given genes are mutated!") } else { genes = paste(genes, collapse = ", ") } genesTSB = unique(as.character(unlist(genesTSB))) } else { genesTSB = samples } data.table::setDT(clinicalData) clinicalData$Time = suppressWarnings(as.numeric(as.character(clinicalData$Time))) clinicalData$Status = suppressWarnings(as.integer(as.character(clinicalData$Status))) clinicalData$Group = ifelse(test = clinicalData$Tumor_Sample_Barcode %in% genesTSB, yes = groupNames[1], no = groupNames[2]) clin.mut.dat = clinicalData[, .(medianTime = median(Time, na.rm = TRUE), N = .N), Group][order(Group)] message("Median survival..") print(clin.mut.dat) clinicalData$Time = ifelse(test = is.infinite(clinicalData$Time), yes = 0, no = clinicalData$Time) surv.km = survival::survfit(formula = survival::Surv(time = Time, event = Status) ~ Group, data = clinicalData, conf.type = "log-log") res = summary(surv.km) surv.diff = survival::survdiff(formula = survival::Surv(time = Time, event = Status) ~ Group, data = clinicalData) surv.diff.pval = round(1 - pchisq(surv.diff$chisq, length(surv.diff$n) - 1), digits = 5) surv.dat = data.frame(Group = res$strata, Time = res$time, survProb = res$surv, survUp = res$upper, survLower = res$lower) surv.dat$Group = gsub(pattern = "Group=", replacement = "", x = surv.dat$Group) surv.gg = ggplot(data = surv.dat, aes(x = Time, y = survProb, group = Group)) + geom_line(aes(color = Group)) + geom_point(aes(color = Group)) + cowplot::theme_cowplot(font_size = 12, line_size = 1) + theme(legend.position = "bottom", plot.title = element_text(size = 14, face = "bold"), axis.text.x = element_text(face = "bold"), axis.text.y = element_text(face = "bold"), axis.title.x = element_text(face = "bold"), axis.title.y = element_text(face = "bold"), plot.subtitle = element_text(size = 12, face = "bold", colour = ifelse(surv.diff.pval < 0.05, yes = "red", no = "black")), legend.title = element_blank()) + ggtitle(label = paste0(groupNames[1], " v/s ", groupNames[2]), subtitle = paste0("P-value: ", surv.diff.pval)) + xlab("Time") + ylab("Survival") + scale_color_manual(values = col) if (addInfo) { ypos = as.numeric(as.character(quantile(surv.dat$Time, probs = seq(0, 1, 0.1))["90%"])) surv.gg = surv.gg + annotation_custom(grob = gridExtra::tableGrob(d = clin.mut.dat, rows = NULL, cols = c("Group", "Median", "#Cases"), theme = gridExtra::ttheme_minimal(base_size = textSize)), ymin = 0.75, ymax = 0.95, xmin = ypos, xmax = max(surv.dat$Time, na.rm = TRUE)) } ypos = as.numeric(quantile(surv.dat$Time, probs = seq(0, 1, 0.1))["70%"]) if (showConfInt) { surv.gg = surv.gg + geom_ribbon(aes(ymin = survLower, ymax = survUp), alpha = 0.2, fill = "grey70") } if (!is.null(fn)) { cowplot::save_plot(filename = paste0(fn, ".pdf"), plot = surv.gg, base_height = height, base_width = width, bg = "white") } surv.gg } <bytecode: 0x16568c678>
@jianguozhouzunyimedicaluniversity You should open a new issue, so the author can track it.
By the way, please wrap code in code chunk. You can reference the function using link https://github.com/PoisonAlien/maftools/blob/3d5f7875a987836cb9855763d3270c387856ece4/R/mafSuvival.R#L30 instead of pasting here.
Hi,
I have annovar format files and would like to convert them to maf format. I'm having trouble converting them. It says Tumor_Sample_Barcode not found, though I have added it in my file.
Here is the code, I have used and the error I get.
Can you give me suggestions on how to proceed?
Thanks, Anish.