BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Error in TCGAanalyze_DEA #327

Open ghost opened 5 years ago

ghost commented 5 years ago

When I executed TCGAanalyze_DEA(), this error occurred:

Error in getBM(attributes = attributes, filters = c("ensembl_gene_id"),  : 
  Invalid attribute(s): entrezgene 
Please use the function 'listAttributes' to get valid attribute names

I've updated TCGAbiolinks to the latest version 2.13.3, but the same error message comes out.

Could you help me with this?

Thank you.

ghost commented 5 years ago

Hi @tiagochst, in your reply to issue #250 you mentioned that the attribute name changed from entrezgene to entrezgene_id, and you changed the code accordingly. I think my issue is caused by the same name change, in the following code embedded in TCGAanalyze_DEA():

map.ensg <- function(genome = "hg38", genes) {
    if (genome == "hg19"){
        # for hg19
        ensembl <- useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                           host = "feb2014.archive.ensembl.org",
                           path = "/biomart/martservice" ,
                           dataset = "hsapiens_gene_ensembl")
        attributes <- c("ensembl_gene_id", "entrezgene","external_gene_id")
    } else {
        # for hg38
        ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
        attributes <- c("ensembl_gene_id", "entrezgene","external_gene_name")
    }
    gene.location <- getBM(attributes = attributes,
                           filters = c("ensembl_gene_id"),
                           values = list(genes), mart = ensembl)
    colnames(gene.location) <-  c("ensembl_gene_id", "entrezgene","external_gene_name")
    gene.location <- gene.location[match(genes,gene.location$ensembl_gene_id),]
    return(gene.location)
}

Therefore, I think entrezgene also needs to be changed to entrezgene_id in the code above.
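As a rough illustration of the rename, here is a minimal sketch of a helper that picks whichever Entrez attribute name a given mart offers, instead of hard-coding one. The function name pick_entrez_attribute and the mock attribute vectors are hypothetical and not part of TCGAbiolinks or biomaRt; with a live mart, the available names would presumably come from listAttributes(ensembl)$name.

```r
# Hypothetical helper (not part of TCGAbiolinks or biomaRt):
# return the first candidate attribute name that the mart actually exposes.
pick_entrez_attribute <- function(available,
                                  candidates = c("entrezgene_id", "entrezgene")) {
    hit <- candidates[candidates %in% available]
    if (length(hit) == 0) {
        stop("No Entrez attribute found; inspect listAttributes(mart)")
    }
    hit[1]
}

# Mock attribute names standing in for listAttributes(ensembl)$name
new_mart <- c("ensembl_gene_id", "entrezgene_id", "external_gene_name")
old_mart <- c("ensembl_gene_id", "entrezgene", "external_gene_id")

print(pick_entrez_attribute(new_mart))
print(pick_entrez_attribute(old_mart))
```

The result could then replace the literal "entrezgene" in the attributes vector passed to getBM(), so the same code runs against marts that use either name.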

I hope this is helpful.

Thank you for all of your efforts.

tiagochst commented 5 years ago

@diceofeugene Thanks for pointing that problem out. Antonio is the one supporting the DEA code, so I was not aware of it. I just changed the code in that function and it should be fixed.

ghost commented 5 years ago

@tiagochst Thank you very much! It works nicely now!

ghost commented 5 years ago

Hi @tiagochst, I have basically the same problem. I'm trying to run case study no. 3 from here: ftp://202.141.160.110/bioc/3.7/bioc/vignettes/TCGAbiolinks/inst/doc/casestudy.html

When I try to make the starburst plot, I get the error.


starburst <- TCGAvisualize_starburst(met = acc.met,
                                      exp = dataDEGs,
                                      genome = "hg19",
                                      group1 = "CIMP-high",
                                      group2 = "CIMP-low",
                                      filename = "starburst.png",
                                      met.platform = "450K",
                                      met.p.cut = 10^-5,
                                      exp.p.cut = 10^-5,
                                      diffmean.cut = 0.25,
                                      logFC.cut = 3,
                                      names = FALSE,
                                      height = 10,
                                      width = 15,
                                      dpi = 300)
Accessing grch37.ensembl.org to get gene information
Downloading genome information (try:0) Using: Human genes (GRCh37.p13)
Loading from disk
o Fetching auxiliary information
oo Fetching probes genomic information
http://zwdzwd.io/InfiniumAnnotation/current/hm450/hm450.hg19.manifest.rds
oo Fetching TSS information
Downloading transcripts information. Using: Human genes (GRCh37.p13)
Error in getBM(attributes = attributes, filters = c("chromosome_name"),  : 
  Invalid attribute(s): entrezgene 
Please use the function 'listAttributes' to get valid attribute names

This is the result of sessionInfo()

R version 3.6.1 Patched (2019-07-10 r76812)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SummarizedExperiment_1.14.0 DelayedArray_0.10.0         BiocParallel_1.18.0        
 [4] matrixStats_0.54.0          Biobase_2.44.0              GenomicRanges_1.36.0       
 [7] GenomeInfoDb_1.20.0         IRanges_2.18.1              S4Vectors_0.22.0           
[10] BiocGenerics_0.30.0         TCGAbiolinks_2.13.3         biomaRt_2.41.7             

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1            ggsignif_0.5.0              selectr_0.4-1              
  [4] rjson_0.2.20                hwriter_1.3.2               circlize_0.4.6             
  [7] XVector_0.24.0              GlobalOptions_0.1.0         clue_0.3-57                
 [10] rstudioapi_0.10             ggpubr_0.2.1                matlab_1.0.2               
 [13] ggrepel_0.8.1               bit64_0.9-7                 AnnotationDbi_1.46.0       
 [16] xml2_1.2.0                  codetools_0.2-16            splines_3.6.1              
 [19] R.methodsS3_1.7.1           doParallel_1.0.14           DESeq_1.36.0               
 [22] geneplotter_1.62.0          knitr_1.23                  zeallot_0.1.0              
 [25] jsonlite_1.6                Rsamtools_2.0.0             km.ci_0.5-2                
 [28] broom_0.5.2                 annotate_1.62.0             cluster_2.1.0              
 [31] dbplyr_1.4.2                png_0.1-7                   R.oo_1.22.0                
 [34] readr_1.3.1                 compiler_3.6.1              httr_1.4.0                 
 [37] backports_1.1.4             assertthat_0.2.1            Matrix_1.2-17              
 [40] lazyeval_0.2.2              limma_3.40.2                prettyunits_1.0.2          
 [43] tools_3.6.1                 gtable_0.3.0                glue_1.3.1                 
 [46] GenomeInfoDbData_1.2.1      dplyr_0.8.3                 ggthemes_4.2.0             
 [49] rappdirs_0.3.1              ShortRead_1.42.0            Rcpp_1.0.1                 
 [52] vctrs_0.2.0                 Biostrings_2.52.0           nlme_3.1-140               
 [55] rtracklayer_1.44.0          iterators_1.0.10            xfun_0.8                   
 [58] stringr_1.4.0               rvest_0.3.4                 XML_3.98-1.20              
 [61] edgeR_3.26.5                zoo_1.8-6                   zlibbioc_1.30.0            
 [64] scales_1.0.0                aroma.light_3.14.0          hms_0.5.0                  
 [67] RColorBrewer_1.1-2          ComplexHeatmap_2.0.0        yaml_2.2.0                 
 [70] curl_4.0                    memoise_1.1.0               gridExtra_2.3              
 [73] KMsurv_0.1-5                ggplot2_3.2.0               downloader_0.4             
 [76] latticeExtra_0.6-28         stringi_1.4.3               RSQLite_2.1.1              
 [79] genefilter_1.66.0           foreach_1.4.4               GenomicFeatures_1.36.4     
 [82] shape_1.4.4                 rlang_0.4.0                 pkgconfig_2.0.2            
 [85] bitops_1.0-6                lattice_0.20-38             purrr_0.3.2                
 [88] cmprsk_2.2-8                GenomicAlignments_1.20.1    bit_1.1-14                 
 [91] tidyselect_0.2.5            plyr_1.8.4                  magrittr_1.5               
 [94] R6_2.4.0                    generics_0.0.2              DBI_1.0.0                  
 [97] mgcv_1.8-28                 pillar_1.4.2                survival_2.44-1.1          
[100] RCurl_1.95-4.12             tibble_2.1.3                EDASeq_2.18.0              
[103] crayon_1.3.4                survMisc_0.5.5              BiocFileCache_1.8.0        
[106] GetoptLong_0.1.7            progress_1.2.2              locfit_1.5-9.1             
[109] grid_3.6.1                  sva_3.32.1                  data.table_1.12.2          
[112] blob_1.2.0                  ConsensusClusterPlus_1.48.0 digest_0.6.20              
[115] xtable_1.8-4                tidyr_0.8.3                 R.utils_2.9.0              
[118] openssl_1.4.1               munsell_0.5.0               survminer_0.4.4            
[121] askpass_1.1

Could you help me or suggest what I can do? Thank you very much.

MarcinRuc commented 4 years ago

Hi, I found a solution. The problem is in the getTSS(genome = genome) call inside the TCGAvisualize_starburst function. I worked around it as follows. The same function exists in the ELMER library, so I ran this code:

library(ELMER)
getTSS(genome = "hg19")

This downloaded the TSS coordinates to my working directory. Then I ran the code from the manual and it worked:

starburst <- TCGAvisualize_starburst(met = acc.met,
                                      exp = dataDEGs,
                                      genome = "hg19",
                                      group1 = "CIMP-high",
                                      group2 = "CIMP-low",
                                      filename = "starburst.png",
                                      met.platform = "450K",
                                      met.p.cut = 10^-5,
                                      exp.p.cut = 10^-5,
                                      diffmean.cut = 0.25,
                                      logFC.cut = 3,
                                      names = FALSE,
                                      height = 10,
                                      width = 15,
                                      dpi = 300)

You may also download the TSS file from this link to your directory: https://www.dropbox.com/s/0pfmwfaupqrjh41/Human_genes__GRCh37_p13__tss.rda?dl=0

tiagochst commented 4 years ago

@MarcinRuc @mikyzo88 TCGAvisualize_starburst needs a lot of support (actually, it needs to be changed), but unfortunately I have not had time to do it, and I don't think I will in the short term.

I personally don't like the method, which uses the results from two analyses separately. Briefly: for a given probe it uses the difference in mean methylation between two groups, and for a given gene it uses the log2FC between the same groups. But those groups might contain different molecular subtypes, so the results might be misleading.

The more recent methods I have been working on correlate expression and methylation within the same sample, which, at least to me, seems more correct and makes more sense. Some of those methods were used in this paper (https://www.sciencedirect.com/science/article/pii/S2405471219302017) and in the ELMER package (http://bioconductor.org/packages/ELMER/, https://doi.org/10.1093/bioinformatics/bty902).
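To make the contrast concrete, here is a toy sketch in base R. It is not the actual code of TCGAvisualize_starburst or ELMER; the six samples, the values, and the group labels "A" and "B" are invented purely for illustration. It computes the two group-level summaries separately, and then a per-sample correlation between methylation and expression.

```r
# Toy data (hypothetical): one probe and one gene measured in six samples
group <- c("A", "A", "A", "B", "B", "B")
meth  <- c(0.80, 0.75, 0.85, 0.30, 0.35, 0.25)  # beta values for the probe
expr  <- c(10, 12, 8, 160, 128, 200)            # expression of the gene

# Group-level approach: two summaries computed independently of each other
diff_mean <- mean(meth[group == "A"]) - mean(meth[group == "B"])
log2fc    <- mean(log2(expr[group == "B"])) - mean(log2(expr[group == "A"]))

# Sample-level approach: methylation and expression paired within each sample
rho <- cor(meth, log2(expr), method = "spearman")

print(c(diff_mean = diff_mean, log2fc = log2fc, rho = rho))
```

The group-level numbers only compare averages, so they cannot show whether the relationship holds sample by sample, whereas the correlation uses the pairing directly.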

For the moment, I don't suggest using it. I'll probably remove it from the TCGAbiolinks package and workflow, but I need to talk to the other authors before doing that.

ElizabethCattaneo commented 1 year ago

Hello. When I executed TCGAanalyze_DEA() with pipeline = "limma", this error occurred: Error in limma::makeContrasts(contrasts = contr, levels = design): The levels must by syntactically valid names in R, see help(make.names). I am using my own names in Cond1type and Cond2type, but this error doesn't occur if I use pipeline = "edgeR". How can I fix it? Thanks!
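The error message points to help(make.names), so one thing worth trying is to check whether the group labels are syntactically valid R names and sanitize them if they are not. The sketch below uses only base R. The example labels are hypothetical, and whether passing the sanitized labels as Cond1type and Cond2type resolves the error in TCGAanalyze_DEA() is an assumption that this thread does not confirm.

```r
# Hypothetical group labels; hyphens are not allowed in syntactic R names
labels <- c("CIMP-high", "CIMP-low", "Normal")

# A label is syntactically valid if make.names() leaves it unchanged
is_valid <- labels == make.names(labels)

# Sanitized versions: invalid characters are replaced with "."
safe_labels <- make.names(labels)

print(data.frame(labels, is_valid, safe_labels))
```

Labels that contain spaces or hyphens, or that start with a digit, are the usual culprits.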