CharlesJB / ENCODExplorer


some metadata are not up to date? #40

Closed crazyhottommy closed 4 years ago

crazyhottommy commented 6 years ago

Hi,

Thanks for this useful tool:

library(ENCODExplorer)
data(encode_df, package = "ENCODExplorer")
query_results_melanocyte <- queryEncode(df=encode_df, organism = "Homo sapiens",
                      biosample_name = c("foreskin melanocyte"), file_format = "fastq", fixed = FALSE,
                      assay = "ChIP-seq")

> query_results_melanocyte
Empty data.table (0 rows) of 73 cols: accession,file_accession,file_type,file_format,file_size,output_category...

> devtools::session_info()
Session info ---------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.2 (2017-09-28)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.0.153)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Chicago             
 date     2018-01-11                  

Packages -------------------------------------------------------------------------------------------
 package       * version  date       source         
 assertthat      0.2.0    2017-04-11 cran (@0.2.0)  
 base          * 3.4.2    2017-10-04 local          
 bindr           0.1      2016-11-13 cran (@0.1)    
 bindrcpp      * 0.2      2017-06-17 cran (@0.2)    
 BiocInstaller * 1.28.0   2017-10-31 Bioconductor   
 bitops          1.0-6    2013-08-17 cran (@1.0-6)  
 compiler        3.4.2    2017-10-04 local          
 data.table      1.10.4-3 2017-10-27 cran (@1.10.4-)
 datasets      * 3.4.2    2017-10-04 local          
 devtools        1.13.3   2017-08-02 CRAN (R 3.4.1) 
 digest          0.6.12   2017-01-27 CRAN (R 3.4.0) 
 dplyr           0.7.4    2017-09-28 cran (@0.7.4)  
 DT            * 0.2      2016-08-09 CRAN (R 3.4.0) 
 ENCODExplorer * 2.4.0    2017-10-31 Bioconductor   
 glue            1.2.0    2017-10-29 cran (@1.2.0)  
 graphics      * 3.4.2    2017-10-04 local          
 grDevices     * 3.4.2    2017-10-04 local          
 htmltools       0.3.6    2017-04-28 CRAN (R 3.4.0) 
 htmlwidgets     0.9      2017-07-10 CRAN (R 3.4.1) 
 httpuv          1.3.5    2017-07-04 CRAN (R 3.4.1) 
 jsonlite        1.5      2017-06-01 CRAN (R 3.4.0) 
 magrittr        1.5      2014-11-22 cran (@1.5)    
 memoise         1.1.0    2017-04-21 CRAN (R 3.4.0) 
 methods       * 3.4.2    2017-10-04 local          
 mime            0.5      2016-07-07 CRAN (R 3.4.0) 
 parallel        3.4.2    2017-10-04 local          
 pkgconfig       2.0.1    2017-03-21 cran (@2.0.1)  
 purrr           0.2.4    2017-10-18 CRAN (R 3.4.2) 
 R6              2.2.2    2017-06-17 CRAN (R 3.4.0) 
 Rcpp            0.12.14  2017-11-23 cran (@0.12.14)
 RCurl           1.95-4.8 2016-03-01 cran (@1.95-4.)
 rlang           0.1.4    2017-11-05 cran (@0.1.4)  
 shiny         * 1.0.5    2017-08-23 CRAN (R 3.4.1) 
 shinythemes   * 1.1.1    2016-10-12 CRAN (R 3.4.0) 
 stats         * 3.4.2    2017-10-04 local          
 stringi         1.1.6    2017-11-17 cran (@1.1.6)  
 stringr         1.2.0    2017-02-18 cran (@1.2.0)  
 tibble          1.3.4    2017-08-22 cran (@1.3.4)  
 tidyr           0.7.2    2017-10-16 cran (@0.7.2)  
 tools           3.4.2    2017-10-04 local          
 utils         * 3.4.2    2017-10-04 local          
 withr           2.0.0    2017-07-28 CRAN (R 3.4.1) 
 xtable          1.8-2    2016-02-05 cran (@1.8-2) 

However, when I search the ENCODE site, the fastq files are listed there: https://www.encodeproject.org/search/?type=Experiment&assay_title=ChIP-seq&target.investigated_as=histone+modification&files.file_type=fastq&biosample_type=primary+cell&biosample_term_name=foreskin+melanocyte&biosample_term_name=foreskin+melanocyte

Thank you for looking into this.

Best, Tommy

CharlesJB commented 6 years ago

Hello Tommy,

Thank you for your interest in ENCODExplorer!

The metadata is updated before each Bioconductor release, which means the metadata file tends to be outdated by the end of each release cycle.

I discussed this with the people at Bioconductor, and they recommend keeping a stable version for the complete release cycle to improve reproducibility.

This being said, it is possible to download all the tables from ENCODE and produce a new database that can be used by ENCODExplorer (see the Data Update vignette).

I will also prepare a new version and push it to this GitHub repository today if possible. You will then be able to install the GitHub version with the latest metadata:

devtools::install_github("charlesjb/encodexplorer")

CharlesJB commented 6 years ago

I checked the updated encode_df, but I can't find the biosample you are looking for. I'll investigate this further next week and will keep you up to date.
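
For readers debugging a similar empty result, one way to check whether a biosample is present in a metadata table at all is to search the biosample names directly before running the full query. The following is only a minimal sketch on a toy data frame: `toy_df` and `count_biosamples` are hypothetical names, and the column names are assumed to mirror the query arguments used above rather than verified against any particular release of `encode_df`.

```r
# Toy stand-in for encode_df; the real table has many more columns.
# Column names are assumptions that mirror the query arguments above.
toy_df <- data.frame(
  file_accession = c("F1", "F2", "F3", "F4"),
  organism       = c("Homo sapiens", "Homo sapiens", "Homo sapiens", "Mus musculus"),
  biosample_name = c("foreskin melanocyte", "foreskin fibroblast",
                     "foreskin melanocyte", "liver"),
  file_format    = c("fastq", "fastq", "bam", "fastq"),
  assay          = c("ChIP-seq", "ChIP-seq", "ChIP-seq", "RNA-seq"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count rows per biosample name matching a pattern.
# An empty result suggests the biosample is missing from the table,
# rather than being filtered out by the other query arguments.
count_biosamples <- function(df, pattern) {
  hits <- grepl(pattern, df$biosample_name, ignore.case = TRUE)
  table(df$biosample_name[hits])
}

melanocyte_counts <- count_biosamples(toy_df, "melanocyte")
print(melanocyte_counts)
```

If the same check on the real table returns nothing, the metadata snapshot likely predates the experiment, which points to the update route described earlier in the thread.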

CharlesJB commented 6 years ago

OK, I pushed the new version to GitHub. You can install it with:

devtools::install_github("CharlesJB/ENCODExplorer")

I tested your initial query, which now returns 32 files.

I will also push the new version to the development branch soon.

crazyhottommy commented 6 years ago

Thank you very much! Just FYI, some fastqs are listed but are not necessarily open to everyone. For example, for this data: https://www.encodeproject.org/search/?type=Experiment&assay_title=ChIP-seq&target.investigated_as=histone+modification&files.file_type=fastq&biosample_type=primary+cell&biosample_term_name=foreskin+melanocyte&biosample_term_name=foreskin+melanocyte

I cannot download the fastqs. I got this answer from the ENCODE DCC:

That data was generated by the Roadmap Epigenomics consortium and the raw reads are protected by dbGaP. The ENCODE DCC was granted access to the raw data through dbGaP to process and make only the pipeline results available to the community, not the raw data. We have displayed the restricted files as objects on the ENCODE portal with accurate metadata in the interest of data provenance in the case where a user does gain access to the raw data themselves.

You can inquire about access to the Costello lab-produced Roadmap data at this link: https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?adddataset=phs000791&consent=HMB&page=login

Anyway, thanks for the update!

Tommy

CharlesJB commented 6 years ago

Thanks for the info, I was not aware of this. I'll try to see if there is something I can do.

crazyhottommy commented 6 years ago

Great, and thanks again!