Bioconductor / GenomicDataCommons

Provide R access to the NCI Genomic Data Commons portal.
http://bioconductor.github.io/GenomicDataCommons/

404 when using legacy=TRUE #49

Closed: fpbarthel closed this issue 6 years ago

fpbarthel commented 6 years ago

Dear developers,

I am getting a 404 when using legacy=TRUE to get a set of files().

For example:

file_list = files(legacy = FALSE) %>% results()

works fine, but

file_list = files(legacy = TRUE) %>% results()

returns

Error in .gdc_post(entity_name(x), body = body, legacy = x$legacy, token = NULL, : Not Found (HTTP 404).

Is this intended behavior? I have been having trouble in general accessing the legacy archive on the GDC website.
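To compare the two calls side by side, one option is to capture each outcome instead of letting the first error stop the script. This is a minimal sketch: `outcome()` is a hypothetical helper, not part of GenomicDataCommons, and the `legacy` argument is used exactly as in the calls above, so it only applies to package versions that still accept it.

```r
# Hypothetical helper: run f() and return "ok" or the error message.
outcome <- function(f) {
  tryCatch({
    f()
    "ok"
  }, error = function(e) conditionMessage(e))
}

# Only attempt the real queries if the package is installed.
if (requireNamespace("GenomicDataCommons", quietly = TRUE)) {
  library(GenomicDataCommons)
  for (legacy in c(FALSE, TRUE)) {
    # Same query as in the report, with and without the legacy flag.
    msg <- outcome(function() results(files(legacy = legacy)))
    cat("legacy =", legacy, ":", msg, "\n")
  }
}
```

Posting the two printed lines together with sessionInfo() shows whether only the legacy request fails.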

LiNk-NY commented 6 years ago

Hi @fpbarthel,

This is working for me. Please check your network settings and make sure you are not behind a firewall or using a VPN.

See my session info below.

Regards, Marcel

sessionInfo()
R Under development (unstable) (2017-11-21 r73765)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/libf77blas.so.3.0
LAPACK: /usr/lib/openblas-base/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicDataCommons_1.3.4 magrittr_1.5             colorout_1.1-3          

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16           xml2_1.2.0             bindr_0.1.1           
 [4] XVector_0.19.9         hms_0.4.2              rappdirs_0.3.1        
 [7] GenomicRanges_1.31.23  BiocGenerics_0.25.3    zlibbioc_1.25.0       
[10] IRanges_2.13.28        R6_2.2.2               rlang_0.2.0           
[13] httr_1.3.1             GenomeInfoDb_1.15.5    dplyr_0.7.4           
[16] tools_3.5.0            parallel_3.5.0         lazyeval_0.2.1        
[19] assertthat_0.2.0       tibble_1.4.2           bindrcpp_0.2.2        
[22] GenomeInfoDbData_1.1.0 readr_1.1.1            S4Vectors_0.17.42     
[25] bitops_1.0-6           curl_3.2               RCurl_1.95-4.10       
[28] glue_1.2.0             compiler_3.5.0         BiocInstaller_1.29.6  
[31] pillar_1.2.1           stats4_3.5.0           jsonlite_1.5          
[34] pkgconfig_2.0.1       

fpbarthel commented 6 years ago

Interesting. There seem to be some connectivity issues. Thank you. Is there any way to get the legacy server status using the GenomicDataCommons package?

Something like how GenomicDataCommons::status() gets the non-legacy status?

seandavi commented 6 years ago

Thanks for the report, @fpbarthel. The legacy and non-legacy data are hosted on the same system, so there is only one "status." If you are still finding problems, can you give us the output of sessionInfo()?

fpbarthel commented 6 years ago

The problem has been consistent since I started trying 1-2 weeks ago. I vaguely remember it working correctly once, but I am beginning to doubt this.

Here is my sessionInfo():

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.2.0            stringr_1.2.0            dplyr_0.7.4              purrr_0.2.4             
 [5] readr_1.1.1              tidyr_0.7.2              tibble_1.3.4             ggplot2_2.2.1           
 [9] tidyverse_1.2.1          listviewer_2.0.0         GenomicDataCommons_1.2.0 magrittr_1.5            

loaded via a namespace (and not attached):
 [1] reshape2_1.4.2          haven_1.1.0             lattice_0.20-35         colorspace_1.3-2       
 [5] htmltools_0.3.6         stats4_3.4.2            yaml_2.1.14             rlang_0.1.4            
 [9] foreign_0.8-69          glue_1.2.0              BiocGenerics_0.24.0     modelr_0.1.1           
[13] readxl_1.0.0            bindrcpp_0.2            GenomeInfoDbData_0.99.1 bindr_0.1              
[17] plyr_1.8.4              zlibbioc_1.24.0         munsell_0.4.3           gtable_0.2.0           
[21] cellranger_1.1.0        rvest_0.3.2             htmlwidgets_0.9         psych_1.7.8            
[25] IRanges_2.12.0          GenomeInfoDb_1.14.0     curl_3.0                parallel_3.4.2         
[29] broom_0.4.2             Rcpp_0.12.13            scales_0.5.0            S4Vectors_0.16.0       
[33] jsonlite_1.5            XVector_0.18.0          mnormt_1.5-5            hms_0.3                
[37] digest_0.6.12           stringi_1.1.5           GenomicRanges_1.30.0    grid_3.4.2             
[41] cli_1.0.0               tools_3.4.2             bitops_1.0-6            lazyeval_0.2.1         
[45] RCurl_1.95-4.8          crayon_1.3.4            pkgconfig_2.0.1         data.table_1.10.4-3    
[49] xml2_1.1.1              lubridate_1.7.1         rstudioapi_0.7          assertthat_0.2.0       
[53] httr_1.3.1              R6_2.2.2                nlme_3.1-131            compiler_3.4.2         

Here is the GDC server status:

> GenomicDataCommons::status()
$commit
[1] "c5203a2ebd78c5768b00f606ffda482dc7aaddf5"

$data_release
[1] "Data Release 10.1 - February 15, 2018"

$status
[1] "OK"

$tag
[1] "1.14.0"

$version
[1] 1

fpbarthel commented 6 years ago

It turns out GDC had blocked connections to the legacy archive from my institution. This was resolved by contacting GDC support.

fpbarthel commented 6 years ago

Actually, I am reopening this. I am now getting this error again, and I have reproduced it on different networks, including at different institutions. Is anyone else experiencing this?

LiNk-NY commented 6 years ago

I'm still not experiencing this issue but I will keep checking it occasionally.

seandavi commented 6 years ago

I'd suggest checking with the GDC folks if the problem persists. Note that the API that the package uses is the same one that operates the website, so you could also check the website.

fpbarthel commented 6 years ago

The website is hit-or-miss: half of the time it works, and half of the time it times out, returns a 404, or something along those lines. I've opened another ticket with GDC (last time they were able to resolve it), so hopefully that will amount to something.

I am surprised you are not having this issue @LiNk-NY, I tested under the following circumstances:

And in all of the above we are getting 404s (again, only with legacy=TRUE, not with legacy=FALSE).

seandavi commented 6 years ago

What version of GenomicDataCommons are you using? If you have not upgraded to version 1.4.1 (or the devel version), could you try that?
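A quick way to answer the version question from within R is sketched below. `is_at_least()` is a hypothetical helper, not part of the package, and the default of 1.4.1 is simply the version named in the question above.

```r
# Hypothetical helper: TRUE when `installed` is at least `minimum`.
is_at_least <- function(installed, minimum = "1.4.1") {
  package_version(installed) >= package_version(minimum)
}

# Check the installed package, if there is one.
if (requireNamespace("GenomicDataCommons", quietly = TRUE)) {
  v <- as.character(utils::packageVersion("GenomicDataCommons"))
  cat("GenomicDataCommons", v, "is at least 1.4.1:", is_at_least(v), "\n")
}
```

The helper compares versions with package_version() because plain string comparison orders version numbers incorrectly (for example, "1.10.0" sorts before "1.4.1").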

LiNk-NY commented 6 years ago

I've tried it again recently and the status is still "OK" for me. Could it be an East vs West coast server type of issue?

seandavi commented 6 years ago

@fpbarthel, I just read your reply a bit closer. Are you saying that you experience similar problems when accessing the legacy web portal?

fpbarthel commented 6 years ago

@seandavi That resolved the problem! I wish the error messages were more helpful here. I was seeing the exact same error message we had a month ago, so I figured it had the same cause (an IP block, according to GDC). Strangely, we replicated the error in many different settings, including different R installations and environments, which would suggest it is not an IP block. But perhaps they were all somehow running an outdated GenomicDataCommons? One of the test environments was a fresh install, but maybe Bioconductor was outdated there and did not install the most recent version. Were there changes made between GDC 1.2.0 --> GDC 1.5.3 that could explain this? Both 1.2.0 and 1.4.1 did not work.

Anyway, thank you both for the help.

seandavi commented 6 years ago

Good to know that an upgrade fixed things. What happened about a month ago is that the GDC folks changed the API URL. They attempted to apply redirects so that the old URLs would keep working, but it appears that some subset of the URLs were not redirected properly (see a couple of other issues with similar problems). I applied the necessary changes to v1.5.3 and v1.4.1. If you see issues with these versions, let me know.

Sorry for the inconvenience and for not being clearer in my replies. It took quite some troubleshooting to find the underlying URL problem.
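For anyone who hits a similar redirect problem, one way to see what a given URL returns is to request it without following redirects and look at the status code. This is only a sketch: `classify_status()` and `probe()` are hypothetical helpers, no GDC URL is hard-coded, and it assumes the httr package (listed in both sessionInfo() outputs above) is installed.

```r
# Hypothetical helper: put an HTTP status code into a coarse category.
classify_status <- function(code) {
  if (code >= 200 && code < 300) {
    "ok"
  } else if (code >= 300 && code < 400) {
    "redirect"
  } else if (code == 404) {
    "not found"
  } else {
    "other error"
  }
}

# Hypothetical helper: request `url` without following redirects,
# then classify the status code of the first response.
probe <- function(url) {
  resp <- httr::GET(url, httr::config(followlocation = 0L))
  classify_status(httr::status_code(resp))
}
```

Calling probe() on an old and a new endpoint would show whether the old one answers with a redirect or with a 404 directly.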