SingleR-inc / SingleR

Clone of the Bioconductor repository for the SingleR package.
https://bioconductor.org/packages/devel/bioc/html/SingleR.html
GNU General Public License v3.0

Downloading the reference data fails. #123

Closed Midsummer723 closed 4 years ago

Midsummer723 commented 4 years ago

Hi, I am using HumanPrimaryCellAtlasData() and MonacoImmuneData(), but I am unable to download them.

Error in .util_download(x, rid[i], proxy, config, "bfcadd()", ...) :
  bfcadd() failed; see warnings()
In addition: Warning messages:
1: download failed
  web resource path: ‘https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3’
  local file path: ‘C:\Users\LokalAdm\AppData\Local\ExperimentHub\ExperimentHub\Cache/11ac473017a_experimenthub.sqlite3’
  reason: Failed to connect to experimenthub.bioconductor.org port 443: Connection refused
2: bfcadd() failed; resource removed
  rid: BFC17
  fpath: ‘https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3’
  reason: download failed

I read #72 and #109, and I checked the following:

curl::has_internet()
[1] TRUE
pingr::is_online()
[1] TRUE
curl::nslookup("experimenthub.bioconductor.org")
[1] "52.73.93.102"

I tried the solutions in #72, but they did not work.

R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] scRNAseq_2.2.0 scater_1.16.0 SingleR_1.2.2 cowplot_1.0.0
[5] ggplot2_3.3.0 patchwork_1.0.0 Seurat_3.1.5 dplyr_0.8.5
[9] SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.1 DelayedArray_0.14.0 matrixStats_0.56.0
[13] Biobase_2.48.0 GenomicRanges_1.40.0 GenomeInfoDb_1.24.0 IRanges_2.22.1
[17] S4Vectors_0.26.0 BiocGenerics_0.34.0

loaded via a namespace (and not attached):
[1] ggbeeswarm_0.6.0 Rtsne_0.15 colorspace_1.4-1
[4] ellipsis_0.3.0 ggridges_0.5.2 XVector_0.28.0
[7] BiocNeighbors_1.6.0 rstudioapi_0.11 leiden_0.3.3
[10] listenv_0.8.0 npsurv_0.4-0.1 ggrepel_0.8.2
[13] bit64_0.9-7 AnnotationDbi_1.50.0 interactiveDisplayBase_1.26.0
[16] codetools_0.2-16 splines_4.0.0 lsei_1.2-0.1
[19] jsonlite_1.6.1 ica_1.0-2 cluster_2.1.0
[22] dbplyr_1.4.3 png_0.1-7 uwot_0.1.8
[25] shiny_1.4.0.2 sctransform_0.2.1 BiocManager_1.30.10
[28] compiler_4.0.0 httr_1.4.1 fastmap_1.0.1
[31] assertthat_0.2.1 Matrix_1.2-18 lazyeval_0.2.2
[34] later_1.0.0 BiocSingular_1.4.0 htmltools_0.4.0
[37] tools_4.0.0 rsvd_1.0.3 igraph_1.2.5
[40] gtable_0.3.0 glue_1.4.0 GenomeInfoDbData_1.2.3
[43] RANN_2.6.1 reshape2_1.4.4 rappdirs_0.3.1
[46] Rcpp_1.0.4.6 vctrs_0.3.0 ape_5.3
[49] nlme_3.1-147 ExperimentHub_1.14.0 DelayedMatrixStats_1.10.0
[52] lmtest_0.9-37 stringr_1.4.0 ps_1.3.3
[55] globals_0.12.5 mime_0.9 lifecycle_0.2.0
[58] irlba_2.3.3 future_1.17.0 AnnotationHub_2.20.0
[61] zlibbioc_1.34.0 MASS_7.3-51.6 zoo_1.8-8
[64] scales_1.1.1 promises_1.1.0 RColorBrewer_1.1-2
[67] yaml_2.2.1 curl_4.3 memoise_1.1.0
[70] reticulate_1.15 pbapply_1.4-2 gridExtra_2.3
[73] stringi_1.4.6 RSQLite_2.2.0 BiocVersion_3.11.1
[76] pingr_2.0.0 BiocParallel_1.22.0 rlang_0.4.6
[79] pkgconfig_2.0.3 bitops_1.0-6 lattice_0.20-41
[82] ROCR_1.0-11 purrr_0.3.4 htmlwidgets_1.5.1
[85] processx_3.4.2 bit_1.1-15.2 tidyselect_1.1.0
[88] RcppAnnoy_0.0.16 plyr_1.8.6 magrittr_1.5
[91] R6_2.4.1 DBI_1.1.0 pillar_1.4.4
[94] withr_2.2.0 fitdistrplus_1.0-14 survival_3.1-12
[97] RCurl_1.98-1.2 tibble_3.0.1 future.apply_1.5.0
[100] tsne_0.1-3 crayon_1.3.4 KernSmooth_2.23-17
[103] BiocFileCache_1.12.0 plotly_4.9.2.1 viridis_0.5.1
[106] grid_4.0.0 data.table_1.12.8 blob_1.2.1
[109] digest_0.6.25 xtable_1.8-4 httpuv_1.5.2
[112] tidyr_1.0.3 munsell_0.5.0 beeswarm_0.2.3
[115] viridisLite_0.3.0 vipor_0.4.5

Thank you!

LTLA commented 4 years ago

I don't have much insight to offer here. My best guess is that the ExperimentHub server was either busy or down at the time. For example, I can retrieve on-line resources now without problems:

out <- MonacoImmuneData()
## snapshotDate(): 2020-04-27
## see ?SingleR and browseVignettes('SingleR') for documentation
## downloading 1 resources
## retrieving 1 resource
##   |======================================================================| 100%
## 
## loading from cache
## see ?SingleR and browseVignettes('SingleR') for documentation
## downloading 1 resources
## retrieving 1 resource
##   |======================================================================| 100%
## 
## loading from cache

out
## class: SummarizedExperiment
## dim: 46077 114
## metadata(0):
## assays(1): logcounts
## rownames(46077): A1BG A1BG-AS1 ... ZYX ZZEF1
## rowData names(0):
## colnames(114): DZQV_CD8_naive DZQV_CD8_CM ... G4YW_Neutrophils
##   G4YW_Basophils
## colData names(3): label.main label.fine label.ont

The other possibility is that you are behind a firewall, in which case I can help even less.

Midsummer723 commented 4 years ago

Unfortunately, I still have the problem.

ref.se <- HumanPrimaryCellAtlasData()
Error in .util_download(x, rid[i], proxy, config, "bfcadd()", ...) :
  bfcadd() failed; see warnings()
In addition: Warning messages:
1: download failed
  web resource path: ‘https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3’
  local file path: ‘C:\Users\LokalAdm\AppData\Local\ExperimentHub\ExperimentHub\Cache/11ac6c195f4_experimenthub.sqlite3’
  reason: Failed to connect to experimenthub.bioconductor.org port 443: Connection refused
2: bfcadd() failed; resource removed
  rid: BFC19
  fpath: ‘https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3’
  reason: download failed

This is strange, because I used this same network before and it worked, even though it is a campus network. The only change is that I have switched to a new computer, and now it does not work.

Midsummer723 commented 4 years ago

If it is the firewall, can I download the data on another computer with a different internet connection and then transfer the files to this one? If so, could you tell me how to do it? Thank you very much.

j-andrews7 commented 4 years ago

I agree with Aaron that this sounds like a firewall issue. You could try temporarily disabling your firewall and/or adding experimenthub as an exception to it.

And while yes, you could download on another computer and save it as an RDS or RDA object and load it on your current computer, it's likely a better use of your time to resolve the firewall issues, as you will almost certainly run into them with other packages/applications as well.
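For completeness, the offline route could look roughly like the sketch below. This is only an illustration, not a SingleR feature: `export_reference()` and `import_reference()` are hypothetical helper names, `hpca.rds` is an arbitrary file name, and the sketch assumes the reference object is held fully in memory so that it survives serialization. The machine that loads the file also needs the packages that define the object's class (e.g. SummarizedExperiment) installed.

```r
# Hypothetical helpers (not part of SingleR): serialize a reference object
# on one machine and restore it on another, using base R only.
export_reference <- function(ref, path) {
    saveRDS(ref, file = path)
    invisible(path)
}

import_reference <- function(path) {
    stopifnot(file.exists(path))
    readRDS(path)
}

# On the machine with working access (not run here):
#   ref <- SingleR::HumanPrimaryCellAtlasData()
#   export_reference(ref, "hpca.rds")
# On the blocked machine, after copying hpca.rds across:
#   ref <- import_reference("hpca.rds")
```

The restored object can then be passed to SingleR as the reference in place of the result of the download function.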

Midsummer723 commented 4 years ago

The problem is that I can use R to download packages and other things. Would the firewall block only ExperimentHub? If that is true, how can I fix it?

j-andrews7 commented 4 years ago

As I said above, I would try disabling your firewall temporarily to ensure this is the issue. You can google how to do so. I do not know your firewall settings, so if that is indeed the issue, you will be on your own as to how to allow the proper connections through. Your IT department may be able to help with that.

LTLA commented 4 years ago

If you read the error message, you will see that your problem is that the following file:

https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3

... is not downloading. A minimal reproducible example would be to attempt:

download.file("https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3",
    "copy.sqlite3", mode="wb")

If that fails, then it is a problem with the connection rather than with any SingleR functions.

Why that might be is something you'll have to find out. ExperimentHub merely points to an AWS S3 bucket; does your campus block access to S3? Too many students downloading movies, perhaps.

Midsummer723 commented 4 years ago

I can download this

download.file("https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3",
    "copy.sqlite3", mode="wb")
trying URL 'https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3'
Content type 'application/octet-stream' length 3488768 bytes (3.3 MB)
downloaded 3.3 MB

But the following still fails:

ref.se <- HumanPrimaryCellAtlasData()
Error in .util_download(x, rid[i], proxy, config, "bfcadd()", ...) :
  bfcadd() failed; see warnings()
In addition: Warning messages:
1: download failed
  web resource path: ‘https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3’
  local file path: ‘C:\Users\LokalAdm\AppData\Local\ExperimentHub\ExperimentHub\Cache/11ac7d1ffc8_experimenthub.sqlite3’
  reason: Failed to connect to experimenthub.bioconductor.org port 443: Connection refused
2: bfcadd() failed; resource removed
  rid: BFC20
  fpath: ‘https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3’
  reason: download failed

LTLA commented 4 years ago

Excellent, now we're getting somewhere. This looks like a BiocFileCache problem, @lshep.

lshep commented 4 years ago

Are you behind a proxy that needs to be set? experimenthub.bioconductor.org is up and running and should be accessible

Midsummer723 commented 4 years ago

I don't think so, because everything other than downloading from ExperimentHub works fine. I can open the website. However, when I click on "/metadata/experimenthub.sqlite3", I get the error "There is no app associated with the file to perform this action. Install an app or set up an assignment on the settings page for standard apps if an app is already installed." I am not sure what is wrong. Thank you for helping.

LTLA commented 4 years ago

I would hazard a guess and say that httr::GET in BiocFileCache is not acquiring the resource correctly on Windows. If I look at download.file, there seem to be all sorts of Windows-related special cases, and my suspicion is that httr::GET just doesn't cover them.
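One way to test that guess is to fetch the same URL through both code paths and see which one fails. The following is a minimal sketch under stated assumptions: `try_download()` and `compare_downloads()` are hypothetical helpers written for this comparison, not BiocFileCache functions, and the httr package is assumed to be installed when `compare_downloads()` is actually called.

```r
# Hypothetical helper: run a download attempt and report whether it errored.
try_download <- function(label, fun) {
    tryCatch({
        fun()
        message(label, ": OK")
        TRUE
    }, error = function(e) {
        message(label, ": FAILED (", conditionMessage(e), ")")
        FALSE
    })
}

# Hypothetical helper: fetch the same URL via base R and via httr.
compare_downloads <- function(url) {
    c(
        download.file = try_download("download.file", function() {
            download.file(url, tempfile(), mode = "wb", quiet = TRUE)
        }),
        httr = try_download("httr::GET", function() {
            resp <- httr::GET(url, httr::write_disk(tempfile()))
            httr::stop_for_status(resp)
        })
    )
}
```

Calling `compare_downloads("https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3")` on the affected machine returns a named logical vector. If only the `httr` entry is `FALSE`, that would be consistent with the two methods picking up different proxy or network settings.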

j-andrews7 commented 4 years ago

FWIW, I have no trouble downloading this on Windows:

> hpca <- HumanPrimaryCellAtlasData()
snapshotDate(): 2020-04-27
see ?SingleR and browseVignettes('SingleR') for documentation
loading from cache
see ?SingleR and browseVignettes('SingleR') for documentation
loading from cache
> sessionInfo()
R version 4.0.0 RC (2020-04-18 r78254)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SingleR_1.2.2               SummarizedExperiment_1.18.1 DelayedArray_0.14.0         matrixStats_0.56.0         
 [5] Biobase_2.48.0              GenomicRanges_1.40.0        GenomeInfoDb_1.24.0         IRanges_2.22.1             
 [9] S4Vectors_0.26.0            BiocGenerics_0.34.0        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6                  rsvd_1.0.3                    lattice_0.20-41              
 [4] assertthat_0.2.1              digest_0.6.25                 mime_0.9                     
 [7] BiocFileCache_1.12.0          R6_2.4.1                      RSQLite_2.2.0                
[10] httr_1.4.1                    pillar_1.4.4                  zlibbioc_1.34.0              
[13] rlang_0.4.6                   curl_4.3                      rstudioapi_0.11              
[16] irlba_2.3.3                   blob_1.2.1                    Matrix_1.2-18                
[19] BiocNeighbors_1.6.0           BiocParallel_1.22.0           AnnotationHub_2.20.0         
[22] RCurl_1.98-1.2                bit_1.1-15.2                  shiny_1.4.0.2                
[25] compiler_4.0.0                httpuv_1.5.2                  BiocSingular_1.4.0           
[28] pkgconfig_2.0.3               htmltools_0.4.0               tidyselect_1.1.0             
[31] tibble_3.0.1                  GenomeInfoDbData_1.2.3        interactiveDisplayBase_1.26.0
[34] crayon_1.3.4                  dplyr_0.8.5                   dbplyr_1.4.3                 
[37] later_1.0.0                   bitops_1.0-6                  rappdirs_0.3.1               
[40] grid_4.0.0                    xtable_1.8-4                  lifecycle_0.2.0              
[43] DBI_1.1.0                     magrittr_1.5                  XVector_0.28.0               
[46] promises_1.1.0                DelayedMatrixStats_1.10.0     ellipsis_0.3.1               
[49] vctrs_0.3.0                   tools_4.0.0                   bit64_0.9-7                  
[52] glue_1.4.1                    purrr_0.3.4                   BiocVersion_3.11.1           
[55] fastmap_1.0.1                 yaml_2.2.1                    AnnotationDbi_1.50.0         
[58] BiocManager_1.30.10           ExperimentHub_1.14.0          memoise_1.1.0

LTLA commented 4 years ago

@lshep Perhaps BiocFileCache's .httr_download should try to fall back to download.file if httr::GET doesn't work. It doesn't cost any dependencies and covers this particular scenario. My guess is that WinINet comes with correct auto-configured proxies but cURL does not.

@Midsummer723 if you find the URL of the proxy, you can try setting it with:

ExperimentHub::setExperimentHubOption("PROXY", "<put proxy here>")

You could also try testing whether the situation is any better on a non-Windows computer. It must be said that this is a common and effective solution to many problems (SSL, MKL, Conda, etc.).

Otherwise, I'm afraid I have no further suggestions to offer.

lshep commented 4 years ago

We don't think a fallback is necessary; we suspect there is an underlying issue that is the cause, rather than the download method itself. Another possibility is that https is not supported. You could change the hub to use the http address instead to see if that is the cause and, as suggested, set the proxy if you are behind one. The EXPERIMENT_HUB_URL environment variable controls the URL and EXPERIMENT_HUB_PROXY controls the proxy. In the meantime, I realized that the use of these types of environment variables is not well advertised or documented, so I'll work on updating the vignette and the troubleshooting doc in the hubs.
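As a rough sketch of how those two variables might be set from within R: the variable names come from the comment above, while the http URL is simply the hub address with the scheme changed and the proxy address is a made-up placeholder. My understanding is that the variables should be set before ExperimentHub is loaded so that they are picked up, but treat that as an assumption.

```r
# Point the hub at the plain-http address to test whether https is the problem.
Sys.setenv(EXPERIMENT_HUB_URL = "http://experimenthub.bioconductor.org")

# If a proxy is required, set it as well (hypothetical address, left commented out):
# Sys.setenv(EXPERIMENT_HUB_PROXY = "http://proxy.example.org:8080")

# Confirm what the R session currently sees; unset variables show as "".
Sys.getenv(c("EXPERIMENT_HUB_URL", "EXPERIMENT_HUB_PROXY"))
```

Setting the variables with `Sys.setenv()` only affects the current R session; placing them in `.Renviron` would make them persistent.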

Midsummer723 commented 4 years ago

The problem has been solved somehow. Thanks for all the kind help!

LTLA commented 4 years ago

@Midsummer723 I must say that this is not an entirely satisfactory conclusion. What did you do to make it "work"? Did you set the proxy? Did you change network? Did you use a different machine?

wangmybaba commented 4 years ago

I have encountered a problem that has been bothering me for several days. I can download experimenthub.sqlite3 to my computer (Windows 10 Professional); the local file path is ‘C:\Users\admin\AppData\Local\ExperimentHub\ExperimentHub\Cache/5b8aae3bfa_experimenthub.sqlite3’. However, retrieving the resource itself is far too slow: after half an hour, only 3 percent of it had been retrieved. What does the retrieval speed depend on: the internet connection, the computer's memory, the CRAN mirror (as is well known, one has to switch to a China mirror to download R packages without going crazy), or something else? Is there a good way to deal with this for someone in China, or is a VPN the only option?

dtm2451 commented 4 years ago

I am going to close this as the initial issue is resolved.

@wangmybaba I'm not sure of an answer for you. I hope that you were ultimately able to obtain the reference data? If not, please create a separate issue, as the difficulties you are having are quite distinct from Midsummer's.