databrickslabs / brickster

R Toolkit for Databricks
https://databrickslabs.github.io/brickster/
Apache License 2.0

upload requests failing #63

Closed brenktt closed 1 month ago

brenktt commented 2 months ago

Hi, I'm having trouble communicating with the volume system using the db_volume_write() function in the latest release (v0.2.4), with code that was working a couple of months ago.

I receive the following error:

Error in httr2::req_perform():
! Failed to perform HTTP request.
Caused by error in curl::curl_fetch_memory():
! Could not resolve host: https; Unknown error

It seems to me that the issue is with the host parameter. So far I have provided it in a format like https://adb-<many_digits>.<single_digit>.azuredatabricks.net/. According to the current documentation, host should be in a format like xxxxxxx.cloud.databricks.com.

Could you help me get the host address in the correct format?

zacdav-db commented 2 months ago

@brenktt presumably this is not specific to volumes, do other functions work?

I believe any of the following should be valid:

Remove the trailing / and try again please.
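A minimal sketch of how a host string could be tidied before it is passed on. The helper `normalize_host()` is hypothetical and not part of brickster; it assumes the bare host name (no scheme, no trailing slash) is an accepted form, which matches what is reported as working in the next reply.

```r
# Hypothetical helper (not part of brickster): reduce a workspace URL
# to a bare host name by dropping the scheme and any trailing slashes.
normalize_host <- function(host) {
  host <- trimws(host)                 # remove surrounding whitespace
  host <- sub("^https?://", "", host)  # drop a leading http:// or https://
  host <- sub("/+$", "", host)         # drop one or more trailing slashes
  host
}

# The digits here are placeholders, not a real workspace.
normalize_host("https://adb-1234567890123456.7.azuredatabricks.net/")
```

Whether the result is then supplied through an argument or an environment variable depends on how the session is configured.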

brenktt commented 2 months ago

Hi, I managed to get the connection working using the first option: adb-<many_digits>.<single_digit>.azuredatabricks.net

However, I have run into a second issue when trying to upload files to a volume. The file starts uploading, but the upload immediately freezes and the expected upload time keeps increasing in the console.

(screenshot of the stalled upload progress in the console)

In this example I'm trying to upload a very small parquet file (15 KB).

I have tried reading from the volume and this works as expected, so the issue seems to be only with the db_volume_write() function.

zacdav-db commented 2 months ago

@brenktt can you try this please:

# adjust before running
vpath <- "/Volumes/<catalog>/<schema>/<volume>"

# save to tempdir
dir <- tempdir()
fpath <- file.path(dir, "cars.csv")
write.csv(cars, fpath)

# upload to volume
vol_dest <- file.path(vpath, "cars.csv")
brickster::db_volume_write(path = vol_dest, file = fpath, overwrite = TRUE)

# read from volume
local_dest <- file.path(dir, "vol_cars.csv")
path <- brickster::db_volume_read(path = vol_dest, destination = local_dest)

read.csv(path) # or `read.csv(local_dest)`

I'm unable to reproduce the issue thus far, even with larger data.

brenktt commented 2 months ago

I have tried your solution and it works, and to my surprise my code now works as well. There must have been some network issue at the time, or perhaps I messed up some of the function inputs.

Thank you so much for your help and sorry I wasted your time.

One more question from my side: I think I asked some time ago, but is there any plan to have the package available on CRAN? It was a life saver for me and it would be great if I did not have to install through GitHub.

zacdav-db commented 2 months ago

No worries, glad it's working now!

The CRAN process has been kicked off; I did the first review a few weeks ago. I have put some time aside to go through the feedback and hopefully, all things going well, it's on CRAN soon 🤞.

brenktt commented 2 months ago

Hopefully it works out for the best!

Could you please leave the issue open for a little longer so I can test on larger datasets as well?

brenktt commented 2 months ago

It seems I was too quick to draw conclusions, as I had tested on a file that is smaller than 16 KB. For some reason the upload freezes at exactly 16 KB for all files (whether parquet or csv).

zacdav-db commented 2 months ago

Is there an example file you can make that's reproducible?

brenktt commented 2 months ago

I just pasted a couple of mtcars data frames together so the file exceeds 16 KB:

write.csv(
  dplyr::bind_rows(
    mtcars, mtcars, mtcars, mtcars, mtcars, mtcars,
    mtcars, mtcars, mtcars, mtcars, mtcars, mtcars
  ),
  "mtcars.csv"
)
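A base-R variant of the reproduction above, sketched so that it also confirms the file really crosses the 16 KB mark before any upload is attempted. The variable names and the temp-file location are illustrative choices; the upload call is left commented out because it needs a configured workspace and a real volume path.

```r
# Stack 12 copies of mtcars with base R (no dplyr needed).
big <- do.call(rbind, replicate(12, mtcars, simplify = FALSE))

# Write to a temporary CSV rather than the working directory.
fpath_big <- file.path(tempdir(), "mtcars.csv")
write.csv(big, fpath_big)

# Confirm the file is larger than the 16 KB point where uploads stalled.
size_bytes <- file.size(fpath_big)
size_bytes > 16 * 1024

# Upload step (requires workspace credentials; placeholder path):
# vol_dest <- "/Volumes/<catalog>/<schema>/<volume>/mtcars.csv"
# brickster::db_volume_write(path = vol_dest, file = fpath_big, overwrite = TRUE)
```

Checking the size first separates "the file is too small to trigger the stall" from "the upload works", which is the confusion that came up earlier in the thread.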

zacdav-db commented 2 months ago

Hmm, that works fine for me, I also tested data that was 150MB which worked as well.

e.g. adjusting my example to write 100k rows

write.csv(dplyr::sample_n(cars, 100000, TRUE), fpath)

brenktt commented 2 months ago

So this is where our sessions differ. Do you have any idea what could be causing this, or is there anything else I could provide for you to investigate?

zacdav-db commented 2 months ago

@brenktt you can paste the output of sessionInfo().

Ensure httr2 is up to date and maybe try a different internet connection?

brenktt commented 2 months ago

Here is the output:

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices datasets  utils    
[6] methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9        rstudioapi_0.14   knitr_1.42       
 [4] magrittr_2.0.3    rappdirs_0.3.3    tidyselect_1.2.0 
 [7] bit_4.0.5         lattice_0.20-45   R6_2.5.1         
[10] rlang_1.1.4       fansi_1.0.3       stringr_1.5.0    
[13] httr2_1.0.2       tools_4.2.1       grid_4.2.1       
[16] xfun_0.39         png_0.1-8         arrow_14.0.0.2   
[19] utf8_1.2.2        DBI_1.1.3         cli_3.4.1        
[22] brickster_0.2.4   bit64_4.0.5       assertthat_0.2.1 
[25] tibble_3.1.8      lifecycle_1.0.3   Matrix_1.4-1     
[28] purrr_0.3.5       vctrs_0.5.2       glue_1.6.2       
[31] stringi_1.7.8     compiler_4.2.1    pillar_1.8.1     
[34] jsonlite_1.8.3    reticulate_1.38.0 renv_0.16.0      
[37] pkgconfig_2.0.3  

I have also spoken to our IT department and it seems I am the only one with this issue. I will get back to you if this gets solved on the IT side. It is likely not actually an issue with the package.

zacdav-db commented 2 months ago

Keep me posted. I'll close the issue in a week or two if I don't hear otherwise. Can always re-open.

brenktt commented 2 months ago

I will probably have an answer sometime at the start of September, so please keep the issue open until then.

brenktt commented 2 months ago

@zacdav-db So it turns out the issue is with httr2. The upload does not work with versions of the package above 1.0.1 (I have checked both 1.0.2 & 1.0.3). When I downgrade the package version this issue disappears.

zacdav-db commented 2 months ago

Thanks @brenktt, I can now repro the issue.

I'm having a dig through what's changed in {httr2}. I wonder if the changes in https://github.com/r-lib/httr2/pull/489 are to do with it 🤔

zacdav-db commented 2 months ago

I've tested the repro with the commit before the change and then the commit with the change, and it's clear that the change is the culprit.

remotes::install_github(repo = "r-lib/httr2", ref = "ff16551") # before change, works
remotes::install_github(repo = "r-lib/httr2", ref = "bdb13fe") # after change, fails

brenktt commented 2 months ago

For my part, I'm happy this is now working, but of course it would be best to have the package working with the newest versions, as a lot of time was spent finding the issue.

zacdav-db commented 2 months ago

@brenktt of course. I'm investigating and will likely raise an issue with httr2 if it's indeed an issue there.

I want this to work with all versions without issue too!

zacdav-db commented 2 months ago

Raised an issue with httr2 (https://github.com/r-lib/httr2/issues/524)

zacdav-db commented 2 months ago

I'll be waiting for a resolution before continuing with the CRAN process - this is important before release.

zacdav-db commented 2 months ago

@brenktt The issue is now fixed in the development version of {httr2} - thanks again for raising the issue and initial debugging.

https://github.com/r-lib/httr2/pull/525

brenktt commented 2 months ago

Thank you for the prompt investigation!

zacdav-db commented 1 month ago

You can now install {httr2} 1.0.4 to resolve this issue.

https://github.com/r-lib/httr2/releases/tag/v1.0.4
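For anyone landing here later, a small sketch of how to check whether an installed {httr2} falls in the range reported in this thread (above 1.0.1, fixed in 1.0.4). The helper `httr2_upload_affected()` is hypothetical and only encodes the versions mentioned above; development builds between 1.0.3 and 1.0.4 may or may not contain the fix.

```r
# Hypothetical helper: TRUE when a version string lies strictly between
# 1.0.1 (last version reported working) and 1.0.4 (release with the fix).
httr2_upload_affected <- function(version) {
  v <- package_version(version)
  v > package_version("1.0.1") && v < package_version("1.0.4")
}

# Check the installed version, if httr2 is available in this library.
if (requireNamespace("httr2", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("httr2"))
  if (httr2_upload_affected(installed)) {
    message("httr2 ", installed, " is in the affected range; consider updating to 1.0.4 or later.")
  } else {
    message("httr2 ", installed, " is outside the affected range.")
  }
}
```

Updating with install.packages("httr2") should pick up a fixed release once 1.0.4 or later is available from the configured repository.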