Closed · brenktt closed this 1 month ago
@brenktt presumably this is not specific to volumes; do other functions work?
I believe any of the following should be valid:
adb-<many_digits>.<single_digit>.azuredatabricks.net
https://adb-<many_digits>.<single_digit>.azuredatabricks.net
Remove the trailing / and try again please.
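For reference, a minimal sketch of supplying the host for the session, assuming brickster reads the standard DATABRICKS_HOST / DATABRICKS_TOKEN environment variables (no scheme, no trailing slash):
# assumed convention: the host is picked up from environment variables at request time
Sys.setenv(DATABRICKS_HOST = "adb-<many_digits>.<single_digit>.azuredatabricks.net")
Sys.setenv(DATABRICKS_TOKEN = "<personal_access_token>")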
Hi, I managed to get the connection working using the first option.
adb-<many_digits>.<single_digit>.azuredatabricks.net
However, I have run into a second issue when trying to upload files to a volume. The file starts uploading, but the upload immediately freezes and the expected upload time keeps increasing in the console.
In this example I'm trying to upload a very small parquet file (15 KB).
I have tried reading from the volume and this works as expected, so the issue seems to be just with the db_volume_write() function.
@brenktt can you try this please:
# adjust before running
vpath <- "/Volumes/<catalog>/<schema>/<volume>"
# save to tempdir
dir <- tempdir()
fpath <- file.path(dir, "cars.csv")
write.csv(cars, fpath)
# upload to volume
vol_dest <- file.path(vpath, "cars.csv")
brickster::db_volume_write(path = vol_dest, file = fpath, overwrite = TRUE)
# read from volume
local_dest <- file.path(dir, "vol_cars.csv")
path <- brickster::db_volume_read(path = vol_dest, destination = local_dest)
read.csv(path) # or `read.csv(local_dest)`
I'm unable to reproduce the issue so far, even with larger data.
I have tried your solution and it works, and to my surprise my code now works as well. There must have been some network issue at the time, or perhaps I messed up some of the function inputs.
Thank you so much for your help, and sorry I wasted your time.
One more question from my side: I think I asked some time ago, but is there any plan to have the package available on CRAN? It has been a lifesaver for me and it would be great if I did not have to install through GitHub.
No worries, glad it's working now!
The CRAN process has been kicked off; I did the first review a few weeks ago. I have put some time aside to go through the feedback and, all things going well, it will be on CRAN soon 🤞.
Hopefully it works out for the best!
Please could you leave the issue open for a little longer so I can test on larger datasets as well?
It seems I was too quick to draw conclusions, as I had tested on a file smaller than 16 KB. For some reason the upload freezes at exactly 16 KB for all files (whether parquet or CSV).
Is there an example file you can make that's reproducible?
I just pasted a couple of mtcars data frames together so it exceeds 16 KB.
write.csv(
  dplyr::bind_rows(
    mtcars, mtcars, mtcars, mtcars, mtcars, mtcars,
    mtcars, mtcars, mtcars, mtcars, mtcars, mtcars
  ),
  "mtcars.csv"
)
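As a quick sanity check, file.size() can confirm the written file really crosses the 16 KB mark:
# file.size() returns the size in bytes; 16 KB = 16384 bytes
file.size("mtcars.csv") > 16 * 1024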
Hmm, that works fine for me; I also tested data that was 150 MB, which worked as well.
e.g. adjusting my example to write 100k rows:
write.csv(dplyr::sample_n(cars, 100000, TRUE), fpath)
This is where our sessions differ then. Do you have any idea what could be causing this, or is there anything else I could provide that could be investigated?
@brenktt can you paste the output of sessionInfo()?
Ensure httr2 is up to date, and maybe try a different internet connection?
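A quick way to check what is installed and whether CRAN has something newer:
packageVersion("httr2")  # version installed in the current session
old.packages()           # lists any installed packages with a newer CRAN release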
Here is the output:
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)
Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils
[6] methods base
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 rstudioapi_0.14 knitr_1.42
[4] magrittr_2.0.3 rappdirs_0.3.3 tidyselect_1.2.0
[7] bit_4.0.5 lattice_0.20-45 R6_2.5.1
[10] rlang_1.1.4 fansi_1.0.3 stringr_1.5.0
[13] httr2_1.0.2 tools_4.2.1 grid_4.2.1
[16] xfun_0.39 png_0.1-8 arrow_14.0.0.2
[19] utf8_1.2.2 DBI_1.1.3 cli_3.4.1
[22] brickster_0.2.4 bit64_4.0.5 assertthat_0.2.1
[25] tibble_3.1.8 lifecycle_1.0.3 Matrix_1.4-1
[28] purrr_0.3.5 vctrs_0.5.2 glue_1.6.2
[31] stringi_1.7.8 compiler_4.2.1 pillar_1.8.1
[34] jsonlite_1.8.3 reticulate_1.38.0 renv_0.16.0
[37] pkgconfig_2.0.3
I have also spoken to our IT department and it seems only I have this issue. I will get back to you if this gets solved on the IT side. It is likely not actually an issue with the package.
Keep me posted. I'll close the issue in a week or two if I don't hear otherwise. Can always re-open.
I will probably have an answer sometime at the start of September, so please keep the issue open until then.
@zacdav-db So it turns out the issue is with httr2. The upload does not work with versions of the package above 1.0.1 (I have checked both 1.0.2 and 1.0.3). When I downgrade the package version, the issue disappears.
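For anyone hitting the same behaviour, one possible temporary workaround is pinning httr2 to the last release that worked here, e.g. via remotes:
# downgrade httr2 to 1.0.1 until the regression is resolved upstream
remotes::install_version("httr2", version = "1.0.1")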
Thanks @brenktt, I can now repro the issue.
I'm having a dig through what's changed in {httr2}.
I wonder if the changes in https://github.com/r-lib/httr2/pull/489 are to do with it 🤔
I've tested the repro with the commit before the change and then the commit with the change, and it's clear that it is the culprit.
remotes::install_github(repo = "r-lib/httr2", ref = "ff16551") # before change, works
remotes::install_github(repo = "r-lib/httr2", ref = "bdb13fe") # after change, fails
For my part, I'm happy this is now working, but of course it would be best to have the package working with the newest versions, as a lot of time was spent finding the issue.
@brenktt of course. I'm investigating and will likely raise an issue with httr2 if it's indeed an issue there.
I want this to work with all versions without issue too!
Raised an issue with httr2 (https://github.com/r-lib/httr2/issues/524).
I'll be waiting for a resolution before continuing with the CRAN process; this is important before release.
@brenktt The issue is now fixed in the development version of {httr2}. Thanks again for raising the issue and the initial debugging.
Thank you for the prompt investigation!
You can now install {httr2} 1.0.4 to resolve this issue.
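For example, updating from CRAN and confirming the fixed release is in use:
install.packages("httr2")
packageVersion("httr2")  # should report 1.0.4 or later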
Hi, I'm having issues communicating with the volume system using the db_volume_write() function in the latest release (v0.2.4), using code that was working a couple of months ago. I receive the following error:
Error in httr2::req_perform():
! Failed to perform HTTP request.
Caused by error in curl::curl_fetch_memory():
! Could not resolve host: https; Unknown error
It seems to me that the issue is with the host parameter. So far I have provided it in a format like https://adb-<many_digits>.<single_digit>.azuredatabricks.net/. According to the current documentation, the host should be in a format like xxxxxxx.cloud.databricks.com. Could you help me acquire the host address in the correct format?