cloudyr / googleCloudStorageR

Google Cloud Storage API to R
https://code.markedmondson.me/googleCloudStorageR

Download everything on a folder #144

Closed by andreavargasmon 3 years ago

andreavargasmon commented 3 years ago

As I understand it, gcs_get_object() can download an object that is not a traditional R object, such as an .mp4 file, just by calling:

raw_download <- gcs_get_object(objects$name[[2]], 
                               saveToDisk = "test.mp4")

But what happens if I want to download a whole folder that has multiple types of objects, including .mp4, .png, etc.?

Is this possible?

If so, how? If not, can you consider adding this feature?

Thanks in advance. Your package is amazing!

MarkEdmondson1234 commented 3 years ago

Thanks! The folders aren't real folders, even though they look like it in the web interface; they are just object names with / in them. I usually do something like this, which uses the prefix argument to restrict the listing to the files in one "folder".

my_folder <- "your_folder/"
objs <- gcs_list_objects(prefix = my_folder)

# to download to same named folder
dir.create(my_folder)

# download all the objects to that folder
dls <- lapply(objs$name, function(x) gcs_get_object(x, saveToDisk = x))

andreavargasmon commented 3 years ago

Thank you so much for the quick response! That's a very clever way of handling the request!

However, when I try this, the following error message appears a couple of times:

Error in curl::curl_fetch_disk(url, x$path, handle = handle) : Failed to open file /Users/andreavargas/path_to_my_Rproject/my_folder

and then:

Error: Request failed before finding status code: Failed to open file /Users/andreavargas/andreavargas/path_to_my_Rproject/my_folder

Do you know why this happens?

MarkEdmondson1234 commented 3 years ago

Hmm, has the folder been created at that location?

andreavargasmon commented 3 years ago

my_folder exists both in the bucket and in my R project (I created it with dir.create(my_folder)).

andreavargasmon commented 3 years ago

Ah! I figured it out. The problem is that with objs <- gcs_list_objects(prefix = my_folder), the first row is the name of the folder itself, so the download tries to save a file at the path of the directory that already exists. If you filter objs to drop this row:

objs <- objs %>% 
    dplyr::filter(name != my_folder)

and then run

dls <- lapply(objs$name, function(x) 
    gcs_get_object(x, saveToDisk = x))

everything works fine. Also, for purrr users, this:

dls <- map(objs$name, ~gcs_get_object(., saveToDisk = .))

works just as well.
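
The filter above drops only the single row whose name equals my_folder. If the listing also contains placeholder entries for nested "folders" (for example my_folder/sub/), a more general filter may help. Here is a minimal base R sketch, assuming that placeholder entries are exactly the object names ending in /. The name drop_folder_placeholders() is a hypothetical helper, not part of googleCloudStorageR.

```r
# Hypothetical helper (not part of googleCloudStorageR):
# keep only object names that do not end in "/".
# Assumption: folder placeholders are exactly the names ending in "/".
drop_folder_placeholders <- function(object_names) {
  object_names[!endsWith(object_names, "/")]
}

# Example with made-up object names
example_names <- c("my_folder/", "my_folder/a.mp4",
                   "my_folder/sub/", "my_folder/sub/b.png")
files_only <- drop_folder_placeholders(example_names)
print(files_only)
```

With a real listing this would be applied as drop_folder_placeholders(objs$name) before the lapply() or map() call.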

MarkEdmondson1234 commented 3 years ago

Ah, weird. My test example didn't have the empty folder name; perhaps it is a relic of how the files were uploaded.

Yes, a tidy way could be a chain like this, using walk() instead of map() if you don't need the TRUE values that the download function returns:

library(googleCloudStorageR)
library(tidyverse)

my_folder <- "my_folder/"
dir.create(my_folder)

my_folder %>% 
  gcs_list_objects(prefix = .) %>%       # list only objects under the "folder"
  dplyr::filter(name != my_folder) %>%   # drop the row for the folder itself
  pluck("name") %>% 
  walk(~gcs_get_object(., saveToDisk = ., overwrite = TRUE))
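
One case the chain above does not cover is nested "folders". dir.create(my_folder) creates only the top-level directory, so an object such as my_folder/sub/b.png would likely fail with the same "Failed to open file" error. Below is a minimal sketch of one way to create the local directories first, assuming the object names are used unchanged as local paths, as in the thread. ensure_local_dirs() and its root argument are hypothetical names, not part of googleCloudStorageR. The download calls are left as comments because they need an authenticated session and a default bucket.

```r
# Hypothetical helper (not part of googleCloudStorageR):
# create every local directory needed to save the given object names.
ensure_local_dirs <- function(object_names, root = ".") {
  # Parent directory of each target path, de-duplicated
  dirs <- unique(dirname(file.path(root, object_names)))
  for (d in dirs) {
    # recursive = TRUE also creates intermediate directories;
    # showWarnings = FALSE stays quiet if the directory already exists
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dirs)
}

# Sketch of use with the package (requires authentication, so not run here):
# objs  <- gcs_list_objects(prefix = "my_folder/")
# files <- objs$name[!endsWith(objs$name, "/")]
# ensure_local_dirs(files)
# for (f in files) gcs_get_object(f, saveToDisk = f, overwrite = TRUE)
```

Deriving the directories from the object names themselves means no separate dir.create(my_folder) call is needed, whatever the nesting depth.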