cloudyr / googleCloudStorageR

Google Cloud Storage API to R
https://code.markedmondson.me/googleCloudStorageR
Other
104 stars 29 forks source link

Modify gcs_upload to also use an extension on the tempfile #91

Closed himanshu-sikaria closed 5 years ago

himanshu-sikaria commented 6 years ago

Usecase: Write raster files (.tif format) using the gcs_upload function. Using the function writeRaster from raster package.

Error: Problem writing file using object_function

Possible solution: Adding an extension to the tempfile. Looks very similar to the closed issue (#226) in aws.s3 package.

MarkEdmondson1234 commented 6 years ago

Is this not possible by providing your own write function? From the examples:

f <- function(input, output){
   write.csv2(input, file = output)
 }

gcs_upload(mtcars, name = "mtcars_csv2.csv", object_function = f)

...so for your use case:

f <- function(input, output){
   # change this to write tif files
  raster::writeRaster(input, file = output)  # a guess, don't know this function :)
 }

gcs_upload(your_raster, name = "raster.tif", object_function = f)
himanshu-sikaria commented 6 years ago

Did exactly the same thing:

f <- function(input, output){
  raster::writeRaster(input, filename = output, format="GTiff", overwrite=TRUE)
}
gcs_upload(my_raster, name = "raster.tif", object_function = f)

Which resulted in the above mentioned error: Problem writing file using object_function

Internally what gcs_upload is doing (from what I could understand) is using the f function to first write to a tempfile and then uploads to the GCS. While writing it to a tempfile, it ignores the file extension which results to the error.

MarkEdmondson1234 commented 6 years ago

Hmm ok, I guess an error raised with rasterthen - would you mind outputting the whole stack trace of the error?

I think a work around would be to save the raster to a .tif file yourself, then pass in the filename to gcs_upload

But I will also look at making a change, where the temporary file is created with the same file extension as what is passed to name, as you suggested.

MarkEdmondson1234 commented 6 years ago

@himanshu-sikaria could you download the latest GitHub version and see if it fixes the issue? You will need to specify your name argument to have .tif at the end.

remotes::install_github("cloudyr/googleCloudStorageR")

himanshu-sikaria commented 6 years ago

Thanks for adding the patch. We are very close to solving this.

## take file extension of name, fix #91
    temp <- tempfile(fileext = tools::file_ext(name))

In the above code (line 135) fileext by default doesn't consider the . after the filename. Could you please change it to

temp <- tempfile(fileext = paste0(".", tools::file_ext(name)))
MarkEdmondson1234 commented 6 years ago

Ah thanks, that change is live now.

himanshu-sikaria commented 6 years ago

@MarkEdmondson1234 After your latest modifications, I have started getting this error for the following code. I am running the latest dev version for this package.

library(raster)
test = raster("~/Downloads/final/green_tn.tif")
f <- function(input, output){
  writeRaster(input, filename = output,format="GTiff", overwrite=TRUE)
}
gcs_upload(test, name = "tests/test.tif", object_function = f)

I get this error:

Error in value[[3L]](cond) : 
  object_function error: unable to find an inherited method for function ‘writeRaster’ for signature ‘"gcs_cf", "character"’
In addition: Warning message:
In class(file) <- c("gcs_cf", class(file)) :
  Setting class(x) to multiple strings ("gcs_cf", "RasterLayer", ...); result will no longer be an S4 object
MarkEdmondson1234 commented 6 years ago

Hmm ok, I don't know how to deal with S4 objects, only S3.

This will work, as it uploads from disk:

library(raster)
gcs_upload("~/Downloads/final/green_tn.tif", name = "tests/test.tif")

In fact there is nothing to gain from passing in a raster object if uploading already from disk, is this your use case or do you need it to upload the objects directly?

himanshu-sikaria commented 6 years ago

Thanks for the quick response. This is just an example, I would ideally like to download multiple rasters from GCS, do some operations and upload the final raster back to GCS. What could be the best way to solve for this?

MarkEdmondson1234 commented 6 years ago

I think downloading and uploading to file, then using raster as you would normally on files. The custom function only saves the object to a temporary file anyway, so you would gain in speed and memory not having temporary files.

Something like (not tested):

# upload all .tif files
my_rasters <- list.files(pattern = ".tif$")
lapply(my_rasters, function(x){
  gcs_upload(x, name = x)
})

# download all .tif files
objs <- gcs_list_objects("my_bucket")
tif_objs <- objs$name[grepl(".tif$",objs$name)]

lapply(tif_objs, function(x) {
  gcs_get_object(x, saveToDisk = paste0("download/", x), overwrite = TRUE)
})

Or put it in one function something like:


# download all .tif files
objs <- gcs_list_objects("my_bucket")
tif_objs <- objs$name[grepl(".tif$",objs$name)]

## loop over the tif files
lapply(tif_objs, function(x) {
  # download to a download folder
  gcs_get_object(x, saveToDisk = paste0("download/", x), overwrite = TRUE)
  tif <- raster::raster(paste0("download/", x))
  # do stuff to tifs
  new_tif <- idontknow(tif)
  # write modified to disk
  raster::writeRaster(new_tif, filename = paste0("upload",x),format="GTiff", overwrite=TRUE)
  # upload to upload folder
  gcs_upload(paste0("upload",x))
})