eblondel / ows4R

R Interface for OGC Web-Services (OWS)
https://eblondel.github.io/ows4R/
Other
36 stars 8 forks source link

Fix WCS download failure for large rasters #111

Closed dfriend21 closed 10 months ago

dfriend21 commented 1 year ago

I've been using ows4R to get data from a WCS, and it's worked well most of the time - thanks for your work on this package. But I've found it fails for large rasters. Here's an example (note that the error doesn't occur until after the download, so on my computer the error doesn't occur until 10-15+ minutes after you started running it):

library(ows4R)
library(terra)

url <- "https://www.mrlc.gov/geoserver/rcmap_anhb/wcs?"
layer_id <- "rcmap_anhb__rcmap_annual_herbaceous_2009"

bounds <- c(-2599358, -1079803, 1139883, 2945300)

wcs <- WCSClient$new(url, "2.0.1")
r <- wcs$getCoverage(
  layer_id,
  OWSUtils$toBBOX(bounds[1], bounds[2], bounds[3], bounds[4])
)
plot(r)

This produces the following error:

Error in matrix(as.integer(v), ncol = nlyr(x)) : data is too long

I've done some digging through the code to try and find out what's going on. The error is happening on this line in terra, which appears to be triggered from WCSCoverageSummary.R, on line 625: https://github.com/eblondel/ows4R/blob/504c529ebc221e2f9757be93a2e003184ef2f5d7/R/WCSCoverageSummary.R#L625 It appears to be happening because terra::values() tries to load all the values of the raster into memory - for really large rasters, like the one in the example, this exceeds the maximum matrix size (which is apparently 2,147,483,647, at least according to this StackOverflow post - this checks out with the size of the raster I was trying to download, which has more than 3 billion cells).

The chunk of code that's causing the problem is getting the min and max of the raster and adding them as attributes - as far as I can tell, these attributes are never used. I simply removed these lines, and then the code ran fine. But if the attributes were important, they should be able to be safely calculated using terra::minmax(), which doesn't load the values into memory:

  cov_range <- minmax(coverage_data, compute = TRUE)
  if(!all(is.na(cov_range))){
    attr(coverage_data, "min") <- cov_range[1,1]
    attr(coverage_data, "max") <- cov_range[2,1]
  }

(Note that I haven't actually tested this code - also, it assumes that coverage_data is always a single-layer raster, which I'm not sure is a valid assumption).

So this pull request consists only of removing those few lines.

eblondel commented 1 year ago

Hi @dfriend21 i'm back from annual leaves, i will have a look asap to your contrib.

eblondel commented 1 year ago

@dfriend21 can you test/commit the proposal with minmax? It's better, at least to make it backward compatible at this stage.