OSGeo / gdal

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
https://gdal.org

Faster rasterization algorithm? #7200

Open kadyb opened 1 year ago

kadyb commented 1 year ago

In R there is the great fasterize package for fast polygon rasterization. It uses the scan line algorithm (https://dl.acm.org/doi/10.1145/1465611.1465619), which appears to be ~10 times faster than GDAL. Maybe you could consider implementing it?

R code ```r library("sf") library("raster") library("fasterize") ## download data options(timeout = 600) download.file("https://s3.amazonaws.com/hp3-shapefiles/Mammals_Terrestrial.zip", destfile = "Mammals_Terrestrial.zip") # 383 MB unzip("Mammals_Terrestrial.zip") ## fasterize system.time({ mammal_shapes = read_sf("Mammals_Terrestrial") mammal_raster = raster(mammal_shapes, res = 0.1) x = fasterize(mammal_shapes, mammal_raster, field = "PRESENCE") writeRaster(x, "test1.tif", options = "COMPRESS=LZW", datatype = "INT1U", NAflag = 0) }) #> 5.35 s ## GDAL rasterize (3.5.2) system.time({ gdal_utils(util = "rasterize", source = "Mammals_Terrestrial", destination = "test2.tif", options = c("-a", "PRESENCE", "-tr", 0.1, 0.1, "-a_nodata", 0, "-ot", "Byte", "-co", "COMPRESS=LZW")) }) #> 49.44 s ```
rouault commented 1 year ago

Code contribution welcome...

mdsumner commented 1 year ago

Not much to offer here, but FWIW: fasterize doesn't just do the rasterizing, it also materializes the pixels, and the two steps aren't separable in its current form. It also doesn't scale, because the entire array (not just the sparse line start-end index) ends up in memory.

I've been distilling it out here, with a (very ...) long-term aim of integration with what is already in GDAL (it's still Rcpp-bound, for example).

https://github.com/hypertidy/controlledburn

In the shorter term I'd aimed to at least illustrate what's common and what's different. (The other case here is {exactextractr}, which has exactly the same opportunity for separability and integration: fasterize is purely centre-point in/out, while exactextractr computes weighted coverage by using GEOS at its core for the intersection calculation.) A minimal illustration of that difference is sketched below.
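Assuming the standard CRAN APIs of fasterize and exactextractr (the toy circle and 10 x 10 template grid are invented for the example):

```r
library("sf")
library("raster")
library("fasterize")
library("exactextractr")

# one toy circular polygon on a 10 x 10 unit grid
poly = st_sf(geometry = st_sfc(st_buffer(st_point(c(5, 5)), 3)))
template = raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)

# centre-point in/out: each cell is 1 or NA
inout = fasterize(poly, template)

# per-cell coverage fraction in [0, 1], computed via GEOS
frac = coverage_fraction(template, poly)[[1]]
```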

@kadyb the benchmarks are welcome, and I'd like to see them fleshed out more with very large raster output (with 64 GB of RAM, about the largest fasterize can handle is on the order of 20K x 20K pixels). controlledburn can go very much larger on smaller resources, because it offloads the actual materializing of pixel values. I think this is not the place for such a detailed discussion, but it's good as a placeholder for folks interested more broadly, or working on similar consolidation.

kadyb commented 11 months ago

I did some more tests and this issue requires deeper analysis, because in some cases the GDAL algorithm seems to be faster. See the example below.

R code ```r library("sf") library("raster") library("fasterize") set.seed(1) n = 1000 df = data.frame(x = runif(n) * 10000, y = runif(n) * 10000) pts = st_as_sf(df, coords = c("x", "y")) buffers = st_buffer(pts, dist = 500) tmp_vector = tempfile(fileext = ".gpkg") tmp_raster = tempfile(fileext = ".tif") write_sf(buffers, tmp_vector) ## fasterize system.time({ buffers = read_sf(tmp_vector) dest = raster(buffers, res = 1) r = fasterize(buffers, dest) writeRaster(r, tmp_raster, options = "COMPRESS=LZW", datatype = "INT1U", NAflag = 0) }) #> user system elapsed #> 8.06 0.51 8.57 ## GDAL rasterize (3.6.2) system.time({ gdal_utils(util = "rasterize", source = tmp_vector, destination = tmp_raster, options = c("-tr", 1, 1, "-a_nodata", 0, "-ot", "Byte", "-co", "COMPRESS=LZW") ) }) #> user system elapsed #> 1.95 0.02 1.97 ```
rouault commented 1 month ago

Setting the -optim option of gdal_rasterize might help depending on the set of input geometries: https://gdal.org/programs/gdal_rasterize.html#cmdoption-gdal_rasterize-optim
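For example, adding it to the gdal_utils() call from the benchmark above (whether forcing RASTER or VECTOR helps for this dataset is a guess; AUTO is the default):

```r
# same call as the benchmark, with the algorithm forced via -optim
gdal_utils(util = "rasterize", source = tmp_vector, destination = tmp_raster,
           options = c("-tr", 1, 1, "-a_nodata", 0, "-ot", "Byte",
                       "-co", "COMPRESS=LZW", "-optim", "RASTER"))
```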