USDAForestService / gdalraster

R Bindings to GDAL (Geospatial Data Abstraction Library)
https://usdaforestservice.github.io/gdalraster/
Other
41 stars 7 forks source link

creating XYZ tiles #315

Closed wiesehahn closed 2 months ago

wiesehahn commented 5 months ago

I didnt find it in the docs, however I am sure I didnt get the entire functionality of gdalraster, so forgive me if I open an issue for this. Unfortunately for most webmap libraries (e.g. Maplibre GLJS) you need tiled raster data (XYZ / TMS), is there a way to run https://gdal.org/programs/gdal2tiles.html or similar here?

ctoney commented 5 months ago

Unfortunately there is not a way to run gdal2tiles.py using gdalraster. The GDAL utilities that are exposed in the GDAL C API are generally available (most of "core" executables, with one or two exceptions). Some that are Python programs have been ported, and several that provide an interface to certain API calls have the same functionality available in gdalraster via the API-level bindings.

Looking at gdal2tiles.py source, it was a Google Summer of Code project with 4600 lines, so probably not something we would port to R for gdalraster. Not sure, but maybe it could be accessed in R using qgisprocess?

Appreciate you opening an issue and asking about it though. Definitely interested in feedback on anything we might add that would be useful.

mdsumner commented 5 months ago

it wouldn't be too hard to implement gdal2tiles (or gdal_retile) in R, it's vaguely on my todo to do this (but it wouldn't be easy to cover all options).

I started exploring maybe there's a reticulate cheat-way around this, and remembered "tiler", so maybe that's a good way forward: https://cran.r-project.org/web/packages/tiler/index.html

I wrote (some of ) the tile logic in my dev package 'grout', so if anyone goes down this road maybe start there (and maybe I will because procrastination is fun): https://github.com/hypertidy/grout

ctoney commented 5 months ago

Thanks, I didn't mean to imply that porting gdal2tiles.py is not feasible in general, but kind of sounded like that. What I should have said is, I don't think it's something I could take on myself in the foreseeable future. I haven't used that utility and not familiar with details of the feature set, but the code appears to result from a substantial effort and ends up well over 4000 lines. I'm just assuming that porting would be a fairly major project relative to my current capacity. With better knowledge of the feature set, it sounds like reducing to core functionality is a possible option to make it more manageable.

@wiesehahn, I'm trying to prioritize the vector API described here and discussed in #241. That will take much of my time over the next few months, I expect. As far as opening an issue to inquire about functionality, welcome that and glad to consider feature requests too. Just this particular case looks difficult to accommodate for now.

If others were able to work on it for R, and gdalraster could be helpful in the effort, I would be glad to support if there's anything I can do within constraints above.

@mdsumner, thanks for adding information and prompting a better reply, hopefully.

wiesehahn commented 5 months ago

Thanks for your quick and detailed replies. Wow, 4600 lines of code is quite something for a thing which seems relatively trivial at first glance. For now its just testing things and I might stick with my dirty solution to do this step directly from QGIS, although my inner me wants to stick entirely to R / gdal.

I really appreciate what you are doing with gdalraster and also @mdsumner with hypertidy. I guess I didnt even scratch the surface of whats possible with it. Maybe one day I will understand. :)

mdsumner commented 5 months ago

oh all good, I actually just appreciate the space to think and share - this makes me look deeper in GDAL itself, like do we really need a Python (or an R) wrapper to do something like this? I also wonder if creating tiles is really necessary anymore but I'm not across the downstream web mapping (like in a perfect world GDAL wasm could deliver tile COG with overviews from any raster source - which is structurally identical to a tile hierarchy - to a webmap).

just to stop me speculating and dreaming, @wiesehahn if you have an example that would be relevant to your work I'd be keen to try a few things. (I think the final .html is harder than generating the tiles, but clearly there's not much boilerplate to modify given the output files that gdal2tiles generates).

wiesehahn commented 5 months ago

yeah I was wondering too why some parts in gdal are wrapped in python, I guess there is some obvious reason I just dont get?!. Seems not very straight forward to me as well to create physical tiles instead of using COGs internal tiles, but as far as I know there is not a direct way around this for all web mapping libraries (e.g. Open Layers does support it but Maplibre not)

What I am trying to do here is to process lidar data plus additional forestry data to derive some (raster) metrics relevant for forestry (topographic indices, height models, wood volume etc.) and vizualize them on a web map (Maplibre GL JS). As most parts of my workflow are running in R I thought it would be nice to create the tiles (possibly via vrt with no special settings) directly from R as well. Not sure if this is the info you were expecting.

ctoney commented 4 months ago

I think I'm following but apology in advance if I just add noise here. It sounds like you do need just a subset of gdal2tiles.py, only to generate the tiles. But you don't need the other stuff like generating "Simple web pages with viewers based on Google Maps, OpenLayers and Leaflet"?

And for one of your raster datasets, you would possibly generate many individual VRT files, with each VRT referencing an internal tile of the raster, i.e., it defines a subwindow of the raster to read that tile?

mdsumner commented 4 months ago

yeah I was wondering too why some parts in gdal are wrapped in python, I guess there is some obvious reason I just dont get?!.

GDAL is distributed with its own internal Python package and the API is constantly updated to match the C++ one, so there's a very good mapping between the underlying lib and the facilities in the osgeo.gdal, osgeo.ogr, osgeo.os (and others) Python modules. There are high level tasks that are just easier to do in Python and so some projects did exactly that and the result was good enough to include as a utility. Note that other utilites like gdalinfo,ogr2ogr were originally external to the library as well (written in C, loading the library but not existing as functions of the library), but now have API interfaces so you don't have to leave C++ to run them, but for whatever reason there's a lot of effort to convert the other python ones and no one has done so. There's lots of unfinished potential in GDAL everywhere, it just comes down to who can do it.

If R was built-in and shipped with GDAL with tight API bindings I bet there'd be R scripts that were really useful but not part of the C++ as well. 👍

I'm still keen to try this ou. Do you want to render an existing raster data via a colour map to PNG tiles? Or do you want a local copy of a large colour image from somewhere? An example would help me focus on a starting point, I was thinking to render out esri tiles for a country or something like that but if you have a more specific case that would help. I would use gdalraster to do it so it's not entirely OT and it should be easy to check by comparison with gdal2tiles or gdal_retile.

mdsumner commented 4 months ago

this is actually going really well, it's already working in overall form and I'm thinking about tweaks, and having fun with it. it's just a script atm but I'll clean it up and write some functions soon

wiesehahn commented 4 months ago

I think I'm following but apology in advance if I just add noise here. It sounds like you do need just a subset of gdal2tiles.py, only to generate the tiles. But you don't need the other stuff like generating "Simple web pages with viewers based on Google Maps, OpenLayers and Leaflet"?

Exactly, I dont think html stuff has to be handled by gdal. It might be nice for a quick vizualisation for some people but at least in R there are dedicated options for this.

And for one of your raster datasets, you would possibly generate many individual VRT files, with each VRT referencing an internal tile of the raster, i.e., it defines a subwindow of the raster to read that tile?

Not sure I got you right but I guess its more the opposite way. Intermediate outcome in my case would be many raster files/tiles, as lidar data, orthoimagery etc. are usually available in tiles and processes are optimized to process these in parallel. These files/tiles (not xyz / tms) I want to bring in a form which can be read by e.g. Maplibre (and this is xyz tiles). So I guess the way to go is to create a VRT pointing to the files first and use this as input for tiling.

I'm still keen to try this ou. Do you want to render an existing raster data via a colour map to PNG tiles? Or do you want a local copy of a large colour image from somewhere? An example would help me focus on a starting point, I was thinking to render out esri tiles for a country or something like that but if you have a more specific case that would help. I would use gdalraster to do it so it's not entirely OT and it should be easy to check by comparison with gdal2tiles or gdal_retile.

It's more the first case where I have a bunch of locally created GeoTIFF files which I want to convert to PNG/JPEG/WEBP XYZ-tiles. Currently they would be rendered via colour map to RGB image tiles (lets say apply viridis color map to elevation). However, in the future I hope the colour scale can be mapped on the client side later (provide raster data with whatever values and vizualization can be applied in the app later). One case which adds another step apart from tiling is to generate high resolution elevation data which can be used for 3D terrain in webmaps. Again it has to be available as XYZ-tiles, but additionally RGB-encoded.

ctoney commented 4 months ago

@wiesehahn, thanks for that description. Sorry, I was slow to understand the whole workflow but I think I'm getting there now. Your hint of possibly using VRT wasn't what I was picturing at first but that was wrong interpretation on my part. Makes sense now. This does sound like a subset of gdal2tiles.py and not quite the project to port I was imagining at first, as @mdsumner understood from the beginning. Still a fair amount of code to implement this in R I would think.

Is this closer? In terms of gdal2tiles, your <input_file> might be a virtual mosaic, since the source data will typically come as a set of tiles of some sort. Your inputs generally are not Byte as gdal2tiles expects, but a color map will be applied somewhere along the way. Output will be PNG in directories of tiles following XYZ Slippy Map standard, like /zoom/x/y.png, over the range of zoom levels you need (https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames).

mdsumner commented 4 months ago

we can set a palette on byte-scaled data like this:

coltab <- cbind(0:255, t(col2rgb(hcl.colors(256))))
ds$setColorTable(1, coltab, "RGB")

what I can't figure out yet is how to "-expand" that to full RGB from gdalraster, I don't think we can without the equivalent of translate <dataset> expand.vrt -expand at the cli, which I think is not possible in gdalraster atm, we can't modify the dataset to have a palette, and then expand that, note that it is possible via VRT:

https://gist.github.com/mdsumner/71609d49c5cd98dafa3baffac2d4e5e9

(there was a bug in gdalattachpct when I wrote that). I'll keep exploring. At worst we can chain a few cli calls on a source to generate VRT that we can then tile up with gdalraster - we could even hack the VRT text to do it, scaling is just a matter of fields on the virtual bands e.g. this VRT chunk below scales Int16 -> Byte by running with (needs GDAL 3.7 for this syntax)

"vrt:///vsicurl/https://gebco2023.s3.valeria.science/gebco_2023_land_cog.tif?ovr=4&scale=true&ot=Byte"
<VRTRasterBand dataType="Byte" band="1" blockXSize="512" blockYSize="512">
    <ColorInterp>Gray</ColorInterp>
    <ComplexSource>
      <SourceFilename relativeToVRT="0">/vsicurl/https://gebco2023.s3.valeria.science/gebco_2023_land_cog.tif</SourceFilename>
      <OpenOptions>
        <OOI key="OVERVIEW_LEVEL">4 only</OOI>
      </OpenOptions>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="2700" RasterYSize="1350" DataType="Int16" BlockXSize="512" BlockYSize="512" />
      <SrcRect xOff="0" yOff="0" xSize="2700" ySize="1350" />
      <DstRect xOff="0" yOff="0" xSize="2700" ySize="1350" />
      <ScaleOffset>162.004</ScaleOffset>
      <ScaleRatio>0.0171579</ScaleRatio>
    </ComplexSource>
  </VRTRasterBand>
ctoney commented 4 months ago

the equivalent of translate <dataset> expand.vrt -expand at the cli, which I think is not possible in gdalraster atm

We do have gdalraster::translate(), do you mean a separate issue?

I have code in rasterToVRT() (in R/gdalraster_proc.R) that edits the XML. If we end up needing to edit the VRT directly, I can take a look if that would help at all.

mdsumner commented 4 months ago

You can't modify an actual dataset in memory and create a new one with it, translate is filename to filename, (dsn to dsn actually) same as warp I have to engineer VRT upfront for any changes. (We would need #306 or equivalent for translate).

I guess we can copy the source, then open in write mode, though. The feature would be translate or CreateCopy taking a GDALRaster and creating a new one. Hey maybe we can create VRT then open that in write mode and do these mods, then translate with -expand to a new vrt ...

It's very instructive to me chasing this down. 👌 It's strictly out of scope regarding doing what gdal2tiles can do but I think it's worth pursuing and sharing ideas

mdsumner commented 4 months ago

This works, it saves us from running any actual system calls, though we have to have a couple of .vrt files that's no big deal there's no large intermediate files. The result is full RGB 3 Byte bands, that we can tile out.

gebco <- "/vsicurl/https://gebco2023.s3.valeria.science/gebco_2023_land_cog.tif"

translate(gebco, "scaled.vrt", cl_arg = c("-ovr", "5", "-scale", "-ot", "Byte"))
ds <- new(GDALRaster, "scaled.vrt", read_only = FALSE)
## set a palette (we might map value prior to scaling here)
coltab <- cbind(0:255, t(col2rgb(hcl.colors(256))))
ds$setColorTable(1, coltab, "RGB")
ds$close()  ## this writes the colour table to the scaled.vrt

## now expand that to RGB
translate("scaled.vrt", "expand.vrt", cl_arg = c("-expand", "rgb"))
ds <- new(GDALRaster, "expand.vrt")
plot_raster(ds, nbands = 3)

It will be fun to do things like tap into the byte-scaling to do our own colour ramps.

image

ctoney commented 4 months ago

The color table could also be written to the VRT with ds$flushCache(). The code below shows that and checks the display of both VRTs. With flushCache():

Using this method does not preclude calling $close() to properly close the dataset and ensure that important data not addressed by $flushCache() is written in the file (see also $open() above).

The code also shows that a dataset opened read-only can be a source raster for translate() or other processing. In that case, translate() internally uses the existing dataset handle rather than creating a new one. This should avoid overhead from opening a new dataset, and any existing data in the block cache could potentially be used. But as noted for update:

If a dataset object is opened with update access (read_only = FALSE), it is not recommended to open a new dataset on the same underlying filename.

Code ```r library(gdalraster) gebco <- "/vsicurl/https://gebco2023.s3.valeria.science/gebco_2023_land_cog.tif" translate(gebco, "scaled.vrt", cl_arg = c("-ovr", "5", "-scale", "-ot", "Byte")) ds <- new(GDALRaster, "scaled.vrt", read_only = FALSE) ## set a palette (we might map value prior to scaling here) coltab <- cbind(0:255, t(col2rgb(hcl.colors(256)))) ds$setColorTable(1, coltab, "RGB") ds$flushCache() ## this writes the colour table to the scaled.vrt plot_raster(ds) ## getFilename() can be called with the dataset open or closed ## not necessary to use it but may be convenient in some cases translate(ds$getFilename(), "expand.vrt", cl_arg = c("-expand", "rgb")) ds_exp <- new(GDALRaster, "expand.vrt") plot_raster(ds_exp, nbands = 3) ds$close() vsi_unlink("scaled.vrt") ds_exp$close() vsi_unlink("expand.vrt") ```
mdsumner commented 4 months ago

excellent, that's good to know I'm always a bit vague about when a dataset flushes to disk and how.

Here's a package to create a tile dir with gdal_tiles() function. It's still very raw, lots of stuff to worry about but I'm happy with the overall logic (using my tile logic in {grout} and not GDAL block logic, tile dangles are already handled in grout and I'm not sure how to do that when you are reading from something that's not in the crs of the target tiles...)

https://github.com/hypertidy/filearchy

Please use with caution, there's not good handling of 'overwrite' and none for 'update' yet. I just haven't tested carefully what it might clobber, but happily my intuition about ease of creating tiles and write speed in parallelization has been ok. It's PNG only for now, lots of nice features to add still.

ctoney commented 2 months ago

https://github.com/hypertidy/filearchy/issues/1 seems to resolve