ajnisbet / opentopodata

Open alternative to the Google Elevation API!
https://www.opentopodata.org
MIT License

Bounds error for tiled datasets without overlap #24

Closed by janusw 3 years ago

janusw commented 3 years ago

With my own server (running the current version 1.4.1 of opentopodata), I see the following problem regarding the EU-DEM dataset:

```
curl "https://myserver/v1/eudem25m?locations=50.100,8.387"
{
  "results": [
    {
      "elevation": 362.4136962890625,
      "location": {
        "lat": 50.1,
        "lng": 8.387
      }
    }
  ],
  "status": "OK"
}
```

```
curl "https://myserver/v1/eudem25m?locations=50.101,8.387"
{
  "error": "Location '50.101,8.387' has latitude outside of raster bounds",
  "status": "INVALID_REQUEST"
}
```

```
curl "https://myserver/v1/eudem25m?locations=50.102,8.387"
{
  "results": [
    {
      "elevation": 362.61474609375,
      "location": {
        "lat": 50.102,
        "lng": 8.387
      }
    }
  ],
  "status": "OK"
}
```

For some reason, the location 50.101,8.387 triggers this bogus error about the latitude being outside the raster bounds, although it is clearly inside the EU-DEM coverage (and the two neighboring locations work fine).

The error occurs every time I query this location on my own server, but interestingly it does not occur with the public server:

```
curl "https://api.opentopodata.org/v1/eudem25m?locations=50.101,8.387"
{
  "results": [
    {
      "elevation": 360.98956298828125,
      "location": {
        "lat": 50.101,
        "lng": 8.387
      }
    }
  ],
  "status": "OK"
}
```

I assume the difference might be due to a slightly different data setup?

And then, independent of this specific problem, I noticed another small issue that occurs when querying two locations at once:

```
curl "https://myserver/v1/eudem25m?locations=50.101,8.387|50.102,8.387"
{
  "error": "Location '50.102,8.387' has latitude outside of raster bounds",
  "status": "INVALID_REQUEST"
}
```

As shown above, the latitude 50.101 is the problematic one, but here the error message mentions the other one (50.102). This might be an off-by-one issue in the error diagnostics.
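A hypothetical sketch of how such an off-by-one can arise with numpy boolean masks (the variable names are assumptions, not necessarily the actual opentopodata code): `np.argmax` on an in-bounds mask returns the index of the first `True`, i.e. the first point that *is* in bounds, so the mask must be inverted to find the offending point.

```python
import numpy as np

# Two queried points: index 0 (50.101) is out of bounds, index 1 (50.102) is fine.
lats = [50.101, 50.102]
y_in_bounds = np.array([False, True])

# Buggy diagnostic: argmax of the mask finds the first True entry,
# i.e. the first point that IS in bounds -- here 50.102.
wrong = lats[int(np.argmax(y_in_bounds))]

# Inverting the mask finds the first out-of-bounds point instead.
right = lats[int(np.argmax(~y_in_bounds))]

print(wrong, right)  # 50.102 50.101
```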

janusw commented 3 years ago

In fact I followed the instructions at https://www.opentopodata.org/datasets/eudem/ rather closely when setting up the EU-DEM data. My data folder only contains the 27 original .tif files (unmodified in content, but renamed), and my config.yaml snippet is identical to what is documented in the instructions.
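For reference, the documented config snippet has roughly this shape (the dataset name and path below are assumed examples based on the linked instructions, not my literal file):

```yaml
datasets:
- name: eudem25m
  path: data/eudem/
```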

Also I checked that the docker logs don't contain any additional error messages, and verified that my server gives reasonable results for the EU-DEM data.

Therefore I have two questions (@ajnisbet):

* Is the server at `api.opentopodata.org` already running the new version 1.4.1, or still the older 1.4.0?
* Did you apply any further transformations or optimizations to the EU-DEM data, beyond what is documented at https://www.opentopodata.org/datasets/eudem/?

janusw commented 3 years ago

One more thing I checked: The elevation result for a 'good' point on my own server is completely identical to the one reported by the public server:

```
curl "https://myserver/v1/eudem25m?locations=50.102,8.387"
{
  "results": [
    {
      "elevation": 362.61474609375,
      "location": {
        "lat": 50.102,
        "lng": 8.387
      }
    }
  ],
  "status": "OK"
}
```

```
curl "https://api.opentopodata.org/v1/eudem25m?locations=50.102,8.387"
{
  "results": [
    {
      "elevation": 362.61474609375,
      "location": {
        "lat": 50.102,
        "lng": 8.387
      }
    }
  ],
  "status": "OK"
}
```
janusw commented 3 years ago

In case it matters:

```
gdalsrsinfo N3000000E4000000.tif

PROJ.4 : '+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs '

OGC WKT :
PROJCS["ETRS89_ETRS_LAEA",
    GEOGCS["ETRS89",
        DATUM["European_Terrestrial_Reference_System_1989",
            SPHEROID["GRS 1980",6378137,298.2572221010042,
                AUTHORITY["EPSG","7019"]],
            AUTHORITY["EPSG","6258"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433],
        AUTHORITY["EPSG","4258"]],
    PROJECTION["Lambert_Azimuthal_Equal_Area"],
    PARAMETER["latitude_of_center",52],
    PARAMETER["longitude_of_center",10],
    PARAMETER["false_easting",4321000],
    PARAMETER["false_northing",3210000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]]]
```

This is the tile that the 'bad' location lies on, and as mentioned the .tif file is the original one (as downloaded from the Copernicus website). In fact the point is quite close to the boundary of the tile, but clearly still on the tile (according to the map on the Copernicus website).

janusw commented 3 years ago

Therefore I have two questions (@ajnisbet):

* Is the server at `api.opentopodata.org` already running the new version 1.4.1, or still the older 1.4.0?

I just checked version 1.4.0, and in fact I see the same errors there. Seems like 1.4.0 and 1.4.1 behave the same way in this respect.

Is there any way to find out the server version from the client side?

Maybe this could be added to the `/health` API?
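A minimal sketch of what that could look like (purely hypothetical, not the current opentopodata code): the /health handler would simply include the package version in its JSON payload.

```python
import json

# Hypothetical: in opentopodata this would come from the package metadata.
VERSION = "1.4.1"

def health_payload(status="OK"):
    # Clients could then check `version` to see which release is deployed.
    return {"status": status, "version": VERSION}

print(json.dumps(health_payload()))
```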

* Did you apply any further transformations or optimizations to the EU-DEM data, beyond what is documented at https://www.opentopodata.org/datasets/eudem/?

I hope that this might explain the differences I'm seeing ...

janusw commented 3 years ago

Did some debugging:

```diff
diff --git a/opentopodata/backend.py b/opentopodata/backend.py
index dc85180..abf8d47 100644
--- a/opentopodata/backend.py
+++ b/opentopodata/backend.py
@@ -73,7 +73,7 @@ def _validate_points_lie_within_raster(xs, ys, lats, lons, bounds, res):
         i_oob = np.argmax(y_in_bounds)
         lat = lats[i_oob]
         lon = lons[i_oob]
-        msg = "Location '{},{}' has latitude outside of raster bounds".format(lat, lon)
+        msg = "Location '{},{}' has latitude outside of raster bounds: {} {} {} {} {} {}".format(lat, lon, xs[0], ys[0], x_min, x_max, y_min, y_max)
         raise InputError(msg)
     if not all(x_in_bounds):
         i_oob = np.argmax(x_in_bounds)
```

This results in:

```
Location '50.101,8.387' has latitude outside of raster bounds: 4205596.2173978295 3000012.1695728954 4000012.49999999 4999987.50000001 3000012.49999999 3999987.50000001
```

So: the boundaries are those of the tile E40N30 (as expected). Everything is fine in the x direction. `ys[0]` is just below `y_min`, but obviously still above the nominal boundary of 3000000.

Seems like this is due to the half-pixel adjustment that is mentioned in the comments in `_validate_points_lie_within_raster`.
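A numeric sketch of that adjustment using the tile-E40N30 values from the debug output above (the shrink-by-half-a-pixel logic is my reading of the code, so treat it as an illustration): only pixel centres are treated as safely interpolatable, so the validated bounds sit half a pixel inside the nominal tile edges.

```python
import numpy as np

# Nominal y extent of tile E40N30 (metres) and the 25 m pixel size.
y_edge_min, y_edge_max = 3_000_000.0, 4_000_000.0
res = 25.0

# Half-pixel adjustment: shrink the valid range to the pixel centres.
y_min = y_edge_min + res / 2  # 3000012.5
y_max = y_edge_max - res / 2  # 3999987.5

# The projected query point from the debug message above.
ys = np.array([3_000_012.1695728954])

in_bounds = (ys >= y_min) & (ys <= y_max)
print(in_bounds)  # the point is on the tile, yet fails the shrunk-bounds check
```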

janusw commented 3 years ago

> Seems like this is due to the half-pixel adjustment that is mentioned in the comments in `_validate_points_lie_within_raster`.

Looking at the code, this appears to be a limitation (read: bug) that affects all multi-file datasets. There seems to be no proper handling of points close to the tile boundaries.

Since I still don't know why the error does not occur for api.opentopodata.org, I simply take a guess and assume that one might be able to overcome this by using a VRT file (as mentioned at https://www.opentopodata.org/server/) ...

janusw commented 3 years ago

> Since I still don't know why the error does not occur for api.opentopodata.org, I simply take a guess and assume that one might be able to overcome this by using a VRT file (as mentioned at https://www.opentopodata.org/server/) ...

Just verified that assumption: built a VRT for EU-DEM via ...

```
opentopodata/data/eudem-vrt$ gdalbuildvrt eudem.vrt ../eudem/*.tif
```

... and used this folder in the config.yaml. And voila, the errors are gone immediately! yay :D

Btw, using the argument `-co VRT_SHARED_SOURCE=0` with `gdalbuildvrt` (as mentioned on https://www.opentopodata.org/datasets/emod2018/) gives an error for me:

```
ERROR 6: Unknown option name '-co'
Usage: gdalbuildvrt [-tileindex field_name]
                    [-resolution {highest|lowest|average|user}]
                    [-te xmin ymin xmax ymax] [-tr xres yres] [-tap]
                    [-separate] [-b band] [-sd subdataset]
                    [-allow_projection_difference] [-q]
                    [-addalpha] [-hidenodata]
                    [-srcnodata "value [value...]"] [-vrtnodata "value [value...]"]
                    [-a_srs srs_def]
                    [-r {nearest,bilinear,cubic,cubicspline,lanczos,average,mode}]
                    [-oo NAME=VALUE]*
                    [-input_file_list my_list.txt] [-overwrite] output.vrt [gdalfile]*
```

This is GDAL version 2.2.3.

ajnisbet commented 3 years ago

Ah yeah, this is an issue with writing the download instructions after I had downloaded the data!

Most tiled elevation datasets have an overlap: a 1 pixel overlap is enough for linear interpolation, and a few datasets come with larger overlaps for more flexibility with interpolation and resampling.

EU-DEM doesn't have an overlap, so yeah I added one when downloading the rasters. I've merged your PR for now, as a VRT is probably the easiest way to handle EU-DEM. I'll also add instructions for adding an overlap buffer.
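The buffering step can be sketched as plain extent arithmetic (the function name and 1-pixel default are my own, not the actual preprocessing code): expand each tile's target extent by at least one pixel before warping, so neighbouring tiles share a row/column of pixels and near-edge points stay interpolatable.

```python
def buffered_extent(x_min, y_min, x_max, y_max, res, pixels=1):
    """Expand a tile extent by `pixels` pixels on every side.

    The expanded extent would then be passed to a warp/translate step so
    that adjacent tiles overlap slightly.
    """
    pad = pixels * res
    return (x_min - pad, y_min - pad, x_max + pad, y_max + pad)

# An EU-DEM style tile: 1000 km square at 25 m resolution, 1-pixel buffer.
print(buffered_extent(4_000_000, 3_000_000, 5_000_000, 4_000_000, 25))
# -> (3999975, 2999975, 5000025, 4000025)
```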

It would also be possible for Open Topo Data to handle requests in between two non-overlapping rasters (by reading from both rasters just like gdal does for vrt files). There's a couple of ways to handle this:

* Reading from both files and merging. This would mean doing interpolation in opentopodata rather than using rasterio, which has performance impacts as well as being a lot of work.
* I could detect when a point falls between two rasters and build an in-memory vrt just for that query.

janusw commented 3 years ago

> EU-DEM doesn't have an overlap, so yeah I added one when downloading the rasters. I've merged your PR for now as a vrt is probably the easiest way to handle EU-DEM.

Thanks!

> I'll also add instructions for adding an overlap buffer.

Yeah, that might be useful.

> It would also be possible for Open Topo Data to handle requests in between two non-overlapping rasters (by reading from both rasters just like gdal does for vrt files). There's a couple of ways to handle this:
>
> * Reading from both files and merging. This would mean doing interpolation in opentopodata rather than using rasterio, which has performance impacts as well as being a lot of work.
> * I could detect when a point falls between two rasters and build an in-memory vrt just for that query.

Not necessary from my point of view. Both methods sound like they involve a significant amount of work and are probably more error-prone than just using a VRT file.

Unless you see any major advantages over the VRT, I'm perfectly happy with that :)

janusw commented 3 years ago

> It would also be possible for Open Topo Data to handle requests in between two non-overlapping rasters (by reading from both rasters just like gdal does for vrt files). There's a couple of ways to handle this:
>
> * Reading from both files and merging. This would mean doing interpolation in opentopodata rather than using rasterio, which has performance impacts as well as being a lot of work.
> * I could detect when a point falls between two rasters and build an in-memory vrt just for that query.
>
> Not necessary from my point of view. Both methods sound like they involve a significant amount of work and are probably more error-prone than just using a VRT file.
>
> Unless you see any major advantages over the VRT, I'm perfectly happy with that :)

In fact one possible advantage might be performance ...

I just noticed that the VRT approach can become quite slow for the BKG-10m dataset, which has a lot of tiles (1025). For single-point requests the difference is barely noticeable, but when requesting many locations at once the VRT slowdown starts to show: with a tiled-TIF setup I can query 400 points in under a second, while the VRT setup takes 6-7 seconds for the same request, which is really quite bad (if you subtract the network latency, the slowdown is easily a factor of ten in that case).

janusw commented 3 years ago

Btw, one other thing I tried (as an alternative to the VRT) was to merge all the 1025 tif files into a single one via:

```
gdal_merge.py -o bkg-dgm10.tif bkg-dgm10/*.tif
```

Like the VRT, this also resolves the out-of-bounds errors reported in this issue. There is one advantage and one drawback, though:

* Advantage: querying many points at once is noticeably faster than with the VRT setup.
* Drawback: the merged file is much larger than the source tiles (uncompressed GeoTIFF).

@ajnisbet It would be great if you could share your instructions for adding an overlap to the tiles. I guess this is the best solution after all, which is probably why you chose it in the first place ;)

ajnisbet commented 3 years ago

I made the default instructions for EU-DEM to build a VRT file, as it's the simplest approach and the performance should be acceptable for only 27 rasters. I added a link to the more complicated buffering code in https://github.com/ajnisbet/opentopodata/commit/80e9f4fa6d6c71eddae37eb959b0f6ac615deb5e, hopefully that helps with your dataset.

There's still the issue of whether to throw an exception for locations outside the raster; I think null would be more appropriate, but that can be tracked in #26.

I think for now I won't do anything in Open Topo Data to help with non-overlapping tiles other than preprocessing instructions, but I'm open to fixing this in the future.

And finally, for your command `gdal_merge.py -o bkg-dgm10.tif bkg-dgm10/*.tif`: gdal creates uncompressed GeoTIFFs by default, which would explain the 4x size increase. If you add `-co COMPRESS=DEFLATE` it should bring the size down.

janusw commented 3 years ago

> I made the default instructions for EU-DEM to build a VRT file, as it's the simplest approach and the performance should be acceptable for only 27 rasters. I added a link to the more complicated buffering code in 80e9f4f, hopefully that helps with your dataset.

Great, thanks a lot!

> There's still the issue of whether to throw an exception for locations outside the raster; I think null would be more appropriate, but that can be tracked in #26.

I agree, returning a null value is definitely more reasonable. I'd like to understand why the exception is thrown in my case.

> I think for now I won't do anything in Open Topo Data to help with non-overlapping tiles other than preprocessing instructions, but I'm open to fixing this in the future.

That's fine with me (as mentioned before). Having proper preprocessing instructions is completely sufficient IMHO.

> And finally, for your command `gdal_merge.py -o bkg-dgm10.tif bkg-dgm10/*.tif`: gdal creates uncompressed GeoTIFFs by default, which would explain the 4x size increase. If you add `-co COMPRESS=DEFLATE` it should bring the size down.

Ah, good point. Will try that soon.

If one can get rid of the size penalty, this might be a good alternative to the buffering solution mentioned above. It is a bit simpler at least.

janusw commented 3 years ago

> And finally, for your command `gdal_merge.py -o bkg-dgm10.tif bkg-dgm10/*.tif`: gdal creates uncompressed GeoTIFFs by default, which would explain the 4x size increase. If you add `-co COMPRESS=DEFLATE` it should bring the size down.
>
> Ah, good point. Will try that soon.
>
> If one can get rid of the size penalty, this might be a good alternative to the buffering solution mentioned above. It is a bit simpler at least.

So, I finally got around to trying this (sorry that it took me so long), and created one large compressed TIF file from the BKG dataset (10m grid):

```
gdal_merge.py -o bkg-dgm10.tif ../bkg-dgm10/*.tif -co COMPRESS=DEFLATE -co BIGTIFF=YES -co NUM_THREADS=ALL_CPUS
```

That took around 7 min and produced a file of 8.1G. That's completely fine (given that the source files total 6.4G) and much better than the 22G uncompressed file.

But unfortunately the performance for querying many points at once also got a bit worse. I tried this with a query of slightly more than 500 points: the compressed single TIF was almost three times slower than the uncompressed one, but still more than two times faster than the VRT (with opentopodata version 1.5.0).

As noted above, the difference is barely noticeable for single-point requests.

janusw commented 2 years ago

Late comment to this thread: by default, the `gdal_merge` script does not handle the NODATA values correctly for the bkg10m dataset. The arguments `-n -9999 -a_nodata -9999` are required to achieve this.
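The effect can be illustrated with a toy numpy merge (a sketch of the failure mode, not the gdal_merge internals): without declaring -9999 as the nodata value, fill values from a later tile overwrite valid data from an earlier one in the overlap region.

```python
import numpy as np

NODATA = -9999.0

# Two overlapping 1-D "tiles": each has fill where the other has valid data.
tile_a = np.array([101.0, 102.0, NODATA])
tile_b = np.array([NODATA, 102.0, 103.0])

# Naive merge (later tile wins everywhere) keeps tile_b's fill at index 0.
naive = tile_b.copy()

# Nodata-aware merge only overwrites where the later tile has valid data.
merged = tile_a.copy()
valid = tile_b != NODATA
merged[valid] = tile_b[valid]

print(merged)  # [101. 102. 103.]
```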