Closed janusw closed 3 years ago
In fact I followed the instructions at https://www.opentopodata.org/datasets/eudem/ rather closely when setting up the EU-DEM data. My data folder only contains the 27 original .tif files (unmodified in content, but renamed), and my config.yaml snippet is identical to what is documented in the instructions.
Also I checked that the docker logs don't contain any additional error messages, and verified that my server gives reasonable result for the EU-DEM data.
One more thing I checked: The elevation result for a 'good' point on my own server is completely identical to the one reported by the public server:
```
curl "https://myserver/v1/eudem25m?locations=50.102,8.387"
{
  "results": [
    {
      "elevation": 362.61474609375,
      "location": {
        "lat": 50.102,
        "lng": 8.387
      }
    }
  ],
  "status": "OK"
}
```

```
curl "https://api.opentopodata.org/v1/eudem25m?locations=50.102,8.387"
{
  "results": [
    {
      "elevation": 362.61474609375,
      "location": {
        "lat": 50.102,
        "lng": 8.387
      }
    }
  ],
  "status": "OK"
}
```
In case it matters:

```
gdalsrsinfo N3000000E4000000.tif

PROJ.4 : '+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs '

OGC WKT :
PROJCS["ETRS89_ETRS_LAEA",
    GEOGCS["ETRS89",
        DATUM["European_Terrestrial_Reference_System_1989",
            SPHEROID["GRS 1980",6378137,298.2572221010042,
                AUTHORITY["EPSG","7019"]],
            AUTHORITY["EPSG","6258"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433],
        AUTHORITY["EPSG","4258"]],
    PROJECTION["Lambert_Azimuthal_Equal_Area"],
    PARAMETER["latitude_of_center",52],
    PARAMETER["longitude_of_center",10],
    PARAMETER["false_easting",4321000],
    PARAMETER["false_northing",3210000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]]]
```
This is the tile that the 'bad' location lies on, and as mentioned the .tif file is the original one (as downloaded from the Copernicus website). In fact the point is quite close to the boundary of the tile, but clearly still on the tile (according to the map on the Copernicus website).
Therefore I have two questions (@ajnisbet):

* Is the server at `api.opentopodata.org` already running the new version 1.4.1, or still the older 1.4.0? (I just checked version 1.4.0, and in fact I see the same errors there, so 1.4.0 and 1.4.1 seem to behave the same way in this respect.) Is there any way to find out the server version from the client side? Maybe this could be added to the /health API?
* Did you apply any further transformations or optimizations to the EU-DEM data, beyond what is documented at https://www.opentopodata.org/datasets/eudem/? I hope that this might explain the differences I'm seeing ...
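To illustrate the /health suggestion: the version could be exposed as an extra field in the response, e.g. something like this (a hypothetical response shape; the `version` field does not exist today):

```json
{
  "status": "OK",
  "version": "1.4.1"
}
```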
Did some debugging:

```diff
diff --git a/opentopodata/backend.py b/opentopodata/backend.py
index dc85180..abf8d47 100644
--- a/opentopodata/backend.py
+++ b/opentopodata/backend.py
@@ -73,7 +73,7 @@ def _validate_points_lie_within_raster(xs, ys, lats, lons, bounds, res):
         i_oob = np.argmax(y_in_bounds)
         lat = lats[i_oob]
         lon = lons[i_oob]
-        msg = "Location '{},{}' has latitude outside of raster bounds".format(lat, lon)
+        msg = "Location '{},{}' has latitude outside of raster bounds: {} {} {} {} {} {}".format(lat, lon, xs[0], ys[0], x_min, x_max, y_min, y_max)
         raise InputError(msg)
     if not all(x_in_bounds):
         i_oob = np.argmax(x_in_bounds)
```
This results in:

```
Location '50.101,8.387' has latitude outside of raster bounds: 4205596.2173978295 3000012.1695728954 4000012.49999999 4999987.50000001 3000012.49999999 3999987.50000001
```
So: The boundaries are those of the tile E40N30 (as expected). Everything is fine in x direction. ys[0] is just below y_min, but obviously still above the nominal boundary of 3000000.
Seems like this is due to the half-pixel adjustment that is mentioned in the comments in `_validate_points_lie_within_raster`.
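The effect can be reproduced with plain arithmetic, using the numbers from the debug output above (a minimal sketch of the bounds check, not the actual opentopodata code):

```python
# Sketch of the half-pixel bounds check: the usable raster window is
# shrunk by half a pixel on each side, so a point that projects just
# inside the nominal tile edge can still fail the check.

res = 25.0  # EU-DEM pixel size in metres

# Nominal y extent of tile E40N30.
y_nominal_min = 3_000_000.0
y_nominal_max = 4_000_000.0

# Bounds after the half-pixel adjustment (matching the values in the
# error message above, up to floating-point noise).
y_min = y_nominal_min + res / 2  # 3000012.5
y_max = y_nominal_max - res / 2  # 3999987.5

# Projected y coordinate of the failing location 50.101,8.387
# (taken from the debug output above).
y_point = 3000012.1695728954

inside_nominal_tile = y_nominal_min <= y_point <= y_nominal_max
passes_check = y_min <= y_point <= y_max
print(inside_nominal_tile, passes_check)  # True False
```

So the point is on the tile, but falls inside the half-pixel margin that the check excludes.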
Looking at the code, this appears to be a limitation (read: bug) that concerns all multi-file data. There seems to be no proper handling of points close to the tile boundaries.
Since I still don't know why the error does not occur for api.opentopodata.org, I simply take a guess and assume that one might be able to overcome this by using a VRT file (as mentioned at https://www.opentopodata.org/server/) ...
Just verified that assumption: Built a VRT for EU-DEM via

```
opentopodata/data/eudem-vrt$ gdalbuildvrt eudem.vrt ../eudem/*.tif
```

... and used this folder in the config.yaml. And voilà, the errors are gone immediately! yay :D
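For completeness, the corresponding config.yaml change is just the dataset path; this snippet assumes the format from the opentopodata docs and the dataset name used above:

```yaml
# Point the eudem25m dataset at the folder containing eudem.vrt
# (instead of the folder with the raw tiles).
datasets:
- name: eudem25m
  path: data/eudem-vrt/
```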
Btw, using the argument `-co VRT_SHARED_SOURCE=0` with `gdalbuildvrt` (as mentioned on https://www.opentopodata.org/datasets/emod2018/) gives an error for me:

```
ERROR 6: Unknown option name '-co'
Usage: gdalbuildvrt [-tileindex field_name]
                    [-resolution {highest|lowest|average|user}]
                    [-te xmin ymin xmax ymax] [-tr xres yres] [-tap]
                    [-separate] [-b band] [-sd subdataset]
                    [-allow_projection_difference] [-q]
                    [-addalpha] [-hidenodata]
                    [-srcnodata "value [value...]"] [-vrtnodata "value [value...]"]
                    [-a_srs srs_def]
                    [-r {nearest,bilinear,cubic,cubicspline,lanczos,average,mode}]
                    [-oo NAME=VALUE]*
                    [-input_file_list my_list.txt] [-overwrite] output.vrt [gdalfile]*
```

This is gdal version 2.2.3.
Ah yeah, this is an issue with writing the download instructions after I had downloaded the data!
Most tiled elevation datasets have an overlap: a 1 pixel overlap is enough for linear interpolation, and a few datasets come with larger overlaps for more flexibility with interpolation and resampling.
EU-DEM doesn't have an overlap, so yeah I added one when downloading the rasters. I've merged your PR for now as a vrt is probably the easiest way to handle EU-DEM. I'll also add instructions for adding an overlap buffer.
EU-DEM doesn't have an overlap, so yeah I added one when downloading the rasters. I've merged your PR for now as a vrt is probably the easiest way to handle EU-DEM.
Thanks!
I'll also add instructions for adding an overlap buffer.
Yeah, that might be useful.
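One way such an overlap buffer could be added (a hypothetical sketch, not the actual instructions linked later in this thread): build a VRT over all tiles first, then re-cut each tile from the VRT with its target extent expanded by one pixel (25 m for EU-DEM) on each side, e.g. for tile E40N30:

```
gdalbuildvrt eudem.vrt eudem/*.tif
gdalwarp -te 3999975 2999975 5000025 4000025 eudem.vrt N3000000E4000000_buffered.tif
```

The resulting tiles overlap their neighbours by one pixel, which is enough for linear interpolation across tile seams.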
It would also be possible for Open Topo Data to handle requests in between two non-overlapping rasters (by reading from both rasters just like gdal does for vrt files). There's a couple of ways to handle this:
- Reading from both files and merging. This would mean doing interpolation in opentopodata rather than using rasterio, which has performance impacts as well as being a lot of work.
- I could detect when a point falls between two rasters and build an in-memory vrt just for that query.
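The first option amounts to doing the seam interpolation by hand. A pure-Python sketch with made-up elevation values (not the opentopodata implementation) of bilinear interpolation for a point that falls between the last row of one tile and the first row of the next:

```python
# Illustrative only: interpolate an elevation for a point that lies
# between two non-overlapping tiles, using one pixel row from each.

south_row = [100.0, 104.0, 108.0]  # northernmost row of the southern tile
north_row = [110.0, 114.0, 118.0]  # southernmost row of the northern tile

def interp_across_seam(fx, fy):
    """fx: fractional column position; fy: fractional distance from the
    southern row (0.0) to the northern row (1.0)."""
    x0 = int(fx)
    tx = fx - x0
    # Interpolate along each row first...
    south = south_row[x0] * (1 - tx) + south_row[x0 + 1] * tx
    north = north_row[x0] * (1 - tx) + north_row[x0 + 1] * tx
    # ...then between the rows, i.e. across the tile boundary.
    return south * (1 - fy) + north * fy

print(interp_across_seam(0.5, 0.5))  # 107.0
```

This is exactly the read pattern a VRT gets for free, which is why doing it inside opentopodata would mean reimplementing interpolation instead of leaving it to rasterio.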
Not necessary from my point of view. Both methods sound like they involve a significant amount of work and are probably more error-prone than just using a VRT file.
Unless you see any major advantages over the VRT, I'm perfectly happy with that :)
In fact one possible advantage might be performance ...
I just noticed that the VRT approach can become quite slow for the BKG-10m dataset, which has a lot of tiles (1025). For single-point requests the difference is barely noticeable, but when requesting many locations at once the VRT slowdown starts to show: with a tiled-TIF setup I can query 400 points in under a second, while the VRT setup takes 6-7 seconds for the same request, which is really quite bad (if you subtract the network latency, the slowdown is easily a factor of ten in that case).
Btw, one other thing I tried (as an alternative to the VRT) was to merge all the 1025 tif files into a single one via:

```
gdal_merge.py -o bkg-dgm10.tif bkg-dgm10/*.tif
```
Like the VRT, this also resolves the out-of-bounds errors reported in this issue. There is one advantage and one drawback, though:
@ajnisbet It would be great if you could share your instructions for adding an overlap to the tiles. I guess this is the best solution after all, which is probably why you chose it in the first place ;)
I made the default instructions for EU-DEM to build a vrt file, as it's the most simple and the performance should be acceptable for only 27 rasters. I added a link to the more complicated buffering code in https://github.com/ajnisbet/opentopodata/commit/80e9f4fa6d6c71eddae37eb959b0f6ac615deb5e , hopefully that helps with your dataset.
There's still the issue of whether to throw an exception for locations outside the raster, I think `null` would be more appropriate, but can track that in #26.
I think for now I won't do anything in Open Topo Data to help with non-overlapping tiles other than preprocessing instructions, but I'm open to fixing this in the future.
And finally for your command `gdal_merge.py -o bkg-dgm10.tif bkg-dgm10/*.tif`: gdal creates uncompressed geotiffs by default, which would explain the 4x size increase. If you add `-co COMPRESS=DEFLATE` it should bring the size down.
I made the default instructions for EU-DEM to build a vrt file, as it's the most simple and the performance should be acceptable for only 27 rasters. I added a link to the more complicated buffering code in 80e9f4f , hopefully that helps with your dataset.
Great, thanks a lot!
There's still the issue of whether to throw an exception for locations outside the raster, I think `null` would be more appropriate, but can track that in #26.

I agree, returning a `null` value is definitely more reasonable. I'd like to understand why the exception is thrown in my case.
I think for now I won't do anything in Open Topo Data to help with non-overlapping tiles other than preprocessing instructions, but I'm open to fixing this in the future.
That's fine with me (as mentioned before). Having proper preprocessing instructions is completely sufficient IMHO.
And finally for your command `gdal_merge.py -o bkg-dgm10.tif bkg-dgm10/*.tif`: gdal creates uncompressed geotiffs by default, which would explain the 4x size increase. If you add `-co COMPRESS=DEFLATE` it should bring the size down.

Ah, good point. Will try that soon.
If one can get rid of the size penalty, this might be a good alternative to the buffering solution mentioned above. It is a bit simpler at least.
So, I finally got around to trying this (sorry that it took me so long), and created one large compressed TIF file from the BKG dataset (10m grid):

```
gdal_merge.py -o bkg-dgm10.tif ../bkg-dgm10/*.tif -co COMPRESS=DEFLATE -co BIGTIFF=YES -co NUM_THREADS=ALL_CPUS
```

That took around 7 min and produced a file of 8.1G. That's completely fine (given that the size of the source files was 6.4G) and much better than the uncompressed file with 22G.
But unfortunately the performance for querying many points at once also got a bit worse: I tried this with a query of slightly more than 500 points, and the compressed single TIF was almost three times slower than the uncompressed one, but still more than two times faster than the VRT (that is with opentopodata version 1.5.0).
As noted above, the difference is barely noticeable for single-point requests.
Late comment to this thread: By default, the `gdal_merge` script does not handle the NODATA values correctly for the bkg10m dataset. The arguments `-n -9999 -a_nodata -9999` are required to achieve this.
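Putting the flags from this thread together, the full merge command would then presumably look like this (an untested sketch combining the options mentioned above):

```
gdal_merge.py -o bkg-dgm10.tif ../bkg-dgm10/*.tif \
    -n -9999 -a_nodata -9999 \
    -co COMPRESS=DEFLATE -co BIGTIFF=YES -co NUM_THREADS=ALL_CPUS
```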
With my own server (running the current version 1.4.1 of opentopodata), I see the following problem regarding the EU-DEM dataset: for some reason, the location `50.101,8.387` seems to trigger this bogus error about the latitude being outside of the raster bounds, although it's clearly inside of the EU-DEM bounds (and for the two neighboring locations everything seems to be fine). The error occurs every time I query this location on my own server, but interestingly it does not occur with the public server.
I assume the difference might be due to a slightly different data setup?
And then, independent of this specific problem, I noticed another small issue that occurs when querying two locations at once: as shown above, the latitude `50.101` is the problematic one, but here the error message mentions the other one (`50.102`). This might be an off-by-one issue in the error diagnostics.