ajnisbet / opentopodata

Open alternative to the Google Elevation API!
https://www.opentopodata.org
MIT License
314 stars 68 forks source link

Using the emod2018 dataset, opentopodata.org's API returns invalid JSON #13

Closed l3u closed 3 years ago

l3u commented 3 years ago

Hi :-)

Whilst working on a geotagging program using opentopodata.org's API, I noticed some error when parsing the server's answer to a QJsonDocument. It yielded QJsonParseError::IllegalNumber. After having had a closer look, this happened when using the emod2018 dataset.

Example: Requesting https://api.opentopodata.org/v1/emod2018?locations=50,11 returns

{
"results": [
    {
    "elevation": NaN, 
    "location": {
        "lat": 50.0, 
        "lng": 11.0
    }
    }
], 
"status": "OK"
}

The elevation key's value is NaN (instead of null, as the other datasets return when a coordinate is out of bounds). This is, per RFC 4627, not permitted (cf. https://tools.ietf.org/html/rfc4627#section-2.4):

Numeric values that cannot be represented as sequences of digits
(such as Infinity and NaN) are not permitted.

Most probably, this is what QJsonDocument::fromJson complains about.

Anyway, thanks about this great piece of software and the great service you provide :-)

ajnisbet commented 3 years ago

Thanks for raising this! Python supports NaN in json, I didn't realise it isn't in the spec!

I would like to remove NaNs from the api if it's breaking clients

The main issue is that I currently use NaN and null differently in Open Topo Data: NaN means there is a null/NODATA value in the underlying dataset at that location, null means the dataset doesn't cover that location.

There's a few options I can think of

  1. Just set NaN -> null and lose the distinction of NODATA vs out of bounds.
  2. Return a string "NaN" instead, though I don't love the mixed types.
  3. Add another property like "is_nodata": true to the result.
  4. Return a 400 error if location is out of bounds.
  5. Add a query parameter nodata_value, which defaults to NaN, but can be set to null or an integer like -9999. Preserves compatibility, but allows standards-compliant clients to override. Change default to null in v2.

I'll give it some thought and see what clients other APIs are doing around NaN and NODATA. Let me know if you have any ideas about a fix.

l3u commented 3 years ago

Hey, thanks for your quick reply :-)

I personally would be totally fine with returning null if the requested location can't be determinded from the underlying dataset. I don't think one has really to differentiate if there's a "no data" value in the dataset or the dataset doesn't cover the location, as in both cases, one won't get the elevation. Imo, "no data" and "out of bounds" are essentially the very same ;-) Maybe, you want to add some "exists in dataset" key, but I see no real benefit here.

I first thought about checking the server's response for NaN and replacing it with null before parsing it, but after having checked the RFC, I thought "Well, Qt behaves exactly correct here, it's a server bug". Python doesn't follow the standard here … so imo, no matter what you do,NaN should definitely not be returned. Strictly following a standard is never a bad idea, esp. in this case, where – in the present case – the server is Python, but the client is C++/Qt or whatever.

So if you ask me how to address this:

  1. Way to go – fixes the non-standard behavior without a real drawback :-)
  2. One would have to check if elevation is a string or a number. this sucks. I would leave it always to be a number.
  3. If you like to do so, maybe. But as said, imo no data is no data – for whatever reason.
  4. I wouldn't do this. The request is perfectly okay, and so should be the answer. It only doesn't yield a result due to the underlying dataset lacking the location. But nothing went wrong!
  5. This would also require clients to change their code. Then, speaking of my use-case with C++/Qt (btw. I'm speaking of KGeoTag), I can as well check for NaN to be present in the server's answer, cache the result, replace it with null and parse the JSON afterwards. So, from my point of view, no benefit, but persisting non-standard behavior.

I hope I could help you a bit.

ajnisbet commented 3 years ago

Thanks for your patience and thoughtful feedback! This should now be fixed on master and on api.opentopodata.com by https://github.com/ajnisbet/opentopodata/commit/362045478d48f3f45e3d94afe681f2bb450cc9b1: null is now returned for NODATA values.

Having invalid json that breaks clients by default is a bug so I agree that null should be the default.

But there is a semantic difference between NODATA and out of bounds, and other tools like rasterio and gdal have different behaviour for NODATA and querying outside bounds. So anyone relying on the old behaviour can set nodata_value=nan. It's a query parameter rather than a config option as NaN support in json is client-dependent.

Going forward, if people want to handle NODATA differently they should instead use a numerical value for the nodata_value parameter, which is a hacky but standard approach in e.g., gdal that preserves json validity.

l3u commented 3 years ago

Nice! I think this is a solution everybody can live with :-)

You could e. g. set nodata_value=-12000 or nodata_value=8900, which is less than the depth of the Marianna Trench or more than the height of Mount Everest respectively and thus can't be a real value.