GIScience / ohsome-api

API for analysing OpenStreetMap history data
https://api.ohsome.org
GNU Affero General Public License v3.0

Differing result for /count and /count/groupBy/boundary for single boundary polygon #1

Open rtroilo opened 4 years ago

rtroilo commented 4 years ago

A count/groupBy/boundary request with two overlapping boundaries returns a value of 308 for the first boundary ("groupByObject" : "way/587699645"):

curl --data-urlencode "bpolys@bug_both.txt" -d "time=2019-12-10" -d "types=NODE,WAY" https://api.ohsome.org/v1/elements/count/groupBy/boundary    
{
  …
  "groupByResult" : [ {
    "result" : [ {
      "timestamp" : "2019-12-10T00:00:00Z",
      "value" : 308.0
    } ],
    "groupByObject" : "way/587699645"
  }, {
    "result" : [ {
      "timestamp" : "2019-12-10T00:00:00Z",
      "value" : 169.0
    } ],
    "groupByObject" : "relation/2629047"
  } ]
}

The same count/groupBy/boundary request with just one boundary now returns a value of 300:

curl --data-urlencode "bpolys@bug_1.txt" -d "time=2019-12-10" -d "types=NODE,WAY" https://api.ohsome.org/v1/elements/count/groupBy/boundary
{
  …
  "groupByResult" : [ {
    "result" : [ {
      "timestamp" : "2019-12-10T00:00:00Z",
      "value" : 300.0
    } ],
    "groupByObject" : "way/587699645"
  } ]
}

A plain count request with the same single boundary returns 308 again:

curl --data-urlencode "bpolys@bug_1.txt" -d "time=2019-12-10" -d "types=NODE,WAY" https://api.ohsome.org/v1/elements/count                 
{
  …
  "result" : [ {
    "timestamp" : "2019-12-10T00:00:00Z",
    "value" : 308.0
  } ]
}

bug_1.txt bug_both.txt
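
A minimal sketch of how the comparison above could be scripted, assuming the same form parameters as the curl calls. The helper names (`build_form`, `fetch`, `discrepancies`) are hypothetical, and the sample dictionaries contain only the fields shown in the responses above (the elided "…" parts are left out).

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.ohsome.org/v1/elements"


def build_form(bpolys_text):
    """URL-encode the form fields used by the curl calls above."""
    fields = {"bpolys": bpolys_text, "time": "2019-12-10", "types": "NODE,WAY"}
    return urllib.parse.urlencode(fields).encode("utf-8")


def fetch(endpoint, bpolys_text):
    """POST to e.g. 'count' or 'count/groupBy/boundary' (needs network access)."""
    request = urllib.request.Request(f"{BASE_URL}/{endpoint}", data=build_form(bpolys_text))
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_value(response):
    """Value of the first timestamp of a plain /count response."""
    return response["result"][0]["value"]


def grouped_values(response):
    """Map each groupByObject to its first-timestamp value."""
    return {g["groupByObject"]: g["result"][0]["value"] for g in response["groupByResult"]}


def discrepancies(count_response, grouped_response):
    """Per boundary: /count value minus grouped value, where the two differ."""
    total = count_value(count_response)
    return {
        boundary: total - value
        for boundary, value in grouped_values(grouped_response).items()
        if value != total
    }


# Responses for bug_1.txt as reported above (shown fields only).
SAMPLE_COUNT = {"result": [{"timestamp": "2019-12-10T00:00:00Z", "value": 308.0}]}
SAMPLE_GROUPED = {
    "groupByResult": [
        {
            "result": [{"timestamp": "2019-12-10T00:00:00Z", "value": 300.0}],
            "groupByObject": "way/587699645",
        }
    ]
}

print(discrepancies(SAMPLE_COUNT, SAMPLE_GROUPED))
```

Against the live API, one would pass the contents of bug_1.txt to `fetch("count", ...)` and `fetch("count/groupBy/boundary", ...)` instead of using the sample dictionaries. Comparing against the plain /count value only makes sense when the request contains a single boundary.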

//cc @redfrexx

tyrasd commented 4 years ago

I think that the bug doesn't have anything directly to do with the overlapping boundary input data in bug_both.txt, because the difference is already visible if only taking bug_1.txt into account and comparing the output from /count with /count/groupBy/boundary (your last two examples).

tyrasd commented 4 years ago

As far as I can tell the difference of 8 is caused by some short footpath segments (like this one) which only touch the query's boundary polygon in a single point. This can be seen when comparing the output of the corresponding /length vs. /length/groupBy/boundary requests: it is exactly the same.
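
To illustrate why such ways can change a count but not a length, here is a toy sketch. It is not OSHDB or JTS code: the boundary is assumed to be an axis-aligned box, each way is a single straight segment, and clipping is done with the Liang-Barsky algorithm. All names are hypothetical.

```python
import math


def clip_segment(p, q, box):
    """Liang-Barsky clipping of segment p->q against box (xmin, ymin, xmax, ymax).

    Returns the parameter range (t0, t1) of the part inside the box,
    or None if the segment does not reach the box at all.
    """
    (x0, y0), (x1, y1) = p, q
    xmin, ymin, xmax, ymax = box
    t0, t1 = 0.0, 1.0
    for delta, low, high, start in ((x1 - x0, xmin, xmax, x0), (y1 - y0, ymin, ymax, y0)):
        if delta == 0:
            # Segment is parallel to this axis: either always or never inside.
            if start < low or start > high:
                return None
            continue
        ta, tb = (low - start) / delta, (high - start) / delta
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
        if t0 > t1:
            return None
    return t0, t1


def count_and_length(ways, box):
    """Count the segments that intersect the box and sum their clipped length."""
    count, length = 0, 0.0
    for p, q in ways:
        clipped = clip_segment(p, q, box)
        if clipped is None:
            continue
        count += 1
        length += (clipped[1] - clipped[0]) * math.dist(p, q)
    return count, length


box = (0.0, 0.0, 1.0, 1.0)
crossing = ((0.5, 0.5), (1.5, 0.5))  # half of it lies inside the box
touching = ((1.0, 0.5), (2.0, 0.5))  # meets the box in a single point
outside = ((2.0, 2.0), (3.0, 3.0))   # never reaches the box

print(count_and_length([crossing, outside], box))            # (1, 0.5)
print(count_and_length([crossing, touching, outside], box))  # (2, 0.5)
```

Whether or not the touching segment is treated as intersecting, the summed length stays the same because its clipped part has zero length. Only the count changes.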

I'm not sure what exactly causes these ways to be included in the /groupBy/boundary result with two polygons but excluded from the result with only one, but it might be as simple as some rounding error happening somewhere in the processing pipeline.

Maybe you can try putting a small buffer around your input polygons to circumvent this issue. I would assume that a tiny positive buffer would result in these footways being included in the result consistently, while a tiny negative buffer would result in them being excluded consistently.
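
The idea behind the buffer can be sketched with a toy example. It assumes an axis-aligned box as a stand-in for the boundary polygon, and the helper names are hypothetical. For real polygons one would use a geometry library's buffer operation (for example Shapely's `buffer`) before sending the boundaries to the API.

```python
def buffer_box(box, distance):
    """Grow (distance > 0) or shrink (distance < 0) a box (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return (xmin - distance, ymin - distance, xmax + distance, ymax + distance)


def classify(point, box):
    """Return 'inside', 'outside' or 'boundary' for a point relative to a box."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    if xmin < x < xmax and ymin < y < ymax:
        return "inside"
    if x < xmin or x > xmax or y < ymin or y > ymax:
        return "outside"
    return "boundary"


box = (0.0, 0.0, 1.0, 1.0)
touch_point = (1.0, 0.5)  # where a footway end meets the boundary
eps = 1e-9                # tiny buffer distance, in the units of the box

print(classify(touch_point, box))                    # boundary: the ambiguous case
print(classify(touch_point, buffer_box(box, eps)))   # inside: consistently included
print(classify(touch_point, buffer_box(box, -eps)))  # outside: consistently excluded
```

The buffer distance has to be small enough not to pull in or drop features that are clearly outside or inside, but larger than the rounding errors it is meant to mask.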

rtroilo commented 4 years ago

I guess you are right, but the original request by @redfrexx included hundreds of features/boundaries, and she experienced discrepancies with 1% of those features. Maybe it was just a coincidence that the one we checked overlaps another boundary. bug_all.txt

FabiKo117 commented 4 years ago

I just experimented a bit with different parameters and input geometries. It seems that it only affects features of type WAY, and it can't be reproduced when you use a big bounding box around the affected area as the input geometry, nor when you define smaller bounding boxes within the polygon. So it seems like it's directly related to the input geometry 🤔

Will investigate further 🔍

tyrasd commented 4 years ago

> experienced a discrepancies with 1% of those features

This doesn't surprise me. I guess it is caused by the "style of mapping" in the analysed region: if, for example, the local mappers in a city often connect footpaths with park polygons, and you use these park polygons in your analysis, these boundary cases will of course occur relatively more often than in a city where mappers avoid connecting the highway network with landuse polygons.

tyrasd commented 4 years ago

Let me just think out loud: 💭 Is it really so bad that there are these small discrepancies here? To me it is more a sign that the query at hand doesn't use the right metric to quantify the objects one is investigating: for linear features like footways, it doesn't make much sense to count them. A better metric would be to use the /length endpoint of the ohsome API. 💭

sfendrich commented 4 years ago

It depends on your research question.

tyrasd commented 4 years ago

What depends on one's research question?

sfendrich commented 4 years ago

Whether you want to count the number or measure the length.

tyrasd commented 4 years ago

@sfendrich Ah sure, of course. I didn't want to imply that there's no valid case for counting OSM objects, sorry if it came across that way. I sometimes have the impression that not all users are aware that there is more than the count endpoint in the ohsome-api, and for a variety of use cases length and area can make more sense. But I believe this is getting a bit off-topic here.

Regarding the issue at hand, I looked a bit deeper into the code, and my impression is that the cause is not directly in the ohsome-api code but would more likely have to be found and fixed in the OSHDB. That might be complicated, if not impossible, because the OSHDB uses JTS, and JTS uses floating point arithmetic for geometric operations such as calculating intersections of geometries. This means that rounding errors can (and will) happen. My educated guess is that, since the exact (internal) order and list of geometry operations is not the same in the two ohsome-api endpoints, the instabilities of floating point arithmetic sometimes cause objects to be included in or excluded from the result. Switching the OSHDB to numerically robust (non-floating point) arithmetic is IMO not feasible, meaning that we have to live with such undefined behaviour for features which lie exactly on or (very) near the perimeter of the query boundary. 🤷 What do you think?
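
The effect of operation order can be shown with a toy example. This is not taken from JTS or the OSHDB; it only assumes that the same coordinate may be computed through differently ordered floating point operations. The boundary at x = 0.6 and the function name are made up for illustration.

```python
from fractions import Fraction


def inside_half_plane(x, limit):
    """A point counts as inside if it lies on or left of the line x = limit."""
    return x <= limit


limit = 0.6

# The same sum, evaluated in two different orders.
x_order_a = (0.1 + 0.2) + 0.3  # 0.6000000000000001
x_order_b = 0.1 + (0.2 + 0.3)  # 0.6

print(inside_half_plane(x_order_a, limit))  # False: the point is dropped
print(inside_half_plane(x_order_b, limit))  # True: the point is kept

# Exact rational arithmetic gives the same answer regardless of order.
exact_a = (Fraction("0.1") + Fraction("0.2")) + Fraction("0.3")
exact_b = Fraction("0.1") + (Fraction("0.2") + Fraction("0.3"))
print(exact_a == exact_b, inside_half_plane(exact_a, Fraction("0.6")))  # True True
```

The exact variant is consistent, but it is slower and would have to cover every geometry operation involved, which is the kind of cost that makes such a switch look impractical.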