Open rtroilo opened 4 years ago
I think that the bug doesn't have anything directly to do with the overlapping boundary input data in bug_both.txt, because the difference is already visible when taking only bug_1.txt into account and comparing the output from /count with /count/groupBy/boundary (your last two examples).
As far as I can tell, the difference of 8 is caused by some short footpath segments (like this one) which only touch the query's boundary polygon in a single point. This can be seen when comparing the output of the corresponding /length vs. /length/groupBy/boundary requests: the results are exactly the same, because a way that only touches the boundary contributes no length.
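To make the count vs. length difference concrete, here is a minimal sketch in plain Python. It assumes an axis-aligned rectangle as a stand-in for the boundary polygon and uses Liang-Barsky clipping; the function names (`clip_segment`, `count_and_length`) and the coordinates are made up for illustration and are not how the ohsome API or the OSHDB compute their results.

```python
import math

def clip_segment(p, q, box):
    """Liang-Barsky clipping: return the part of segment p-q that lies in
    the closed box (xmin, ymin, xmax, ymax), or None if there is none."""
    (x0, y0), (x1, y1) = p, q
    xmin, ymin, xmax, ymax = box
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    for d, lo, hi in ((dx, xmin - x0, xmax - x0), (dy, ymin - y0, ymax - y0)):
        if d == 0:
            if lo > 0 or hi < 0:  # parallel to this slab and outside of it
                return None
            continue
        ta, tb = lo / d, hi / d
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
        if t0 > t1:
            return None
    return (x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy)

def count_and_length(segments, box):
    """Count the segments that intersect the box and sum their clipped length."""
    clipped = [c for c in (clip_segment(p, q, box) for p, q in segments) if c is not None]
    return len(clipped), sum(math.dist(a, b) for a, b in clipped)

box = (0, 0, 10, 10)
segments = [
    ((2, 2), (5, 6)),      # fully inside, length 5
    ((10, 10), (13, 14)),  # touches the box only in the corner (10, 10)
    ((11, 11), (12, 12)),  # fully outside
]
count, length = count_and_length(segments, box)
print(count, length)
```

The touching segment is clipped to a single point: it raises the count to 2 but adds nothing to the total length of 5, so a length result stays the same whether or not such a segment is considered part of the boundary.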
I'm not sure what exactly causes these ways to be included in the /groupBy/boundary result with two polygons but not in the one with a single polygon, but it might be as simple as a rounding error somewhere in the processing pipeline.
Maybe you can try putting a small buffer around your input polygons to circumvent this issue. I would assume that a tiny positive buffer would result in these footways being included in the result consistently, while a tiny negative buffer would result in them being excluded consistently.
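The effect of such a buffer can be sketched in plain Python. This is only an illustration under loud assumptions: the boundary is a convex, counter-clockwise ring, and scaling the ring about its vertex centroid is used as a crude stand-in for a real buffer operation (as offered by JTS or similar libraries). The helper names `scale_about_centroid` and `classify` are hypothetical.

```python
def scale_about_centroid(ring, factor):
    """Crude buffer stand-in: scale a ring about the mean of its vertices.
    factor > 1 grows the ring, factor < 1 shrinks it."""
    cx = sum(x for x, _ in ring) / len(ring)
    cy = sum(y for _, y in ring) / len(ring)
    return [(cx + (x - cx) * factor, cy + (y - cy) * factor) for x, y in ring]

def classify(pt, ring):
    """Classify pt against a convex, counter-clockwise ring."""
    px, py = pt
    # Cross product per edge: > 0 means pt lies to the left of that edge.
    crosses = [
        (bx - ax) * (py - ay) - (by - ay) * (px - ax)
        for (ax, ay), (bx, by) in zip(ring, ring[1:] + ring[:1])
    ]
    lowest = min(crosses)
    if lowest > 0:
        return "inside"
    return "boundary" if lowest == 0 else "outside"

ring = [(0, 0), (10, 0), (10, 10), (0, 10)]  # made-up boundary polygon
touch_point = (10, 5)                        # where a footway touches the boundary

print(classify(touch_point, ring))                               # boundary
print(classify(touch_point, scale_about_centroid(ring, 1.001)))  # inside
print(classify(touch_point, scale_about_centroid(ring, 0.999)))  # outside
```

With the unmodified ring the point sits exactly on the perimeter, which is the case where tiny numerical differences can tip the decision either way. After growing or shrinking the ring there is a clear margin, so the decision no longer depends on them.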
I guess you are right, but the original request by @redfrexx included hundreds of features/boundaries and she experienced discrepancies with 1% of those features. Maybe it was just a coincidence that the one we checked overlaps. bug_all.txt
I just tried around a bit with different parameters and input geometries. It seems that it affects only features of type WAY, and it can't be reproduced when you use a big bounding box around the affected area as the input geometry, nor when you define smaller bounding boxes within the polygon. So it seems like it's directly related to the input geometry 🤔
Will investigate further 🔍
experienced a discrepancies with 1% of those features
This doesn't surprise me. I guess it is caused by the "style of mapping" in the analysed region: if, for example, the local mappers in a city often connect footpaths with park polygons, and you use these park polygons in your analysis, these boundary cases will of course occur relatively more often than in a city where mappers avoid connecting the highway network with landuse polygons.
Let me just think out loud: :thought_balloon: Is it really so bad that there are these small discrepancies here? To me it is more a sign that the query at hand doesn't use the right metric to quantify the objects one is investigating: for linear features like footways, it doesn't make much sense to count them. A better metric would be to use the /length endpoint of the ohsome API. :thought_balloon:
It depends on your research question.
What depends on one's research question?
Whether you want to count the number or measure the length.
@sfendrich Ah sure, of course. I didn't want to imply that there's no valid case for counting OSM objects, sorry if this was the case. I sometimes have the impression that not all users are aware that there is more than the count endpoint in the ohsome-api, and for a variety of use cases length and area can make more sense. But I believe this is getting a bit off-topic here.
Regarding the issue at hand, I looked a bit deeper into the code and my impression is that this is not directly caused by something in the ohsome-api code, but more likely has to be tracked down and fixed in the OSHDB. It might be complicated though, if not impossible, to fix because the OSHDB uses JTS and JTS uses floating point arithmetic for geometric operations, such as calculating intersections of geometries. This means that rounding errors can (and will) happen. My educated guess is that since the exact (internal) order and list of geometry operations is not the same in the two ohsome-api endpoints, the seemingly random instabilities of floating point arithmetic sometimes cause objects to be included in or excluded from the result. Switching the OSHDB to numerically robust (non-floating point) arithmetic is IMO not feasible, meaning that we have to live with such undefined behaviour for features which lie exactly on or (very) near the perimeter of the query boundary. :shrug: What do you think?
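The general effect can be shown with a few lines of plain Python. This is only an illustration of how the order of floating point operations can flip a boundary decision, not a reproduction of what JTS or the OSHDB actually compute; the offsets, the boundary value and the helper `inside_closed_halfplane` are made up for the example.

```python
from fractions import Fraction

def inside_closed_halfplane(x, x_max):
    """True if x lies inside or exactly on the boundary x <= x_max."""
    return x <= x_max

# The same coordinate, derived by adding the same three offsets in two orders.
offsets = [0.1, 0.2, 0.3]
x_left_to_right = (offsets[0] + offsets[1]) + offsets[2]
x_right_to_left = offsets[0] + (offsets[1] + offsets[2])

boundary = 0.6
print(x_left_to_right, inside_closed_halfplane(x_left_to_right, boundary))
print(x_right_to_left, inside_closed_halfplane(x_right_to_left, boundary))

# Exact rational arithmetic gives the same answer regardless of the order.
exact = [Fraction("0.1"), Fraction("0.2"), Fraction("0.3")]
exact_left_to_right = (exact[0] + exact[1]) + exact[2]
exact_right_to_left = exact[0] + (exact[1] + exact[2])
print(exact_left_to_right == exact_right_to_left == Fraction("0.6"))
```

In floating point the two orders differ in the last digit, so one result lands on the boundary and the other just outside of it, while the exact computation always lands on the boundary. Exact arithmetic comes at a cost in speed and memory, which is part of why switching a whole geometry pipeline to it is hard to justify.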
count/groupBy/boundary request with two overlapping boundaries: a value of 308 for the first boundary ( "groupByObject" : "way/587699645").
count/groupBy/boundary request with just one boundary: we get a value of 300 now.
Just a count request: this time the result is 308 again.
bug_1.txt bug_both.txt
//cc @redfrexx