aspectumapp / osm2geojson

Convert OSM and Overpass XML/JSON to GeoJSON
MIT License
100 stars 14 forks source link

An Enclave inside an Enclave won't make it through conversion #35

Open eseglem opened 2 years ago

eseglem commented 2 years ago

Its very much an edge case, but if you have a Multi Polygon which had a part inside a hole of itself that part does not end up in the output geojson. It happens at least one place in the world: Baarle-Nassau which shows up all the way up through the Netherlands country level.

Pull the XML from Overpass with rel(id:2718379);out geom; and the output geojson will not contain the 7 internal enclaves. Nederlandse enclave N1-7 (relations 13193251 - 13193257).

Took me quite a while to track through and understand exactly what was happening. Unfortunately, since all the outers and inner's are grouped together, when _convert_lines_to_multipolygon is called on outer the unary_union eats all of them. Since they are inside the outermost boundary. So its technically doing what its supposed to be doing.

Looking through the XML, they seem to be in order. 16 outer, 25 inner, 8 more outer. But I have no idea if that is guaranteed. Not the cleanest code, but something like this appears to work better this specific polygon. But only works in general if the order is guaranteed, and probably needs some more safety checks as noted.

def multipolygon_relation_to_shape(
        rel, refs_index,
        area_keys: Optional[dict] = None, polygon_features: Optional[list] = None
):
    # List of Tuple (role, multipolygon)
    shapes = []

    if 'members' in rel:
        members = rel['members']
    else:
        found_ref = get_ref(rel, refs_index)
        if not found_ref:
            error('Ref for multipolygon relation not found in index', pformat(rel))
            return None
        members = found_ref['members']

    for member in members:
        if member['type'] != 'way':
            warning('Multipolygon member not handled', pformat(member))
            continue

        member['used'] = rel['id']

        way_shape = way_to_shape(member, refs_index, area_keys, polygon_features)
        if way_shape is None:
            # throw exception
            warning('Failed to make way', pformat(member), 'in relation', pformat(rel))
            continue

        if isinstance(way_shape['shape'], Polygon):
            way_shape['shape'] = LineString(way_shape['shape'].exterior.coords)

        shapes.append((member['role'],way_shape['shape']))  

    # Intermediate groups
    groups = []
    # New group each time it switches role
    for role, group in itertools.groupby(shapes, lambda s: s[0]):
        groups.append((role, _convert_lines_to_multipolygon([_[1] for _ in group])))

    # Grab the first one. Should be "outer", may need safety here?
    multipolygon = groups[0][1]
    # Itterate over the rest if there are any
    for role, mp in groups[1:]:
        if role == "inner":
            multipolygon = multipolygon.difference(mp)
        else:  # I think it should only be "outer", but more safety?
            multipolygon = multipolygon.union(mp)

    if multipolygon is None:
        warning('Relation not converted to feature', pformat(rel))
        return None

    multipolygon = fix_invalid_polygon(multipolygon)
    multipolygon = to_multipolygon(multipolygon)
    multipolygon = orient_multipolygon(multipolygon)  # do we need this?

    return {
        'shape': multipolygon,
        'properties': get_element_props(rel)
    }
rapkin commented 2 years ago

@eseglem Thank you for your work (and code example provided)! I made a few changes in your code and added test for this bug.

eseglem commented 2 years ago

@rapkin Would you mind creating a new release with the fix?

Although I just noticed the test may have failed on github actions. Let me know if I can help with anything.

rapkin commented 2 years ago

@eseglem Here release with that change https://pypi.org/project/osm2geojson/0.2.1/. Thank you for noticing that problem with github actions. I think it's because of different shapely version (I have older on my computer). Geometry produced with both versions looks the same (but order of polygons in multipolygon is different)

eseglem commented 2 years ago

Thanks for being so responsive.

Unfortunately, what I was worried about seems to be a problem. Order does not appear to be guaranteed. I just pulled one where the first member was inner. So it returns empty features. Worked out some potentially better logic. Running through some more comprehensive testing now.

rapkin commented 2 years ago

Ok, I'll take a look on that tomorrow. But I need this 'empty' example (you can send me whole XML/JSON content from Overpass)

eseglem commented 2 years ago

By empty I mean it ends up {"type: "FeatureCollection", "features": []} because a None gets returned somewhere in there.

Serbia: rel(id:1741311);out geom; is the first one I ran into with inner first. I added logic that worked for that one, but sadly the logic fails for Kyrgyzstan rel(id:178009);out geom; The first outer group is self intersecting on its own. I will keep digging but running out of good ideas for how to handle the weird cases.

rapkin commented 2 years ago

That's what I need, thank you!

eseglem commented 2 years ago

I found this: https://github.com/tyrasd/osmtogeojson/blob/gh-pages/index.js#L746 It powers overpass-turbo, and it appears to work for all of the weird ones. But didn't get to work through the code and see how it works.

rapkin commented 1 year ago

@eseglem I hope this fix https://github.com/aspectumapp/osm2geojson/commit/024c06b099bc41a0c734d8435c048116c2cc3055 should help with your problem (it still may not work with other examples, becuase this solution doesn't look reliable). I read that code in osmtogeojson package and made some progress on that in branch work-on-enclave-problem, but my result was not successful (so I made a pause with that).

PS. Sorry for very late response, it was hard month in Ukraine.

pijll commented 1 year ago

I still have the same problem. I found this when converting the same OSM objects as in the original report, Baarle-Nassau (Netherlands).

I've been able to boil the problem down to a simple example. Given an OSM-json file with nested enclaves:

{"elements": [
  {"type":  "relation", "id":  1, "members":  [
    {"type": "way", "ref": 11, "role": "outer", "_comment": "boundary (outermost)"},
    {"type": "way", "ref": 12, "role": "outer", "_comment": "enclave within enclave"},
    {"type": "way", "ref": 13, "role": "inner", "_comment": "enclave"}
  ],"tags": {"boundary": "administrative"}},
  {"type": "node", "id":  111, "lat": 0, "lon": 0},
  {"type": "node", "id":  112, "lat": 5, "lon": 0},
  {"type": "node", "id":  113, "lat": 5, "lon": 5},
  {"type": "node", "id":  114, "lat": 0, "lon": 5},
  {"type": "node", "id":  121, "lat": 2, "lon": 3},
  {"type": "node", "id":  122, "lat": 3, "lon": 3},
  {"type": "node", "id":  123, "lat": 3, "lon": 2},
  {"type": "node", "id":  124, "lat": 2, "lon": 2},
  {"type": "node", "id":  131, "lat": 1, "lon": 4},
  {"type": "node", "id":  132, "lat": 4, "lon": 4},
  {"type": "node", "id":  133, "lat": 4, "lon": 1},
  {"type": "node", "id":  134, "lat": 1, "lon": 1},
  {"type": "way", "id": 11, "nodes": [111, 112, 113, 114, 111]},
  {"type": "way", "id": 12, "nodes": [121, 122, 123, 124, 121]},
  {"type": "way", "id": 13, "nodes": [131, 132, 133, 134, 131]}
]}

Note that the last two members of the relation are in the wrong order, logically.

Converting this with osm2geojson.json2shapes results in just a square with a single enclave. When the second and third members of the relation are swapped, the result is correct (square, containing an enclave, which contains a counter-enclave). The result should of course be independent of the order of the members.

Tests:

    # This test fails
    def test_nested_polygons_wrong_order(self):
        json_file = pathlib.Path('nested_polygons.json')
        json_object = json.loads(json_file.read_text())

        shapes = osm2geojson.json2shapes(json_object)
        multipoly = shapes[0]['shape']

        self.assertEqual(len(multipoly.geoms), 2)
        self.assertTrue(multipoly.contains(shapely.Point(2.5, 2.5)))

    # This test succeeds
    def test_nested_polygons_correct_order(self):
        json_file = pathlib.Path('nested_polygons.json')
        json_object = json.loads(json_file.read_text())
        members = json_object['elements'][0]['members']
        members[1], members[2] = members[2], members[1]

        shapes = osm2geojson.json2shapes(json_object)
        multipoly = shapes[0]['shape']

        self.assertEqual(len(multipoly.geoms), 2)
        self.assertTrue(multipoly.contains(shapely.Point(2.5, 2.5)))

A more complicated test would be the town of Baarle-Nassau: when converting OSM relation 47798, the little enclave given by OSM way 35145945 is not included in the result; when converting OSM relation 2718379, that same enclave is included. The two relations cover mostly the same area, but the order of the ways differs.