getodk / central-backend

Node.js based backend for ODK Central
https://docs.getodk.org/central-intro/
Apache License 2.0
50 stars 75 forks source link

Wrong Polygon representation in OData submissions API #322

Closed larshelge closed 3 years ago

larshelge commented 3 years ago

Problem description

This issue is about the OData API and the submissions data document and the way it renders polygon data e.g. for the geoshape widget. It applies to ODK Central.

Data captured e.g. for the geoshape widget is mapped to the GeographyPolygon type in OData.

This is rendered similar to the example below. Note that the coordinates are rendered as an array of points.

"geoshape_widget": {
    "type": "Polygon",
    "coordinates": [
        [
            13.052775,
            58.252702,
            0
        ],
        [
            13.057901,
            58.255344,
            0
        ],
        [
            13.052073,
            58.253716,
            0
        ],
        [
            13.052775,
            58.252702,
            0
        ]
    ]
}

Now according to the GeoJSON standard (RFC 7946), a polygon should be rendered as an array of linear rings:

Coordinates of a Polygon are an array of linear ring (see Section 3.1.6) coordinate arrays. The first element in the array represents the exterior ring. Any subsequent elements represent interior rings (or holes).

RFC 7946: https://tools.ietf.org/html/rfc7946#appendix-A.3

Consequently, ODK should render polygons as an array of arrays of points, similar to this:

"geoshape_widget": {
    "type": "Polygon",
    "coordinates": [
        [
            [
                13.052775,
                58.252702,
                0
            ],
            [
                13.057901,
                58.255344,
                0
            ],
            [
                13.052073,
                58.253716,
                0
            ],
            [
                13.052775,
                58.252702,
                0
            ]
        ]
    ]
}

The current challenge with the deviation from the GeoJSON standard is that OData libraries break when encountering polygon data. In my case, I am using Apache Olingo to read the OData API for ODK, and it crashes with a NullpointerException when it attempts to parse points within polygons. Since I cannot control this code, it becomes a blocker in this case.

Steps to reproduce the problem

larshelge commented 3 years ago

Just checking if there is hope for getting this one fixed? I am a bit concerned as it will crash OData client without offering a workaround.

issa-tseng commented 3 years ago

hi! apologies. seems the nesting issue is a pretty quick fix so you'll see it in the next release for sure.

but also, odk collect does not produce correct polygon data for geospatial purposes, really. it doesn't respect the right winding direction and doesn't do some of the typical things to avoid edge cases, like dealing with self-overlaps. it really is just a pile of points in collection order. all that to say, it will probably still crash a lot of clients because it's technically degenerate data. i guess we could add post-processing in central but certainly clean data at the source would be better.

(/cc @lognaturel not sure if i've complained about this before)

lognaturel commented 3 years ago

not sure if i've complained about this before

You have! And it's on the radar otherwise as well. In practice, we've never had complaints about downstream tools having problems because of degenerate polygons. Data collectors always have a map in front of them when collecting traces and shapes and generally have specific instructions on how to do the collection. Agreed it would be nice to have a more standard way of representing composite geo data but that would require folks who can't do what they need with what is currently exposed speaking up.

it will probably still crash a lot of clients

This would only be if specific data points were degenerate in some way? My understanding is that many clients are still robust to the right hand rule not being followed which is the most likely violation, I believe.

florianm commented 3 years ago

A related nitpick with ODK Collect is the occasional whitespace in Geotraces/shapes (https://github.com/getodk/central-backend/issues/300). Easily fixed by a trim() without too much performance overhead, see my ticket. If ODK Central ever were to add post-processing to geo data, I'd suggest #300 would be a good candidate to include.

larshelge commented 3 years ago

Great, thanks for the responses! Yes I am mainly concerned about the potential crashes. I think the extra array will prevent that.

I'm not familiar with the geospatial capabilities of ODK and the need for additional improvements for correctness. I guess that might not be a priority as you indicate or go into separate, new issues.

lognaturel commented 3 years ago
lognaturel commented 3 years ago

You were right, @issa-tseng, PowerBI is happy with this as well. 🎉