elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Orientation of geo_shapes is ignored #35813

Closed iverase closed 1 year ago

iverase commented 5 years ago

It seems the orientation setting for geo_shape has no effect. Here is an example:

PUT geoshape
{
  "settings":{
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

GET geoshape/_mapping

PUT geoshape/_mapping/_doc
{
  "properties": {
    "geometry": {
      "type": "geo_shape",
      "orientation": "ccw"
    }
  }
}

PUT geoshape/_doc/1/
{
  "geometry":"POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))" 
}

PUT geoshape/_doc/2/
{
  "geometry":"POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))" 
}

GET geoshape/_search

GET geoshape/_search
{
    "query":{
        "bool": {
            "must": {
                "match_all": {}
            },
            "filter": {
                "geo_shape": {
                    "geometry": {
                        "shape": {
                            "type": "point",
                            "coordinates" : [0.5, 0.5]
                        },
                        "relation": "intersects"
                    }
                }
            }
        }
    }
}

answer:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "geoshape",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "geometry": "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))"
        }
      },
      {
        "_index": "geoshape",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "geometry": "POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))"
        }
      }
    ]
  }
}

My expectations are that those polygons are different and mutually exclusive, but they seem to represent the same polygon regardless of the order of the vertices.
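To make the difference between the two documents concrete, here is a minimal sketch that checks the winding order of each ring with the planar shoelace formula. It treats the coordinates as flat x/y pairs, which is an assumption that only holds for small shapes like these; the helper names `signed_area` and `winding` are made up for illustration and are not part of Elasticsearch.

```python
def signed_area(ring):
    """Planar shoelace formula over a closed ring of (x, y) pairs.

    A positive result means the vertices run counterclockwise,
    a negative result means they run clockwise.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def winding(ring):
    """Classify a closed ring as 'ccw', 'cw', or 'degenerate' (zero area)."""
    area = signed_area(ring)
    if area > 0:
        return "ccw"
    if area < 0:
        return "cw"
    return "degenerate"


# The rings from documents 1 and 2 above, as (lon, lat) pairs.
doc_1 = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
doc_2 = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]

print(winding(doc_1), winding(doc_2))
```

The two rings have opposite winding, yet the search above returns both documents for a point inside the unit square.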

elasticmachine commented 5 years ago

Pinging @elastic/es-analytics-geo

iverase commented 5 years ago

I think I overshot here because of my expectation (that orientation defines the inside of the polygon), which is not what is documented.

After looking at the implementation and carefully reading the documentation, the behaviour seems consistent. If the orientation of the polygon differs from the provided orientation, then the resulting polygon is the one where the length of the edges is lower than half of the hemisphere.

imotov commented 5 years ago

This is the current behavior as I understand it. There are 5 factors that affect how we treat the polygon:

1) If we applied both clockwise and counterclockwise orientation to the given polygon, would the smaller polygon (the one where the length of the edges is lower than half of the hemisphere) cross the dateline? If the answer is no, then we ignore all other factors and take the smaller polygon. Otherwise, the behavior depends on

2) Is this polygon represented as WKT or JSON? If the polygon is represented as WKT, then we only have 1 thing to check:

3) What is the ordering of coordinates in the shape (clockwise for outer shell by default)? We interpret the polygon using standard counterclockwise; the orientation value in the mapping is ignored. However, if the polygon is presented as JSON, we need to check 2 more factors:

4) What is the value of the orientation parameter in GeoJSON? If this is set to left or cw, we will treat the coordinates as a clockwise polygon. Otherwise we will check the mapping.

5) What is the value of the orientation parameter in the mapping? If it is set to left or cw, we will treat the coordinates as a clockwise polygon, otherwise as a counterclockwise polygon.
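The five factors can be read as a small decision procedure. The sketch below is only one interpretation of the comment, not Elasticsearch code: the function name, its parameters, and the return labels are hypothetical, and the dateline test is passed in as a boolean rather than computed. The comment does not spell out what happens when the GeoJSON shape explicitly says ccw or right; the sketch assumes an explicit value on the shape takes precedence over the mapping.

```python
# Orientation values that the comment says are treated as clockwise.
CLOCKWISE = {"left", "cw"}


def resolve_interpretation(smaller_crosses_dateline, fmt,
                           shape_orientation=None, mapping_orientation=None):
    """Return 'smaller', 'ccw', or 'cw' following the five factors above.

    smaller_crosses_dateline: result of factor 1, supplied by the caller.
    fmt: 'wkt' or 'geojson' (factor 2).
    shape_orientation: the GeoJSON "orientation" value, or None if absent.
    mapping_orientation: the mapping "orientation" value, or None if absent.
    """
    # Factor 1: if the smaller polygon stays off the dateline, it always wins.
    if not smaller_crosses_dateline:
        return "smaller"
    # Factors 2 and 3: WKT is read as counterclockwise, the mapping is ignored.
    if fmt == "wkt":
        return "ccw"
    # Factor 4 (assumption: any explicit value on the shape wins).
    if shape_orientation is not None:
        return "cw" if shape_orientation in CLOCKWISE else "ccw"
    # Factor 5: fall back to the mapping, defaulting to counterclockwise.
    return "cw" if mapping_orientation in CLOCKWISE else "ccw"


print(resolve_interpretation(True, "geojson", mapping_orientation="cw"))
```

Under this reading, the mapping-level orientation only matters for GeoJSON shapes that carry no orientation of their own and whose smaller polygon would cross the dateline.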

Here is a script that can be used to demonstrate this logic:

DELETE test

PUT test
{
  "mappings": {
    "properties": {
      "shape": {
        "type": "geo_shape"
      },
      "anti_shape": {
        "type": "geo_shape",
        "orientation": "cw"
      }
    }
  }
}

PUT test/_doc/1
{
  "name": "WKT, dateline",
  "shape": "POLYGON ((160 0, 160 10, -160 10, -160 0, 160 0))",
  "anti_shape": "POLYGON ((160 0, 160 10, -160 10, -160 0, 160 0))"
}

PUT test/_doc/2
{
  "name": "WKT, dateline, reversed coordinates",
  "shape": "POLYGON ((160 0, -160 0, -160 10, 160 10, 160 0))",
  "anti_shape": "POLYGON ((160 0, -160 0, -160 10, 160 10, 160 0))"
}

PUT test/_doc/3
{
  "name": "Geo json, dateline, default orientation",
  "shape": {
    "type": "polygon",
    "coordinates": [[[160, 0], [160, 10], [-160, 10], [-160, 0], [160, 0]]]
  },
  "anti_shape": {
    "type": "polygon",
    "coordinates": [[[160, 0], [160, 10], [-160, 10], [-160, 0], [160, 0]]]
  }
}

PUT test/_doc/4
{
  "name": "Geo json, dateline, reversed orientation",
  "shape": {
    "type": "polygon",
    "coordinates": [[[160, 0], [160, 10], [-160, 10], [-160, 0], [160, 0]]],
    "orientation": "cw"
  },
  "anti_shape": {
    "type": "polygon",
    "coordinates": [[[160, 0], [160, 10], [-160, 10], [-160, 0], [160, 0]]],
    "orientation": "ccw"
  }
}

PUT test/_doc/5
{
  "name": "Geo json, dateline, default orientation, reversed coordinates",
  "shape": {
    "type": "polygon",
    "coordinates": [[[160, 0], [-160, 0], [-160, 10], [160, 10], [160, 0]]]
  },
  "anti_shape": {
    "type": "polygon",
    "coordinates": [[[160, 0], [-160, 0], [-160, 10], [160, 10], [160, 0]]]
  }
}

PUT test/_doc/6
{
  "name": "WKT, not on dateline",
  "shape": "POLYGON ((20 0, 20 10, -20 10, -20 0, 20 0))",
  "anti_shape": "POLYGON ((20 0, 20 10, -20 10, -20 0, 20 0))"
}

PUT test/_doc/7
{
  "name": "Geo json, not on dateline, default orientation",
  "shape": {
    "type": "polygon",
    "coordinates": [[[20, 0], [20, 10], [-20, 10], [-20, 0], [20, 0]]]
  },
  "anti_shape": {
    "type": "polygon",
    "coordinates": [[[20, 0], [20, 10], [-20, 10], [-20, 0], [20, 0]]]
  }
}

PUT test/_doc/8
{
  "name": "Geo json, not on dateline, reversed orientation",
  "shape": {
    "type": "polygon",
    "coordinates": [[[20, 0], [20, 10], [-20, 10], [-20, 0], [20, 0]]],
    "orientation": "cw"
  },
  "anti_shape": {
    "type": "polygon",
    "coordinates": [[[20, 0], [20, 10], [-20, 10], [-20, 0], [20, 0]]],
    "orientation": "ccw"
  }
}

PUT test/_doc/9
{
  "name": "Geo json, not on dateline, reversed coordinates, default orientation",
  "shape": {
    "type": "polygon",
    "coordinates": [[[20, 0], [-20, 0], [-20, 10], [20, 10], [20, 0]]]
  },
  "anti_shape": {
    "type": "polygon",
    "coordinates":  [[[20, 0], [-20, 0], [-20, 10], [20, 10], [20, 0]]]
  }
}

GET test/_search?_source_excludes=*shape
{
  "query": {
    "geo_shape": {
      "shape": {
        "shape":"POINT (0 5)",
        "relation": "intersects"
      }
    }
  }
}

GET test/_search?_source_excludes=*shape
{
  "query": {
    "geo_shape": {
      "anti_shape": {
        "shape":"POINT (0 5)",
        "relation": "intersects"
      }
    }
  }
}

GET test/_search?_source_excludes=*shape
{
  "query": {
    "geo_shape": {
      "shape": {
        "shape":"POINT (179 5)",
        "relation": "intersects"
      }
    }
  }
}

GET test/_search?_source_excludes=*shape
{
  "query": {
    "geo_shape": {
      "anti_shape": {
        "shape":"POINT (179 5)",
        "relation": "intersects"
      }
    }
  }
}

GET test/_search?_source_excludes=*shape
{
  "query": {
    "bool": {
      "should": [
        {
          "geo_shape": {
            "shape": {
              "shape": "POINT (179 5)",
              "relation": "intersects"
            }
          }
        },
        {
          "geo_shape": {
            "anti_shape": {
              "shape": "POINT (179 5)",
              "relation": "intersects"
            }
          }
        }
      ]
    }
  }
}

mcquinne commented 4 years ago

Hey @imotov just wanted to chime in with a use case where the logic you described is causing an issue:

I'm working on a search engine to find NOAA environmental data. Some of our most valuable data products are built using data from the GOES-R series satellites, which are a pair of geostationary satellites centered roughly on the east and west coasts of the US. The extent of these data products reaches from the Japanese coast to central Europe; it's a bounding box which legitimately covers more than 220° of longitude and crosses both meridians. It looks like this in counterclockwise GeoJSON:

{
  "type": "Polygon",
  "orientation": "ccw",
  "coordinates": [[
    [141.7005, -81.3282],
    [6.2995, -81.3282],
    [6.2995, 81.3282],
    [141.7005, 81.3282],
    [141.7005, -81.3282]
  ]]
}

The logic you described in point 1 of your post selects the smaller, clockwise polygon, which does not cross the dateline, i.e. most everywhere on Earth that is not visible to these satellites.

After running some experiments it looks like we can work around the issue by translating one of the edges by 360, e.g. so the bbox runs from 141° to 366°. Still, I find it frustrating that you're choosing to index the opposite geometry of what we intend, even if we indicate its orientation explicitly.
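The workaround described above can be sketched as a small preprocessing step. This is an illustration only: `shift_west_of` is a hypothetical helper, and whether a given Elasticsearch version accepts longitudes above 180 is not established here beyond the experiments mentioned in this comment.

```python
def shift_west_of(ring, start_lon):
    """Add 360 to every longitude smaller than start_lon.

    The ring is a list of [lon, lat] pairs. After the shift, all longitudes
    lie at or east of start_lon, so the box no longer wraps around.
    """
    return [[lon + 360.0 if lon < start_lon else lon, lat] for lon, lat in ring]


# The ring from the GeoJSON above.
ring = [
    [141.7005, -81.3282],
    [6.2995, -81.3282],
    [6.2995, 81.3282],
    [141.7005, 81.3282],
    [141.7005, -81.3282],
]

shifted = shift_west_of(ring, 141.7005)
lons = [lon for lon, _ in shifted]
print(min(lons), max(lons))
```

The western edge stays at 141.7005 and the eastern edge moves to roughly 366.3, which matches the "141° to 366°" range mentioned above.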

Do you think there's any chance this logic will change in the future? Or could some kind of strict orientation mode perhaps be enabled via the field mapping?

iverase commented 4 years ago

Hi @mcquinne,

I am wondering why you are describing your shape as a polygon instead of as a bounding box. Is there any reason in particular?

Regarding the current topology model, I agree it can be misleading. We are discussing internally how to transition to a clearer model, but of course we need to guarantee backwards compatibility. Still, your polygon is incorrect in the sense that, under a proper ellipsoidal model, the edge between two points is defined as the shortest path, so you cannot have edges longer than 180 degrees.
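As a rough illustration of the shortest-path point, the sketch below compares the shorter way round between two longitudes with the eastward span the polygon above was meant to cover. It looks only at longitude differences, which is a simplification of a real geodesic computation, and both function names are hypothetical.

```python
def shortest_lon_delta(lon_a, lon_b):
    """Signed longitude change from lon_a to lon_b the shorter way round.

    The result lies in [-180, 180); positive means eastward.
    """
    return (lon_b - lon_a + 180.0) % 360.0 - 180.0


def eastward_span(lon_a, lon_b):
    """Longitude covered when travelling strictly eastward from lon_a to lon_b."""
    return (lon_b - lon_a) % 360.0


# Southern edge of the polygon discussed above: 141.7005 -> 6.2995.
print(shortest_lon_delta(141.7005, 6.2995))  # negative: the short way is westward
print(eastward_span(141.7005, 6.2995))       # the intended extent, over 180 degrees

# An edge from the dateline examples: 160 -> -160.
print(shortest_lon_delta(160.0, -160.0))     # positive: short way crosses the dateline
```

Under a shortest-path reading, the edge from 141.7005 to 6.2995 runs westward for about 135 degrees, while the intended eastward extent is about 225 degrees, which is why it cannot be expressed as a single edge.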

iverase commented 1 year ago

This is already covered in the docs.