Closed tommcintyre closed 9 years ago
Some chat from IRC with opinions about whether this belongs in ES or not... :)
<Tim> GeoJSON spec is a little vague. A polygon must have a LinearRing, defined as:
<Tim> A LinearRing is closed LineString with 4 or more positions. The first and last positions are equivalent (they represent equivalent points). Though a LinearRing is not explicitly represented as a GeoJSON geometry type, it is referred to in the Polygon geometry type definition.
<Tim> not really clear that they must be distinct or non-self-intersecting
<Tom> yeah - i think it's just not well defined
<Tom> the new geo-refactored ES code does seem to attempt to make the best possible use of "borderline" valid geometries
<Tom> for example, twitter emits open linestrings quite often, as well these polygons with repeating points
<Tom> and the new geo-refactored ES code does now accept open linestrings
<Tom> (the old code didn't)
Hi @mvjars,
I think the GeoJson is technically valid. The problem is forced by the underlying libraries that we use to handle geo data inside ES. These libraries make some assumptions to the data which are checked before the data gets indexed. In my opinion this assumptions must not necessarily be checked if the GeoJson is indexed. But currently we use this libraries to index geo data also. I'm working on separating the GeoJson format specification from the logic layer and move this kind of exceptions to the logic layer.
Is there any update on this issue?
@nknize any ideas here?
@lababidi Out of curiosity why make this a polygon type? Why not change it to a multipoint?
@nknize Thanks for checking in. I'm not sure I understand what your question is. Let me see if I can reframe the issue, succinctly.
Twitter provides the following object. It is rejected by elasticsearch for only having 4 elements and not 5. This strict requirement is not consistent with GeoJSON. Please allow elasticsearch to accept this Object, out of the box, because technically it is legal GeoJSON. Currently I must write a middle-man to add a 5th-element (cue Bruce Willis jokes here) to this array of coordinates. This is a bit kludgy in terms of workflow:
"bounding_box": {
"type": "Polygon",
"coordinates": [
[
[
-9.0915413,
38.6713816
],
[
-9.0915413,
38.9313732
],
[
-8.8102479,
38.9313732
],
[
-8.8102479,
38.6713816
]
]
]
},
@lababidi Thanks for the update. I mistook the problem you're describing for the original issue (repeated points in a polygon). I agree we should accept Polygons like this and close them for you.
This is excellent news. I'd be happy to contribute any code if your plate is full. Just point me to where the preferred location that this logic belongs in and I will pull request.
@nknize If you prefer to close this issue, you may. I realized my statement was a separate issue from this issue and create it at #11131
Probably FAO @chilling .
I have a mapping that includes some geo_shape fields. My test data contains GeoJSON fields that specify points, but do so in the form of polygons with the same lon/lat repeated 4 (or sometimes 3) times. This causes a validation exception to be thrown up from within Spatial4J when it generates the polygon.
I am not sure if this is technically invalid GeoJSON or not - however, this is a form that the Twitter API generates frequently (around 1 in 1500 tweets in my dataset), and other libraries I have used can parse it OK. It would be good if ES can attempt to do what it can with data like this, rather than failing (i.e. treat it as another appropriate type like a point, or relax the verification).
This is tested against the branch that includes the Geo-Refactoring improvements at chilling/elasticsearch@0369983afdcc7c6e8db7703b20889f69ac14221a (it does also happen on master, if you can get that far!).
The stack trace:
To reproduce:
I wrote a nasty, hacky workaround to fix this particular case - see below. This is obviously not an acceptable general purpose solution, as it doesn't address the issue for any other shapes, or other cases like only having 2 or 3 distinct points. I am not really familiar enough with the libraries, but I guess a real fix might involve simplifying or normalizing each geometry somehow before it gets passed to Spatial4J?
Alternatively it could be deemed that this is not ES's job, and the GeoJSON needs to be more well-formed - but I think this is likely to be a common problem due to the source of this data, and because of the different expectations of different GeoJSON libraries.
ShapeBuilder.java: