elastic / ems-file-service

Data sources for Elastic Map Service

Error Loading US Zip Codes GeoJSON file to index #300

Closed skscharr closed 6 months ago

skscharr commented 6 months ago
  1. Download GeoJSON file from https://maps.elastic.co/#file/usa_zip_codes
  2. Go to Maps -> Create Map -> Add Layer -> Upload File and upload the file downloaded in step 1
  3. Indexing fails for zip code 85713, with the following response:
    {
    "success": true,
    "failures": [
    {
      "item": 29274,
      "reason": "failed to parse field [geometry] of type [geo_shape]",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Polygon self-intersection at lat=32.217540781981995 lon=-111.0919908629878"
      },
      "doc": {
        "geometry": {
          "type": "Polygon",
          "coordinates": [
            [
              [
                -110.978197,
                32.207113
              ],
              [
                -111.132337,
                32.221238
              ],
              [
                -111.132337,
                32.220454
              ],
              [
                -111.071089,
                32.171013
              ],
              [
                -110.991467,
                32.172583
              ],
              [
                -110.967989,
                32.178076
              ],
              [
                -110.966968,
                32.182785
              ],
              [
                -110.961864,
                32.181215
              ],
              [
                -110.960843,
                32.178076
              ],
              [
                -110.908783,
                32.185139
              ],
              [
                -110.909804,
                32.196126
              ],
              [
                -110.910824,
                32.207113
              ],
              [
                -110.935323,
                32.207113
              ],
              [
                -110.947573,
                32.207113
              ],
              [
                -110.960843,
                32.207113
              ],
              [
                -110.978197,
                32.207113
              ]
            ],
            [
              [
                -110.947573,
                32.207113
              ],
              [
                -110.94349,
                32.205543
              ],
              [
                -110.944511,
                32.202404
              ],
              [
                -110.947573,
                32.207113
              ]
            ]
          ]
        },
        "zip": "85713"
      }
    }
    ],
    "docCount": 33083
    }
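Note that the response reports "success": true even though the failures array is not empty. A minimal Python sketch for pulling the failed zip codes and their root causes out of such a response could look like the following. The function name summarize_failures is hypothetical, and the response literal is an abbreviated copy of the one above with the geometry left out.

```python
def summarize_failures(response):
    """Return (zip, root-cause reason) pairs for every failed document."""
    summary = []
    for failure in response.get("failures", []):
        zip_code = failure.get("doc", {}).get("zip", "<unknown>")
        # Prefer the nested root cause; fall back to the top-level reason.
        cause = failure.get("caused_by", {}).get("reason") or failure.get("reason", "")
        summary.append((zip_code, cause))
    return summary


# Abbreviated copy of the upload response shown above (geometry omitted).
response = {
    "success": True,
    "failures": [
        {
            "item": 29274,
            "reason": "failed to parse field [geometry] of type [geo_shape]",
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "Polygon self-intersection at "
                          "lat=32.217540781981995 lon=-111.0919908629878",
            },
            "doc": {"zip": "85713"},
        }
    ],
    "docCount": 33083,
}

for zip_code, cause in summarize_failures(response):
    print(zip_code, "->", cause)
```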
jsanz commented 6 months ago

Thanks @skscharr for reporting. There is indeed a geometry that Elasticsearch does not accept, even though the tools we usually work with locally (mapshaper, QGIS) do not complain about it.

Pretty sure the problem is at this vertex, where the geometry touches a previous vertex:

image
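One rough way to look for this kind of contact point without any GIS tooling is to list coordinates that appear in more than one ring of the polygon. The sketch below is only an illustration in plain Python, using the rings from the failing document in the report. The helper name shared_vertices is hypothetical, it does not reproduce Elasticsearch's own validation, and whether the vertex it finds is the one in the screenshot is an assumption.

```python
def shared_vertices(rings):
    """Return coordinates that appear in more than one ring of a polygon."""
    owners = {}
    for index, ring in enumerate(rings):
        # Skip the closing point, which repeats the first one by definition.
        for lon, lat in ring[:-1]:
            owners.setdefault((lon, lat), set()).add(index)
    return sorted(point for point, found_in in owners.items() if len(found_in) > 1)


# Rings copied from the failing document for zip code 85713.
outer_ring = [
    (-110.978197, 32.207113), (-111.132337, 32.221238),
    (-111.132337, 32.220454), (-111.071089, 32.171013),
    (-110.991467, 32.172583), (-110.967989, 32.178076),
    (-110.966968, 32.182785), (-110.961864, 32.181215),
    (-110.960843, 32.178076), (-110.908783, 32.185139),
    (-110.909804, 32.196126), (-110.910824, 32.207113),
    (-110.935323, 32.207113), (-110.947573, 32.207113),
    (-110.960843, 32.207113), (-110.978197, 32.207113),
]
inner_ring = [
    (-110.947573, 32.207113), (-110.94349, 32.205543),
    (-110.944511, 32.202404), (-110.947573, 32.207113),
]

print(shared_vertices([outer_ring, inner_ring]))
# prints [(-110.947573, 32.207113)]
```

For this document the inner ring starts on a vertex of the outer ring, so the hole touches the polygon boundary at that point.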


The main use of EMS File Service is to serve as a data layer in Kibana Maps. For that use this is not an issue, since the geospatial data from this repo is processed in the browser and never ingested into Elasticsearch.

To check that this is not a problem, you can follow these steps:

  1. Create a sample index and data view with a single document pointing to the problematic zip code
# Create the index
PUT ems_error_300
{
  "mappings": {
    "properties": {
      "id": { "type": "keyword"},
      "value": { "type": "integer"}
    }
  }
}

# Insert a document
POST ems_error_300/_doc
{
  "id": "85713",
  "value": 1
}

# Create the data view
POST kbn:/api/data_views/data_view
{
  "data_view": {
    "title": "ems_error_300"
  }
}
  2. In Kibana Maps, create a new Choropleth Map using the Zip Codes dataset and the new ems_error_300 data view.
  3. Check that the geometry is linked

image


Still, it is not OK to have a geometry that Elasticsearch does not digest correctly, so I'll see if we can patch that single geometry to behave, and look into updating this dataset, since there's a newer version from 2020.

cc. @nickpeihl

jsanz commented 6 months ago

@skscharr you may want to give this alternate version a try, from the fix in progress at #302

https://raw.githubusercontent.com/jsanz/ems-file-service/fix/300/usa_zip_codes/data/usa_zip_codes_v7_1.geo.json

jsanz commented 6 months ago

A new release of the data has been published to production. Although https://maps.elastic.co/#file/usa_zip_codes still renders the same dataset (in TopoJSON format), its GeoJSON button now points to the new dataset with the fixed geometry, available here and in the EMS File Service bucket.

skscharr commented 6 months ago

Thank you @jsanz !