elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

CSV upload tool does not recognize geo_point field #66629

Closed: rayafratkina closed this issue 4 years ago

rayafratkina commented 4 years ago

Describe the feature: I was playing with this dataset, which has a lat_long field containing geo point data. When I load it via the CSV upload tool, the field type is automatically detected as string. If I change it to geo_point everything works correctly, but it would be nice if the type were detected automatically right away.

Describe a specific use case for the feature: Autodetect data of the format POINT (-73.9570437717691 40.794850940803904) as geo_point, and possibly provide a way to set a custom format (similar to the one for dates).
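
To make the request concrete, here is a minimal sketch of what recognizing such a value could look like. It is an illustration only, not how the upload tool works: the function name `parse_wkt_point` and the regular expression are assumptions made for this example, and only plain 2D points are handled. Note that WKT writes x (longitude) before y (latitude).

```python
import re
from typing import Optional, Tuple

# A decimal number, optionally signed and with an exponent.
_NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
# Matches values such as "POINT (-73.957 40.794)"; WKT order is x (lon), then y (lat).
_WKT_POINT = re.compile(
    rf"^\s*POINT\s*\(\s*({_NUM})\s+({_NUM})\s*\)\s*$", re.IGNORECASE
)


def parse_wkt_point(value: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) for a 2D WKT POINT with in-range coordinates, else None."""
    match = _WKT_POINT.match(value)
    if match is None:
        return None
    lon, lat = float(match.group(1)), float(match.group(2))
    # Reject coordinates that cannot be a geographic position.
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        return None
    return lat, lon
```

A column whose sampled values all parse this way would be a reasonable candidate for geo_point.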

elasticmachine commented 4 years ago

Pinging @elastic/kibana-gis (Team:Geo)

jsanz commented 4 years ago

This could be more generic. CSVs with geometries encoded as WKT are not that common compared with having coordinates in separate columns (which this CSV also has).

Like many tools that support geospatial fields and import CSVs, we could have a way to make explicit how geometries are encoded.

To show an example, this is the QGIS CSV import tool. It automatically found the WKT field and provided all the settings.

image

Here is the form for assigning the x/long and y/lat columns:

image

The import tool supports way more settings and possibilities but I wanted to show these settings in particular.

For what it's worth, both the CARTO BUILDER and ArcGIS Online cloud tools automatically detect those fields, and the import is done without any intervention from the user.

benwtrent commented 4 years ago

This has been resolved in https://github.com/elastic/elasticsearch/issues/56967

The Fileuploader service will now detect all WKT geometries. If every value in a field is a POINT geometry, the field will be mapped as a geo_point. All other WKT will be mapped as a geo_shape.
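
The rule described above can be sketched roughly as follows. This is not the Elasticsearch implementation; `guess_geo_mapping` is a hypothetical helper, and the prefix check only looks at the geometry keyword rather than validating the full WKT.

```python
import re
from typing import Iterable, Optional

# Geometry keywords recognized by this sketch (a rough prefix check, not a validator).
_WKT_TYPES = (
    "POINT", "LINESTRING", "POLYGON", "MULTIPOINT",
    "MULTILINESTRING", "MULTIPOLYGON", "GEOMETRYCOLLECTION",
)
_WKT_PREFIX = re.compile(r"^\s*(" + "|".join(_WKT_TYPES) + r")\s*\(", re.IGNORECASE)


def guess_geo_mapping(values: Iterable[str]) -> Optional[str]:
    """Return "geo_point", "geo_shape", or None for one column's sample values."""
    kinds = set()
    for value in values:
        match = _WKT_PREFIX.match(value)
        if match is None:
            return None  # at least one value is not WKT, so not a geo column
        kinds.add(match.group(1).upper())
    if not kinds:
        return None  # no sample values to decide from
    # Only POINT geometries -> geo_point; any other WKT mix -> geo_shape.
    return "geo_point" if kinds == {"POINT"} else "geo_shape"
```

For example, a column holding only `POINT (...)` values yields `geo_point`, while a column mixing points and polygons yields `geo_shape`.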

jsanz commented 4 years ago

@benwtrent this is great!!

While WKT is not a rare format (despite the name), having data in separate columns for latitude and longitude is a far more frequent use case. No idea if it would be possible to support that, maybe by adding some convention. For example, this web importer assumes a few column names to try to find them:

https://github.com/CartoDB/cartodb/blob/master/services/importer/lib/importer/ogr2ogr.rb#L18-L21
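
A name-based convention of that kind might look like the sketch below. The candidate names in `LAT_NAMES` and `LON_NAMES` are made up for this example and are not copied from the linked importer; `find_lat_lon_columns` is likewise a hypothetical helper.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical candidate names, checked in order of preference.
LAT_NAMES = ("lat", "latitude", "y")
LON_NAMES = ("lon", "lng", "long", "longitude", "x")


def find_lat_lon_columns(header: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return the original (lat, lon) column names if both can be found."""
    # Compare case-insensitively but keep the original spelling for the result.
    normalized = {name.strip().lower(): name for name in header}
    lat = next((normalized[n] for n in LAT_NAMES if n in normalized), None)
    lon = next((normalized[n] for n in LON_NAMES if n in normalized), None)
    if lat is None or lon is None:
        return None
    return lat, lon
```

Matching on names alone can misfire (a column called `x` is not necessarily a longitude), so the result is best treated as a suggestion the user can override.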

cc @nickpeihl @choobinejad

choobinejad commented 4 years ago

Agree - I see numeric latitude/longitude columns quite frequently in CSV files. A best-effort attempt to identify them in files and map them to a geo_point would remove a roadblock for users, and that's always a good thing!

benwtrent commented 4 years ago

The difficulty there is that each column is currently treated separately when determining the appropriate mapping. It might be enough to supply an easy override in the UI (here are my lat/long columns, make them a point), and then we can generate the pipeline + mapping from there.
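
As a rough sketch of what "generate the pipeline + mapping" could produce once the user has picked the two columns: the function below builds an ingest pipeline with a `set` processor that joins the columns into a `"lat,lon"` string, plus a mapping that declares the target field as geo_point. The function name, the default target field `location`, and the shape of the returned dictionary are assumptions for illustration. It also assumes simple column names and that every row has both values.

```python
from typing import Any, Dict


def build_geo_point_config(
    lat_col: str, lon_col: str, target: str = "location"
) -> Dict[str, Any]:
    """Build an ingest pipeline body and a mapping for a combined geo_point field."""
    pipeline = {
        "description": f"Combine {lat_col} and {lon_col} into {target}",
        "processors": [
            # The set processor accepts mustache templates in "value";
            # geo_point accepts a "lat,lon" string.
            {
                "set": {
                    "field": target,
                    "value": "{{" + lat_col + "}},{{" + lon_col + "}}",
                }
            }
        ],
    }
    mappings = {"properties": {target: {"type": "geo_point"}}}
    return {"pipeline": pipeline, "mappings": mappings}
```

Calling `build_geo_point_config("lat", "lon")` gives a processor value of `{{lat}},{{lon}}`, which the UI could show to the user before the import runs.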

choobinejad commented 4 years ago

@benwtrent I think your approach achieves the goal (improve geo_point file upload experience) while working with the tools we have now. ++.

I also think that the easy manual override you suggest could have expanded utility in the future (e.g. if there are 2 columns that have consistently valid decimal degree or degrees-minutes-seconds data, then guess that they together represent a geo_point... but give users the opportunity to correct that assumption if it's wrong using the manual override).
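
A value-based guess along those lines could start from a simple range check, sketched here for decimal degrees only (degrees-minutes-seconds parsing is left out). The helper names are hypothetical, and a positive result is only a hint, since many numeric columns happen to fall within these ranges.

```python
from typing import Sequence


def _all_in_range(values: Sequence[str], low: float, high: float) -> bool:
    """True if there is at least one value and all parse as numbers in [low, high]."""
    try:
        numbers = [float(v) for v in values]
    except (TypeError, ValueError):
        return False
    # NaN fails both comparisons, so it is rejected here as well.
    return bool(numbers) and all(low <= n <= high for n in numbers)


def plausible_lat_lon(lat_values: Sequence[str], lon_values: Sequence[str]) -> bool:
    """Check whether two columns could together hold decimal-degree coordinates."""
    return (
        len(lat_values) == len(lon_values)
        and _all_in_range(lat_values, -90.0, 90.0)
        and _all_in_range(lon_values, -180.0, 180.0)
    )
```

If the check passes, the tool could pre-select the two columns in the manual override and let the user confirm or correct the guess.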