AgileWorksOrg / elasticsearch-river-csv

CSV river for ElasticSearch
Apache License 2.0

Import of null or empty values from CSV #38

Closed flicker581 closed 10 years ago

flicker581 commented 10 years ago

We have a CSV log with some IP addresses in it. We would like to store them as the ip type in Elasticsearch, to make them searchable by range. Unfortunately, sometimes one of the IPs is missing from a CSV line. The plugin tries to insert an empty string ("") into the field, and this fails, because the empty string is not recognized as a valid IP.

For this field it would be nice if it were not indexed at all when it is empty. Other empty fields, like numeric ones, may also benefit from the change.

vtajzich commented 10 years ago

CSV river doesn't care about your data. Whatever you have in the CSV will be loaded into ES. You have to do ETL and clean up your CSV to make it valid.

flicker581 commented 10 years ago

As far as I can see, the current CSV river has no way to distinguish between a NULL value and an empty string in a CSV file. For example, MySQL "LOAD DATA INFILE" has this feature.

Actually, for Elasticsearch there are three distinguishable and desirable values for a CSV field:

  1. Empty string
  2. NULL value (field is indexed and stored as NULL)
  3. Missing field in the line (a field has no value and should not be indexed and stored at all)
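
The three cases above can be sketched as a small pre-processing step in Python. This is only an illustration, not something the river does: the `\N` marker is an assumption borrowed from the MySQL LOAD DATA convention, and `row_to_doc`, `omit_if_empty`, and the column names are hypothetical. Python's csv module cannot tell a quoted empty string from an unquoted one, so the sketch decides per field whether an empty value is kept or left out.

```python
import csv
import io

# Assumption: "\N" marks an explicit NULL, as in MySQL's LOAD DATA convention.
NULL_MARKER = r"\N"


def row_to_doc(row, omit_if_empty=()):
    """Map one CSV row (a dict) to a document body (hypothetical helper)."""
    doc = {}
    for field, value in row.items():
        if value == NULL_MARKER:
            doc[field] = None      # case 2: explicit NULL
        elif value == "" and field in omit_if_empty:
            continue               # case 3: leave the field out entirely
        else:
            doc[field] = value     # case 1 (empty string) or an ordinary value
    return doc


# Hypothetical sample: the first row has an empty dst_ip and an empty note,
# the second row has an explicit NULL in dst_ip.
sample = "src_ip,dst_ip,note\n10.0.0.1,,\n10.0.0.2,\\N,ok\n"

docs = [
    row_to_doc(row, omit_if_empty={"src_ip", "dst_ip"})
    for row in csv.DictReader(io.StringIO(sample))
]
```

Here the IP columns are dropped when empty, while the free-text `note` column keeps its empty string.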

aritrachatterjee15 commented 10 years ago

When the index is created, you can define in the index mapping the default value if the field's data is null. Then, point the river to that index. I'm hoping that'll solve the problem.
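
A minimal sketch of such a mapping, built as a Python dict and printed as JSON, might look like the following. The field names and default values are hypothetical placeholders; `null_value` is the Elasticsearch mapping parameter for a null default. As far as the Elasticsearch documentation describes it, `null_value` replaces explicit nulls, so whether it also covers the empty string that the river sends is not confirmed here.

```python
import json

# Hypothetical field names; "0.0.0.0" and 0 are arbitrary placeholder defaults.
mapping = {
    "properties": {
        "src_ip": {"type": "ip", "null_value": "0.0.0.0"},
        "bytes": {"type": "long", "null_value": 0},
    }
}

# Print the mapping body so it can be supplied when the index is created.
print(json.dumps(mapping, indent=2))
```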

vtajzich commented 10 years ago

@aritratony is right. @flicker581, take a look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html and provide a mapping.

Let us know if this helps.