embulk / embulk-output-elasticsearch

Apache License 2.0
32 stars 26 forks source link

Fix handling of null values + upstream bug fixes in embulk-base-restclient #47

Closed kfitzgerald closed 7 years ago

kfitzgerald commented 7 years ago

Greetings!

This PR does three things:

  1. It updates the embulk-base-restclient to 0.5.5, which fixes handling of null values for numeric and boolean types. Additionally, it allows proper output of null values (instead of treating nulls as undefined).

  2. It uses proper null handling for output, so null values will be outputed to elasticsearch as null, not undefined (e.g. being omitted entirely). Elasticsearch expects and handles null values, so this should be the default, IMHO.

  3. It adds a set of unit tests, which ensure that proper null value handling is being achieved.

Please review and let me know if there's anything else I can do.

Thanks again, -Kevin

kfitzgerald commented 7 years ago

See also: https://github.com/embulk/embulk-base-restclient/pull/102

sakama commented 7 years ago

@kfitzgerald Thanks for your PR. I've considered about backward compatibility and added new accept_null_value option and set false as a default. https://github.com/kfitzgerald/embulk-output-elasticsearch/pull/1

Could you take a look?

@muga I'm still wondering if we should set true as a default or not. I think it's possible to change behavior if we announce major update of this plugin. What do you think about?

kfitzgerald commented 7 years ago

@sakama Merged! I agree, having an option to preserve the old functionality would be best.

@sakama @muga I think in the case of elasticsearch, I would be in favor of defaulting to true. Understandably, there may be some users who are affected, but chances are the impact would be low, since when queried, they would be excluded anyway. There's a chance that aggregations could be affected, but I would suspect that overall impact would be low.

I am not strongly advocating this way, since new usages of the output plugin could turn this on, or if exposed through a web interface (e.g. TD's connectors ui) the ui-default could set this to true for new connections, preventing issues with older existing scheduled jobs.

sakama commented 7 years ago

Merged. I'll release new version soon.