Closed kfitzgerald closed 7 years ago
For a bit more background here, I ran into this issue when attempting to output records to elasticsearch (embulk-output-elasticsearch relies on this module)
When a row contains null
numeric values, two things could occur:
0
instead of null
8390047115487506287
because it would pull 8 bytes out of the buffer and that's just what happened to be near by. This occurrence is just plain scary. 🎃 Naturally, elasticsearch will reject records with absurdly large numbers with exceptions such as:
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [column_name]
. So in my case, I lost about 25% of the records I was working with, and the other 75% contained bad or missing values.
I would expect null-valued columns to be stored as null
values in elasticsearch, not defaulted to 0
or omitted entirely (which is the equivalent of being undefined
)
@kfitzgerald Thank you for your contribution! Sorry to be late to respond.
Agreed for your first one. Other types were fixed in #87, but Boolean, Long, and Double still remained.
But for the second one, it's not trivial. It depends on the plugin's context. It may be nice for Elasticsearch, but can be bad for other outputs. I'm afraid it can be incompatible from previous versions...
What do you think making it an option of JacksonAllInObjectScope
? It means that adding new constructor(s) like JacksonAllInObjectScope(boolean fillsJsonNullForEmbulkNull)
, and setting this fillsJsonNullForEmbulkNull
as false
in default (existing) constructors.
@dmikurube Yeah I thought about making it configurable, but hadn't explored the best way of doing it. I'm always a fan of backwards compatible updates.
I'll take a stab at this and keep you posted.
@dmikurube Ok – This should be good to go if you'd like to review.
Also, I have changes staged in the embulk-output-elasticsearch plugin that will get a PR once this module has been bumped and deployed.
Thanks again, -Kevin
@kfitzgerald The change looks good to me. I did minor change your PR on https://github.com/kfitzgerald/embulk-base-restclient/pull/1. Please take a look?
LGTM – merged!
This PR changes two things and adds some supporting tests.
Firstly:
0
) or worseI believe this to be the reason that exporting data through classes based off this module,
null
numeric values are being encoded either as0
or seemingly random numbers (leaked memory from other parts of the encoded buffer).Secondly:
null
notundefined
(e.g. omitted)In my opinion, when a column value is
null
, then it should be encoded in JSON asnull
. When you think about it, each column has a given name and a known data type. This means, if the column lacks a value, then the key is defined, but the value is null. As it currently stands, string, timestamp and JSON types are omitted if null, meaning their values are being encoded asundefined
. Simply put,null
is not the same asundefined
.While this latter change I believe is technically correct, it may affect other modules based upon this one, for better or for worse.
Please let me know if there is anything else I can help with regarding this issue.
Thanks, -Kevin