Closed jacob-stein closed 2 years ago
Closing this PR, since we have to add this to the hive2 and hive3 branches.
Just a quick comment: it seems we can still use this PR if we would like to keep the previous discussion in one place. The destination branch can be changed by clicking the "Edit" button next to the title. (I tried to, but I'm not allowed.)
This PR allows all Hive primitive types to be used as keys in Hive maps. Since Ion requires that all struct field names be symbols, we first convert the Hive primitive to a string, which can then be written as a symbol when serialized to Ion. During deserialization, the object inspectors read the field name and parse the Hive/Java primitive object from the string.
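The round trip described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class `MapKeyCodecSketch` and its methods `toFieldName` and `fromFieldName` are hypothetical names, not the serde's actual object inspectors, and only a handful of key types are covered.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the serde's real code.
final class MapKeyCodecSketch {

    // Serialization side: render a primitive map key as the text of an Ion field name.
    static String toFieldName(Object key) {
        if (key == null) {
            throw new IllegalArgumentException("map keys must not be null");
        }
        return String.valueOf(key);
    }

    // Deserialization side: parse the field name back into the declared key type.
    static Object fromFieldName(String fieldName, Class<?> keyType) {
        if (keyType == String.class) return fieldName;
        if (keyType == Integer.class) return Integer.valueOf(fieldName);
        if (keyType == Long.class) return Long.valueOf(fieldName);
        if (keyType == Boolean.class) return Boolean.valueOf(fieldName);
        if (keyType == Double.class) return Double.valueOf(fieldName);
        throw new IllegalArgumentException("unsupported key type: " + keyType.getName());
    }

    public static void main(String[] args) {
        // Stand-in for a Hive MAP<INT, STRING> value.
        Map<Integer, String> hiveMap = new LinkedHashMap<>();
        hiveMap.put(1, "value1");
        hiveMap.put(2, "value2");

        for (Map.Entry<Integer, String> e : hiveMap.entrySet()) {
            String fieldName = toFieldName(e.getKey());
            Object roundTripped = fromFieldName(fieldName, Integer.class);
            // Prints the field name and whether the parsed key equals the original key.
            System.out.println(fieldName + " -> " + roundTripped.equals(e.getKey()));
        }
    }
}
```

The key type has to come from the table schema, because the field name alone does not say whether `1` was originally an INT, a BIGINT, or a STRING.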
Use Case and Example:
The serde serializes Hive maps as Ion structs, using the map keys as struct field names. For example, a map with the string keys a, b, and c and the values value1, value2, and value3 is serialized in Ion as
{'a':"value1", 'b':"value2", 'c':"value3"}
Hive maps can use any primitive type as a key. Since Ion struct field names are symbols, a problem arises when the key's Hive primitive type is not one that is natively converted to IonText, for example when the schema contains a MAP<INT, STRING>. If such a schema is passed in currently, the serde throws an error during the write. This is not an issue when the data is generated natively in Ion by an external source and only read through the serde, but it does limit the generation of new data, whether it is inserted as values or copied from an existing table stored in another format.
With this PR, instead of throwing an error, the serde first converts the key to a string and then writes that string as a symbol. For example, a map entry with the INT key 1 will now be serialized under the field name '1'.
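To make the resulting Ion text concrete, the sketch below renders a hypothetical MAP<INT, STRING> value as an Ion struct using plain string building. It assumes non-null values and handles only backslash and quote escaping. The class `IntKeyedMapToIonTextSketch` and its methods are illustrative names and do not use or mirror the ion-java writer API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: plain string building, not the serde or the ion-java writer.
final class IntKeyedMapToIonTextSketch {

    // Write text as a quoted Ion symbol, escaping backslashes and single quotes.
    static String quotedSymbol(String text) {
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Write text as an Ion string, escaping backslashes and double quotes.
    // Assumption: values are non-null and contain no control characters.
    static String ionString(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Convert each key to a string first, then emit it as the struct field name.
    static String toIonStruct(Map<?, String> map) {
        StringBuilder out = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<?, String> e : map.entrySet()) {
            if (!first) out.append(", ");
            first = false;
            out.append(quotedSymbol(String.valueOf(e.getKey())))
               .append(':')
               .append(ionString(e.getValue()));
        }
        return out.append('}').toString();
    }

    public static void main(String[] args) {
        // Stand-in for a Hive MAP<INT, STRING> value.
        Map<Integer, String> map = new LinkedHashMap<>();
        map.put(1, "value1");
        map.put(2, "value2");
        System.out.println(toIonStruct(map));
    }
}
```

The field names are quoted because a bare `1` in Ion text would be read as an integer rather than a symbol.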
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.