Open mikereinhold opened 1 month ago
Hi! It is true that Ion is a superset of JSON, but it doesn't follow that JSON Arrays should necessarily be treated as Rows/Structs by the Ion SerDe. I understand why it seems implied, but it's not a given.
We don't have any plans for active development on the Hive SerDe, but other ecosystem integrations (namely Trino) are in flight. In which engine or deployment are you using the Hive SerDe? Trino? AWS Athena? Spark? Something else?
According to JSON standards (RFC 4627, ECMA-404, and RFC 8259), an array is a legal top-level JSON text.
According to the Amazon Ion Hive SerDe documentation:
Based on this, I would expect JSON files with top-level (anonymous) arrays to be properly understood and decoded by the Amazon Ion Hive SerDe.
For example:
[{"a": "b", "b": 123, "c": true}, {"a": "z", "b": 456, "c": false}]
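As a quick sanity check that the example is a valid JSON text on its own, here is a minimal sketch using Python's standard `json` module. It only shows how a general-purpose JSON parser sees the file; it says nothing about how the Ion Hive SerDe reads it.

```python
import json

# The example payload from this report: a top-level (anonymous) JSON array.
text = '[{"a": "b", "b": 123, "c": true}, {"a": "z", "b": 456, "c": false}]'

rows = json.loads(text)

# A general-purpose parser yields a list with one dict per array element.
print(type(rows).__name__, len(rows))
for row in rows:
    print(row)
```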
However, the Ion Hive SerDe does not properly interpret these files:
Table definition:
However, queries against this table return no results, and the SerDe reports no input bytes to the execution engine:
In my testing, the OpenX JSON SerDe correctly handles similar data files.
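A possible workaround, sketched below, is to rewrite each file so that every array element becomes its own top-level JSON object, one per line. This assumes (I have not verified it) that the Ion SerDe maps each top-level struct to a row; the helper name `array_to_lines` is hypothetical and not part of any library.

```python
import json


def array_to_lines(text: str) -> str:
    """Convert a top-level JSON array into one JSON value per line."""
    values = json.loads(text)
    if not isinstance(values, list):
        raise ValueError("expected a top-level JSON array")
    # Serialize each element separately so it becomes its own top-level value.
    return "\n".join(json.dumps(v) for v in values) + "\n"


sample = '[{"a": "b", "b": 123, "c": true}, {"a": "z", "b": 456, "c": false}]'
flattened = array_to_lines(sample)
print(flattened, end="")
```

This loads the whole file into memory, so it is only suitable for files that fit comfortably in RAM.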