Documents with anonymous top level array are not properly decoded

According to JSON standards (RFC 4627, ECMA-404, and RFC 8259), an array is a legal top-level JSON text.

According to the Amazon Ion Hive SerDe documentation:

Because Amazon Ion is a superset of JSON, you can use the Amazon Ion Hive SerDe to query non-Amazon Ion JSON datasets.

Based on this, I expect the Amazon Ion Hive SerDe to correctly decode JSON files whose top-level value is an (anonymous) array.

For example: [{"a": "b", "b": 123, "c": true}, {"a": "z", "b": 456, "c": false}]
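As a quick sanity check that the sample above is valid JSON on its own, here is a minimal sketch using Python's standard `json` module. Python is not part of the SerDe and is used here only for illustration; the variable names are my own.

```python
import json

# The sample document from this report: an anonymous array at the top level.
document = '[{"a": "b", "b": 123, "c": true}, {"a": "z", "b": 456, "c": false}]'

# A standards-compliant JSON parser accepts an array as the top-level value.
parsed = json.loads(document)

# The top-level value is a list holding two objects with keys a, b, and c.
print(type(parsed).__name__, len(parsed))
for element in parsed:
    print(element["a"], element["b"], element["c"])
```

The parser accepts the document without error and yields a list of two objects, which is the shape the `array<struct<a:string,b:int,c:boolean>>` column in the table definition is meant to describe.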

However, the Ion Hive SerDe does not interpret these files correctly.

Table definition:

CREATE EXTERNAL TABLE `top_level_array_test`(
  `array` array<struct<a:string,b:int,c:boolean>>
)
ROW FORMAT SERDE 
  'com.amazon.ionhiveserde.IonHiveSerDe' 
WITH SERDEPROPERTIES ( 
  'ion.encoding'='TEXT', 
  'ion.fail_on_overflow'='false',
  'ion.ignore_malformed'='false'
) 
STORED AS INPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonInputFormat' 
OUTPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
  '...'

Queries against this table return no results, and the SerDe delivers no input bytes to the execution engine.

In my testing, the OpenX JSON SerDe correctly handles similar data files.
