billjohnston opened this issue 3 years ago
Did you ever figure this out? I spent a couple of hours today and finally came up with this schema:
```js
new parquetJs.ParquetSchema({
  names: {
    type: 'LIST',
    fields: {
      list: {
        repeated: true,
        fields: {
          element: {
            type: 'UTF8'
          }
        }
      }
    }
  }
})
```
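If you need this three-level `LIST` → repeated `list` → `element` shape for more than one column, it is easy to mistype by hand. Here is a minimal sketch, assuming you only want to generate the plain definition object: `listField` is a hypothetical helper (not part of parquetjs), and you would still pass its result to `new parquetJs.ParquetSchema(...)` yourself.

```javascript
// Hypothetical helper (not part of parquetjs): returns the nested
// LIST -> repeated `list` group -> `element` definition shown above
// for any element type, e.g. 'UTF8' or 'INT64'.
function listField(elementType) {
  return {
    type: 'LIST',
    fields: {
      // The repeated middle group carries one entry per array item.
      list: {
        repeated: true,
        fields: {
          // Each entry stores the actual value under `element`.
          element: { type: elementType },
        },
      },
    },
  };
}

// Same shape as the hand-written `names` definition; the object would be
// passed to `new parquetJs.ParquetSchema(...)`.
const schemaDefinition = { names: listField('UTF8') };

console.log(JSON.stringify(schemaDefinition, null, 2));
```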
I also had to prepare the data going into the `names` field by wrapping the values in `list` and `element`, like so:
```js
{
  names: {
    list: [
      { element: 'cathy' },
      { element: 'Zooplar the Magnificent' },
      { element: 'tim' }
    ]
  }
}
```
Using this pattern, AWS Glue sees the `names` field as `array<string>`, and the values show up in Athena queries.
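Wrapping every array by hand gets tedious, so a small conversion step helps. This is a minimal sketch: `toParquetList` and `fromParquetList` are hypothetical helpers (not parquetjs APIs), and `fromParquetList` additionally assumes that rows read back through the same schema keep the `{ list: [{ element }] }` shape, which I have not verified.

```javascript
// Hypothetical helper (not part of parquetjs): turns a plain array into
// the { list: [{ element: value }, ...] } shape shown above.
function toParquetList(values) {
  return { list: values.map((value) => ({ element: value })) };
}

// Hypothetical reverse helper. Assumption: rows read back through the
// same schema arrive in the same wrapped shape; missing values become [].
function fromParquetList(wrapped) {
  return (wrapped && wrapped.list ? wrapped.list : []).map((item) => item.element);
}

const row = { names: toParquetList(['cathy', 'Zooplar the Magnificent', 'tim']) };
console.log(JSON.stringify(row));
// {"names":{"list":[{"element":"cathy"},{"element":"Zooplar the Magnificent"},{"element":"tim"}]}}
```

With parquetjs, a row like this would then go to `writer.appendRow(row)` on a writer opened with the `LIST` schema, so the rest of the code can keep working with plain arrays.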
It feels like this was a LOT harder than it should be, so if anyone knows of an easier way, please do let me know. 😅
Having trouble getting a top level array field to work with Athena
Athena create table query:
parquetjs schema:
I'm able to upload and create the Parquet file with this, and queries that don't include the `tags` field work fine:

But if I run any query that includes the `tags` field, I get this error:

I noticed there is a `LIST` field type that is supposed to work with Athena, but I'm not sure how I'd specify a top-level list of strings.