benwatson528 / intellij-avro-parquet-plugin

A Tool Window plugin for IntelliJ that displays Avro and Parquet files and their schemas in JSON.
Apache License 2.0
44 stars 8 forks

Parquet Column Names with Invalid AVRO Characters #98

Closed jtmorelo closed 2 years ago

jtmorelo commented 2 years ago

This plugin appears to use Avro to read Parquet files, but Avro's rules for valid names do not allow "." to appear in a column name. Parquet, however, has no such restriction.

Note that the Parquet table.flatten() API flattens a struct field in a Parquet file and adds "." to the resulting column names. This plugin flags such columns with an error:

Error: Unable to process file, see IDEA logs for more information.
Error: Illegal character in: parent.child
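
For reference, here is a minimal sketch of the naming rule behind that error. It assumes the rule in the Avro specification (a name starts with a letter or underscore and continues with letters, digits, or underscores). The helper names `is_valid_avro_name` and `sanitize_column_name` are hypothetical and are not part of the plugin or of parquet-avro; the sanitizer is only one possible workaround, applied before the file is written.

```python
import re

# Name rule as given in the Avro specification:
# first character [A-Za-z_], remaining characters [A-Za-z0-9_].
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name: str) -> bool:
    """Return True if the whole string satisfies the Avro name rule."""
    return AVRO_NAME.fullmatch(name) is not None


def sanitize_column_name(name: str) -> str:
    """Hypothetical helper: replace every disallowed character with '_'.

    Assumption: renaming columns is acceptable for the data set.
    Distinct names can collide after this (e.g. 'a.b' and 'a_b').
    """
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # A name must not be empty or start with a digit.
    if not cleaned or not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


print(is_valid_avro_name("parent.child"))    # False
print(sanitize_column_name("parent.child"))  # parent_child
```

Renaming the flattened columns this way before writing the file should avoid the error, at the cost of losing the dotted names.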

We like using this plugin. Are there any plans to fix this issue in the future?

benwatson528 commented 2 years ago

Hello, please see https://github.com/benwatson528/intellij-avro-parquet-plugin/issues/79. I'd love to be able to use the PyArrow library to read Parquet files instead of parquet-avro, but the last time I checked there wasn't a Java API.

Regards,

Ben