benwatson528 / intellij-avro-parquet-plugin

A Tool Window plugin for IntelliJ that displays Avro and Parquet files and their schemas in JSON.
Apache License 2.0
44 stars 8 forks

parquet: render fails when field name has `-` in it #68

Closed by marengaz 3 years ago

marengaz commented 3 years ago

heya - I've run across a problem with a field that has a `-` in its name. It's a 'top level' field (not nested in any way).

Screenshot 2021-04-22 at 09 54 14

PyCharm Pro 2020.3.5, plugin version 2.5.0

Unable to process file /Users/ben.marengo/code/other/wovenlight/data/outputs/features.parquet

org.apache.avro.SchemaParseException: Illegal character in: category_anti-biotics
    at org.apache.avro.Schema.validateName(Schema.java:1566)
    at org.apache.avro.Schema.access$400(Schema.java:91)
    at org.apache.avro.Schema$Field.<init>(Schema.java:546)
    at org.apache.avro.Schema$Field.<init>(Schema.java:585)
    at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:280)
    at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:264)
    at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:134)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
    at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
    at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
    at uk.co.hadoopathome.intellij.viewer.fileformat.ParquetFileReader.getRecords(ParquetFileReader.java:49)
    at uk.co.hadoopathome.intellij.viewer.FileViewerToolWindow$2.doInBackground(FileViewerToolWindow.java:180)
    at uk.co.hadoopathome.intellij.viewer.FileViewerToolWindow$2.doInBackground(FileViewerToolWindow.java:171)
    at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
benwatson528 commented 3 years ago

Hello,

This is because hyphens are not valid characters in Avro names. This plugin uses org.apache.parquet:parquet-avro:1.12.0 to read Parquet files, which converts the Parquet schema to an Avro schema, and that conversion rejects the field name. See https://avro.apache.org/docs/current/spec.html#names for the full naming rules. I tend to stick to underscores.

Thanks,

Ben
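
For anyone hitting the same error, here is a minimal sketch of the naming rule the reply refers to. It assumes the rule as worded in the Avro specification (first character a letter or underscore, the rest letters, digits, or underscores); the library's own check in `Schema.validateName` may differ in detail. The class `AvroNameCheck` and its `isValidAvroName` and `sanitize` helpers are hypothetical names for illustration and are not part of the plugin or of Avro.

```java
import java.util.regex.Pattern;

class AvroNameCheck {
    // Name rule as worded in the Avro specification (assumption: ASCII only):
    // start with a letter or underscore, then letters, digits, or underscores.
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Returns true when the name satisfies the rule above.
    public static boolean isValidAvroName(String name) {
        return name != null && AVRO_NAME.matcher(name).matches();
    }

    // Hypothetical helper: replaces every disallowed character with an underscore
    // and prefixes an underscore if the result is empty or starts with a digit.
    public static String sanitize(String name) {
        String cleaned = name.replaceAll("[^A-Za-z0-9_]", "_");
        if (cleaned.isEmpty() || Character.isDigit(cleaned.charAt(0))) {
            cleaned = "_" + cleaned;
        }
        return cleaned;
    }

    public static void main(String[] args) {
        String field = "category_anti-biotics"; // field name from the stack trace
        System.out.println(isValidAvroName(field)); // prints "false"
        System.out.println(sanitize(field));        // prints "category_anti_biotics"
    }
}
```

Renaming is best done when the file is written, since the plugin only reads the schema it is given. Note that a replacement like this can make two distinct names collide (for example `a-b` and `a_b`), so it is worth checking for duplicates afterwards.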
