marengaz closed this issue 3 years ago
Hello,
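This is because hyphens are invalid characters in Avro names. This plugin uses org.apache.parquet:parquet-avro:1.12.0 to read Parquet files, which converts the Parquet schema to an Avro schema, so every field name has to be a valid Avro name: it must start with a letter or underscore and contain only letters, digits, and underscores. See https://avro.apache.org/docs/current/spec.html#names for the full naming rules. I tend to stick to underscores.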
Thanks,
Ben
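For anyone hitting the same error, the naming rule above can be checked before a file is written. The following is only a minimal sketch, not part of the plugin or of Avro: the helper names `is_valid_avro_name` and `to_avro_safe` are hypothetical, and replacing invalid characters with underscores is just one possible convention.

```python
import re

# Avro name rule: first character in [A-Za-z_], the rest in [A-Za-z0-9_].
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name: str) -> bool:
    """Return True if `name` satisfies the Avro naming rule (hypothetical helper)."""
    return AVRO_NAME.fullmatch(name) is not None


def to_avro_safe(name: str) -> str:
    """Replace every invalid character with '_' (an assumed convention, not an Avro API)."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # A name may not be empty or start with a digit, so prefix an underscore.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


if __name__ == "__main__":
    column = "category_anti-biotics"
    print(is_valid_avro_name(column))  # False: the hyphen is not allowed
    print(to_avro_safe(column))        # category_anti_biotics
```

If the file is produced with pandas, something like `df.rename(columns=to_avro_safe)` before writing the Parquet file should apply the same renaming to every column. Note that two different names can collapse to the same result (for example `a-b` and `a_b`), so it is worth checking for duplicates afterwards.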
On Thu, 22 Apr 2021, 09:58 marengaz wrote:
heya - I've run across a problem with a field with a - in the name. It's a 'top level' field (not nested in any way)
[image: Screenshot 2021-04-22 at 09 54 14] https://user-images.githubusercontent.com/6930705/115686039-b4be1780-a350-11eb-9238-466b5da41bfe.png
PyCharm Pro 2020.3.5, plugin version 2.5.0
Unable to process file /Users/ben.marengo/code/other/wovenlight/data/outputs/features.parquet
org.apache.avro.SchemaParseException: Illegal character in: category_anti-biotics
    at org.apache.avro.Schema.validateName(Schema.java:1566)
    at org.apache.avro.Schema.access$400(Schema.java:91)
    at org.apache.avro.Schema$Field.<init>(Schema.java:546)
    at org.apache.avro.Schema$Field.<init>(Schema.java:585)
    at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:280)
    at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:264)
    at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:134)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
    at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
    at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
    at uk.co.hadoopathome.intellij.viewer.fileformat.ParquetFileReader.getRecords(ParquetFileReader.java:49)
    at uk.co.hadoopathome.intellij.viewer.FileViewerToolWindow$2.doInBackground(FileViewerToolWindow.java:180)
    at uk.co.hadoopathome.intellij.viewer.FileViewerToolWindow$2.doInBackground(FileViewerToolWindow.java:171)
    at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
View it on GitHub: https://github.com/benwatson528/intellij-avro-parquet-plugin/issues/68