benwatson528 / intellij-avro-parquet-plugin

A Tool Window plugin for IntelliJ that displays Avro and Parquet files and their schemas in JSON.
Apache License 2.0

Exception generated while trying to view Parquet files with dash-formatted UUIDs as column names #94

Closed neil-hucker-seequent closed 2 years ago

neil-hucker-seequent commented 2 years ago

Description: When I try to view a Parquet file with the plugin, and the file contains dash-formatted UUID4 column names, the plugin throws the exception below and cannot load the file.

The file was generated by PyArrow using Parquet format version 2.6 and zstd compression. A sample file is attached.

org.apache.avro.SchemaParseException: Illegal character in: d3d3d3d3-4b40-4d1c-bed1-3a2ed76be83a
    at org.apache.avro.Schema.validateName(Schema.java:1566)
    at org.apache.avro.Schema.access$400(Schema.java:91)
    at org.apache.avro.Schema$Field.<init>(Schema.java:546)
    at org.apache.avro.Schema$Field.<init>(Schema.java:585)
    at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:280)
    at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:264)
    at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:134)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
    at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
    at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
    at uk.co.hadoopathome.intellij.viewer.fileformat.ParquetFileReader.getRecords(ParquetFileReader.java:99)
    at uk.co.hadoopathome.intellij.viewer.FileViewerToolWindow$2.doInBackground(FileViewerToolWindow.java:193)
    at uk.co.hadoopathome.intellij.viewer.FileViewerToolWindow$2.doInBackground(FileViewerToolWindow.java:184)
    at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

CENPLAT-5827_64blks_ijk_zero_col_Rock_Measure_In_Bool_In.zip
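For anyone hitting the same trace: the failing frame is `Schema.validateName`, which is reached because the Parquet schema is converted to an Avro schema before the records are read. The Avro specification requires names to start with `[A-Za-z_]` and to contain only `[A-Za-z0-9_]` after that, so a dash-formatted UUID is rejected. The snippet below is a minimal sketch of that rule plus one possible producer-side workaround. The helper names `is_valid_avro_name` and `sanitize_column_name` are hypothetical, and this is not the plugin's own fix.

```python
import re

# Avro specification: a name starts with [A-Za-z_] and continues with [A-Za-z0-9_].
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name: str) -> bool:
    """Return True if the whole string satisfies the Avro name rule."""
    return AVRO_NAME.fullmatch(name) is not None


def sanitize_column_name(name: str) -> str:
    """Hypothetical helper: replace disallowed characters with underscores.

    A leading underscore is added when the result is empty or starts with a digit.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


column = "d3d3d3d3-4b40-4d1c-bed1-3a2ed76be83a"
print(is_valid_avro_name(column))    # False
print(sanitize_column_name(column))  # d3d3d3d3_4b40_4d1c_bed1_3a2ed76be83a
```

Renaming columns this way before writing the file would avoid the exception, at the cost of changing the column names. Distinct original names can also collide after sanitizing (for example `a-b` and `a_b`), so a real pipeline would need to check for duplicates.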

benwatson528 commented 2 years ago

Hello, see https://github.com/benwatson528/intellij-avro-parquet-plugin/issues/68.