benwatson528 / intellij-avro-parquet-plugin

A Tool Window plugin for IntelliJ that displays Avro and Parquet files and their schemas in JSON.
Apache License 2.0

Cannot encode decimal with precision more than 16 #82

Closed: sananguliyev closed this issue 2 years ago

sananguliyev commented 2 years ago
Unable to process file /path/to/file.parquet

org.apache.avro.AvroTypeException: Cannot encode decimal with precision 17 as max precision 16 in field field_name
    at org.apache.avro.generic.GenericDatumWriter.addAvroTypeMsg(GenericDatumWriter.java:198)
    at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:231)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:210)
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:131)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
    at uk.co.hadoopathome.intellij.viewer.fileformat.ParquetFileReader.toByteArray(ParquetFileReader.java:132)
    at uk.co.hadoopathome.intellij.viewer.fileformat.ParquetFileReader.getRecords(ParquetFileReader.java:105)
    at uk.co.hadoopathome.intellij.viewer.FileViewerToolWindow$2.doInBackground(FileViewerToolWindow.java:193)
    at uk.co.hadoopathome.intellij.viewer.FileViewerToolWindow$2.doInBackground(FileViewerToolWindow.java:184)
    at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.avro.AvroTypeException: Cannot encode decimal with precision 17 as max precision 16
    at org.apache.avro.Conversions$DecimalConversion.validate(Conversions.java:141)
    at org.apache.avro.Conversions$DecimalConversion.toFixed(Conversions.java:105)
    at org.apache.avro.Conversions$DecimalConversion.toFixed(Conversions.java:65)
    at org.apache.avro.Conversions.convertToRawType(Conversions.java:247)
    at org.apache.avro.generic.GenericDatumWriter.convert(GenericDatumWriter.java:107)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:81)
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:145)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83)
    at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:221)
    ... 14 more
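The "Caused by" frames show that the exception is thrown from Avro's Conversions$DecimalConversion.validate. What follows is a rough, simplified sketch of that kind of check. It assumes the check works by first bringing the value to the schema's scale without rounding, then comparing the value's digit count against the declared precision. DecimalCheck and validate are hypothetical names. IllegalArgumentException stands in for Avro's AvroTypeException so that the example needs only the JDK. This is not Avro's actual source.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical, simplified illustration of a decimal precision/scale check
// that yields "Cannot encode decimal with precision X as max precision Y".
// Not Avro's implementation; IllegalArgumentException replaces AvroTypeException.
public class DecimalCheck {

    /** Returns the value at the schema scale, or throws if it cannot be encoded. */
    static BigDecimal validate(BigDecimal value, int precision, int scale) {
        BigDecimal scaled;
        try {
            // Move to the declared scale; UNNECESSARY throws if digits would be lost.
            scaled = value.setScale(scale, RoundingMode.UNNECESSARY);
        } catch (ArithmeticException e) {
            throw new IllegalArgumentException("Cannot encode decimal with scale "
                + value.scale() + " as scale " + scale + " without rounding");
        }
        // precision() is the number of digits in the unscaled value.
        if (scaled.precision() > precision) {
            throw new IllegalArgumentException("Cannot encode decimal with precision "
                + scaled.precision() + " as max precision " + precision);
        }
        return scaled;
    }

    public static void main(String[] args) {
        // One digit before the point and 15 after: fits DECIMAL(16, 15).
        System.out.println(validate(new BigDecimal("1.234567890123456"), 16, 15));
        try {
            // Two digits before the point need 17 digits in total at scale 15.
            validate(new BigDecimal("12.345678901234567"), 16, 15);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, the second call fails with the same wording as the root cause in the trace above, "Cannot encode decimal with precision 17 as max precision 16".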
benwatson528 commented 2 years ago

Hello, please can you provide a bit more information? What is the schema for field_name?

sananguliyev commented 2 years ago

Yes, sure. It's longitude data and the type is DECIMAL(16, 15). The Parquet file is created automatically by another tool, and to be honest I don't know why it's 16,15. Is the type the issue?

benwatson528 commented 2 years ago

I'm afraid so. The plugin just uses the standard avro-parquet parser to read data, so it looks like one value has a precision of 17 when the type restricts it to 16. Are you able to read this file using parquet-tools (https://pypi.org/project/parquet-tools/)?
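Here is how a longitude can end up with precision 17. In DECIMAL(16, 15), precision counts all digits and scale counts only the digits after the decimal point. That leaves a single digit before the point, so the largest storable magnitude is 9.999999999999999. Longitudes run from -180 to 180, so any value of magnitude 10 or more needs precision 17 at scale 15, and values of 100 or more need 18. The JDK-only sketch below computes both of these. DecimalRange and its methods are hypothetical, and the sample longitudes are made up. The idea that the failing row holds such a value is an inference from the error message, not something confirmed in the thread.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper illustrating the range of a DECIMAL(precision, scale) type.
public class DecimalRange {

    /** Largest value a DECIMAL(precision, scale) type can hold: all nines. */
    static BigDecimal maxValue(int precision, int scale) {
        // 10^precision - 1 as the unscaled value, e.g. 9999999999999999 for precision 16.
        BigDecimal unscaled = BigDecimal.TEN.pow(precision).subtract(BigDecimal.ONE);
        return unscaled.movePointLeft(scale);
    }

    /** Digits a value needs once written at the given scale (throws if rounding is needed). */
    static int requiredPrecision(BigDecimal value, int scale) {
        return value.setScale(scale, RoundingMode.UNNECESSARY).precision();
    }

    public static void main(String[] args) {
        System.out.println(maxValue(16, 15)); // 9.999999999999999
        // Made-up sample longitudes, in degrees.
        for (String lon : new String[] {"-0.127758", "12.4964", "-122.4194"}) {
            System.out.println(lon + " needs precision "
                + requiredPrecision(new BigDecimal(lon), 15));
        }
    }
}
```

Running it prints 9.999999999999999, followed by required precisions of 15, 17 and 18 for the three samples. Only the first sample would fit the declared DECIMAL(16, 15).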

sananguliyev commented 2 years ago

Yes, which is why I opened the issue without checking the field's type. parquet-tools reads that Parquet file without any problem, and doesn't even warn about the type mismatch.

benwatson528 commented 2 years ago

parquet-tools uses Apache Arrow underneath, which is better supported than avro-parquet these days, but sadly that isn't available in Java, and IntelliJ plugins can't be written in Python. If a decimal with precision 17 is being stored in a field limited to precision 16, then I'd expect an error. I'm afraid this won't be fixed in this plugin unless a new version of avro-parquet fixes it.

sananguliyev commented 2 years ago

Thank you very much for the explanation.