benwatson528 / intellij-avro-parquet-plugin

A Tool Window plugin for IntelliJ that displays Avro and Parquet files and their schemas in JSON.
Apache License 2.0
44 stars 8 forks

Zstd compressed files can't be viewed #48

Closed unoexperto closed 3 years ago

unoexperto commented 3 years ago

I get the following exception:

java.lang.NoClassDefFoundError: com/github/luben/zstd/ZstdInputStream
    at org.apache.avro.file.ZstandardCodec.decompress(ZstandardCodec.java:78)
    at org.apache.avro.file.DataFileStream$DataBlock.decompressUsing(DataFileStream.java:379)
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:213)
    at uk.co.hadoopathome.intellij.viewer.fileformat.AvroFileReader.getRecords(AvroFileReader.java:34)
    at uk.co.hadoopathome.intellij.viewer.FileViewerToolWindow$2.doInBackground(FileViewerToolWindow.java:180)
    at uk.co.hadoopathome.intellij.viewer.FileViewerToolWindow$2.doInBackground(FileViewerToolWindow.java:171)
    at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

I fixed it by putting zstd-jni-1.4.4-7.jar into ~/.local/share/JetBrains/IntelliJIdea2020.2/intellij-avro-parquet-viewer/lib/, but it would be nice if it were bundled into the release build.

Thanks a lot!
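
For anyone who wants to script the workaround above, here is a minimal sketch. The helper name `install_jar` is hypothetical and not part of the plugin; the default directory is the Linux / IntelliJ IDEA 2020.2 path quoted in the report, so other operating systems and IDE versions will need a different location. It is presumably also necessary to restart the IDE afterwards, though that is an assumption rather than something confirmed in this thread.

```python
import shutil
from pathlib import Path

# Plugin lib directory as quoted in the report above (Linux, IntelliJ IDEA 2020.2).
# Assumption: adjust this for your own OS and IDE version.
PLUGIN_LIB_DIR = Path(
    "~/.local/share/JetBrains/IntelliJIdea2020.2/intellij-avro-parquet-viewer/lib"
).expanduser()


def install_jar(jar_path, lib_dir=PLUGIN_LIB_DIR):
    """Copy a jar into the plugin's lib directory and return the new path.

    Hypothetical helper for illustration only; it is not part of the plugin.
    """
    jar_path = Path(jar_path)
    lib_dir = Path(lib_dir)
    if not jar_path.is_file():
        raise FileNotFoundError(f"jar not found: {jar_path}")
    if not lib_dir.is_dir():
        raise NotADirectoryError(f"plugin lib directory not found: {lib_dir}")
    target = lib_dir / jar_path.name
    # copy2 preserves file metadata such as timestamps
    shutil.copy2(jar_path, target)
    return target
```

With a downloaded copy of the jar in the current directory, `install_jar("zstd-jni-1.4.4-7.jar")` would place it alongside the plugin's other libraries.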

benwatson528 commented 3 years ago

Thanks for the suggestion, I'll be adding zstd-jni-1.4.5-12.jar to the build over the next few days. It would be greatly appreciated if you were able to share a small sample file, but I know this isn't always possible.

benwatson528 commented 3 years ago

I'm not having much luck getting this to work on Windows - the Avro and Parquet APIs are flimsy at the best of times and there's not much out there on Zstd support. I will take another stab this weekend, but for the time being I may just have to leave instructions for how to manually load the jar like you suggested.

benwatson528 commented 3 years ago

I'm not going to be fixing this in the short term - I haven't been able to get it working on Windows. As I mentioned in my last message, the APIs are difficult enough to work with for basic functionality, never mind newer compression codecs such as this. I will update the documentation to let people know about your workaround, but without an external patch I won't be able to fix this.

benwatson528 commented 3 years ago

@unoexperto please can you retry with just the latest version of the plugin and no other changes? There has been a lot of work around ZSTD in parquet-mr:1.12.0, so I'm hoping it works now.

benwatson528 commented 3 years ago

Using just the latest version of the plugin I was able to read a file generated via:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

parquet_filename = "test.parquet"

# Small sample DataFrame with a string index
df = pd.DataFrame(
    {
        "num_legs": [2, 4, 8, 0],
        "num_wings": [2, 0, 0, 0],
        "num_specimen_seen": [10, 2, 1, 8],
    },
    index=["falcon", "dog", "spider", "fish"],
)

# Convert to an Arrow table and write it as a Zstd-compressed Parquet file
table = pa.Table.from_pandas(df)
pq.write_table(table, parquet_filename, compression="zstd")

so I'll close this ticket; let me know if you're still experiencing issues.

unoexperto commented 3 years ago

@benwatson528 It works! Thank you!