dvirtz / vscode-parquet-viewer

A VS Code extension to view Apache Parquet files as JSON
MIT License
29 stars 6 forks

ZSTD compression is not supported #90

eriviere-b closed 11 months ago

eriviere-b commented 11 months ago

Currently, parquet-viewer does not support ZSTD compression, which is a standard compression method.

{"error":"while reading /tmp/SB17-er-ursf_gen-crawls-list.parquet: Error: invalid compression method: ZSTD"}

dvirtz commented 11 months ago

Thanks for opening the issue. What backend do you use (parquet-viewer.backend in the settings)?

eriviere-b commented 11 months ago

I am using the Parquets backend.

uditrana commented 11 months ago

Ran into this issue today as well 👍🏾.

while reading path/to/file: Error: Failed to open path/to/file: Support for codec 'zstd' not built

Using arrow backend

dvirtz commented 11 months ago

🎉 This issue has been resolved in version 2.4.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

dvirtz commented 11 months ago

This is now fixed for the arrow backend.

eriviere-b commented 11 months ago

When trying to open the Parquet file with the arrow backend, I receive this error: "Error: cannot find prebuilt arrow module, either build the module or use another backend". I am on macOS with an Apple Silicon M1 processor. How am I supposed to build the arrow module?

dvirtz commented 11 months ago

Sorry about that. There are no free M1 VMs currently available for GitHub Actions. You can either try the parquet-tools backend or try to build the module as follows:

  1. make sure you have Node.js, pipenv and a C++ compiler installed
  2. check out the extension sources
  3. run npm i
  4. run npm run build
  5. copy the resulting module folder from packages/parquet-reader/prebuilds to <extension folder>/packages/parquet-reader/prebuilds (you can get the extension folder using the Extensions: Open Extensions Folder command)
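
Step 5 could be scripted roughly as follows. This is only a sketch, not part of the extension: copyPrebuilds and both of its arguments are hypothetical names, the relative path is taken from the steps above, and fs.cpSync needs Node.js 16.7 or newer.

```javascript
const fs = require("fs");
const path = require("path");

// Relative location of the built modules, as given in the steps above.
const PREBUILDS = path.join("packages", "parquet-reader", "prebuilds");

// sourceRoot: the checkout where `npm run build` was run (hypothetical argument).
// extensionRoot: the installed extension folder (hypothetical argument).
function copyPrebuilds(sourceRoot, extensionRoot) {
  const from = path.join(sourceRoot, PREBUILDS);
  const to = path.join(extensionRoot, PREBUILDS);
  if (!fs.existsSync(from)) {
    throw new Error(`no prebuilds found at ${from}; did the build succeed?`);
  }
  // Create the destination if needed, then copy the whole folder tree.
  fs.mkdirSync(to, { recursive: true });
  fs.cpSync(from, to, { recursive: true });
  return to;
}

// Example call (both paths are placeholders):
// copyPrebuilds("/path/to/checkout", "/path/to/extension/folder");
```

Reload VS Code after copying so the extension picks up the new module.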

dvirtz commented 10 months ago

@eriviere-b the latest release has built-in support for Apple M1. I'd be happy to get your feedback

eriviere-b commented 10 months ago

@dvirtz for each engine:

dvirtz commented 10 months ago

Thanks for that detailed feedback, @eriviere-b. I specifically meant the arrow backend. It shouldn't require a C++ compiler to be installed now, as the module is prebuilt and packaged with the extension in CI, thanks to codemagic.io supporting M1.

eriviere-b commented 10 months ago

I still receive the same error: "Error: cannot find prebuilt arrow module, either build the module or use another backend"

dvirtz commented 10 months ago

Just to make sure: you're on version v2.4.1, right?

eriviere-b commented 10 months ago

Yes, I am loading the latest version of the extension every time I try.

image

dvirtz commented 10 months ago

Thanks. Can you please tell me what is printed when you run

node -e 'const os = require("os"); console.log(`${os.platform()}-${os.arch()}`)'
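
The same check can be written as a small script that also looks for the matching prebuilt folder. This is a sketch only: the arrow-parquet-reader-<platform>-<arch> folder naming is an assumption based on the paths in the error messages in this thread, and platformTag, prebuildFolder, listPrebuild and prebuildsDir are hypothetical names, not the extension's API.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Same value the one-liner prints, built from os.platform() and os.arch().
function platformTag(platform = os.platform(), arch = os.arch()) {
  return `${platform}-${arch}`;
}

// Assumed folder naming, taken from the paths in this thread's error messages.
function prebuildFolder(platform, arch) {
  return `arrow-parquet-reader-${platformTag(platform, arch)}`;
}

// prebuildsDir is a hypothetical argument: the extension's prebuilds folder.
// Returns the entry names inside the matching folder, or null if it is missing.
function listPrebuild(prebuildsDir, platform, arch) {
  const dir = path.join(prebuildsDir, prebuildFolder(platform, arch));
  return fs.existsSync(dir) ? fs.readdirSync(dir) : null;
}

console.log(platformTag());
```

A null result from listPrebuild would mean there is no folder for the current platform and architecture under the given prebuilds directory.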

eriviere-b commented 10 months ago

darwin-arm64:

image

dvirtz commented 10 months ago

That's what I expected. I released a new version, v2.4.2, with some more logging, if you don't mind trying it. Also, could you turn on the logging to panel option (parquet-viewer.logging.panel) and paste the content of the parquet-viewer output window here? Thanks for your patience.

melaanya commented 10 months ago

I experienced the same problem as @eriviere-b today, on 2.4.2.

dvirtz commented 10 months ago

I managed to reproduce this on a friend's M1 machine. The error is:

dlopen(/Users/mgunda@roku.com/.vscode/extensions/dvirtz.parquet-viewer-2.4.2/node_modules/parquet-reader/prebuilds/arrow-parquet-reader-darwin-arm64/node-napi-v6.node, 0x0001): tried: '/Users/mgunda@roku.com/.vscode/extensions/dvirtz.parquet-viewer-2.4.2/node_modules/parquet-reader/prebuilds/arrow-parquet-reader-darwin-arm64/node-napi-v6.node' (not a mach-o file)

Not sure how it works on the CI machine.
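
For anyone who wants to check their own copy, "not a mach-o file" means the loader did not find a Mach-O header at the start of the file. A minimal sketch of that check, assuming only the well-known Mach-O magic numbers (isMachO and readHeader are hypothetical helper names, and this does not diagnose why the file is wrong):

```javascript
const fs = require("fs");

// Mach-O magic numbers, read as a big-endian 32-bit value from the file start.
const MACHO_MAGICS = new Set([
  0xfeedface, // 32-bit, stored big-endian
  0xfeedfacf, // 64-bit, stored big-endian
  0xcefaedfe, // 32-bit, stored little-endian
  0xcffaedfe, // 64-bit, stored little-endian (usual for arm64 and x86_64)
  0xcafebabe, // universal (fat) binary; Java class files share this value
]);

// True if the given bytes start with a known Mach-O magic number.
function isMachO(header) {
  return header.length >= 4 && MACHO_MAGICS.has(header.readUInt32BE(0));
}

// Read the first few bytes of a file so they can be inspected.
function readHeader(file, length = 16) {
  const fd = fs.openSync(file, "r");
  try {
    const buf = Buffer.alloc(length);
    const bytesRead = fs.readSync(fd, buf, 0, length, 0);
    return buf.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}

// Example call (placeholder path):
// console.log(isMachO(readHeader("/path/to/node-napi-v6.node")));
```

If this returns false, printing readHeader(file).toString() may show what the file actually contains.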

dvirtz commented 10 months ago

M1 should be fixed with v2.4.3.

melaanya commented 10 months ago

I have an M1 and am currently on v2.4.3 (just updated and reloaded), and the issue persists:

{"error":"while reading /Users/annaberger/Downloads/data.parquet: Error: cannot find prebuilt arrow module, either build the module or use another backend: Error: dlopen(/Users/annaberger/.vscode/extensions/dvirtz.parquet-viewer-2.4.3/node_modules/parquet-reader/prebuilds/arrow-parquet-reader-darwin-arm64/node-napi-v6.node, 0x0001): tried: '/Users/annaberger/.vscode/extensions/dvirtz.parquet-viewer-2.4.3/node_modules/parquet-reader/prebuilds/arrow-parquet-reader-darwin-arm64/node-napi-v6.node' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/annaberger/.vscode/extensions/dvirtz.parquet-viewer-2.4.3/node_modules/parquet-reader/prebuilds/arrow-parquet-reader-darwin-arm64/node-napi-v6.node' (no such file), '/Users/annaberger/.vscode/extensions/dvirtz.parquet-viewer-2.4.3/node_modules/parquet-reader/prebuilds/arrow-parquet-reader-darwin-arm64/node-napi-v6.node' (not a mach-o file)"}

dvirtz commented 10 months ago

Sorry about that; the fix was only integrated in v2.4.4.

eriviere-b commented 10 months ago

Now it works with the arrow engine. Thanks, @dvirtz!