RandomFractals / vscode-data-preview

Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
https://marketplace.visualstudio.com/items?itemName=RandomFractalsInc.vscode-data-preview
Apache License 2.0

invalid encoding: PLAIN_DICTIONARY #310

Open EugeniuZ opened 1 year ago

EugeniuZ commented 1 year ago

Hi,

The extension fails to load the attached Parquet file (zipped because GitHub doesn't accept .parquet files). I am able to read the plain file with pandas.

The error in "Runtime Status" is "invalid encoding: PLAIN_DICTIONARY".

VS Code version: 1.70.2 (running on Ubuntu 22.04)
Extension version: v2.3.0
Attachment: FJUL.zip

Regards, Eugeniu

RandomFractals commented 1 year ago

@EugeniuZ Data Preview uses this TypeScript library to read Parquet data files:

https://github.com/kbajalc/parquets

When Data Preview was created, it was one of the few libraries available that could read Parquet files without depending on Python tools and toolchain.

It's quite possible that the library doesn't support plain dictionary encoding, which your Parquet file uses.
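For illustration, here is a minimal sketch of how a reader that implements only some Parquet encodings could end up reporting this kind of error, and of what the dictionary lookup step involves. This is not the parquets library's actual code: `SUPPORTED`, `checkEncoding`, and `decodeDictionary` are hypothetical names, and the supported set is an assumption chosen for the example. The encoding ids follow the Parquet format's Thrift definition.

```typescript
// Encoding ids as listed in the Parquet format's Thrift definition.
enum Encoding {
  PLAIN = 0,
  PLAIN_DICTIONARY = 2,
  RLE = 3,
  BIT_PACKED = 4,
  DELTA_BINARY_PACKED = 5,
  DELTA_LENGTH_BYTE_ARRAY = 6,
  DELTA_BYTE_ARRAY = 7,
  RLE_DICTIONARY = 8,
}

// Hypothetical: the encodings this illustrative reader implements.
const SUPPORTED = new Set<Encoding>([Encoding.PLAIN, Encoding.RLE]);

// Reject any column chunk whose encoding the reader cannot decode.
function checkEncoding(encoding: Encoding): void {
  if (!SUPPORTED.has(encoding)) {
    // Encoding[encoding] maps the numeric id back to its name.
    throw new Error(`invalid encoding: ${Encoding[encoding]}`);
  }
}

// Dictionary lookup step only: the dictionary page holds the distinct
// values and the data page holds indices into it. Assumes the indices
// were already unpacked from their RLE/bit-packed form.
function decodeDictionary<T>(dictionary: T[], indices: number[]): T[] {
  return indices.map((i) => {
    if (i < 0 || i >= dictionary.length) {
      throw new RangeError(`dictionary index out of range: ${i}`);
    }
    return dictionary[i];
  });
}
```

A reader would need both pieces to handle such a file: accepting the encoding id, and resolving each index against the dictionary page.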

The new parquet-wasm library looks promising. To resolve this issue, and to enable loading of compressed Parquet files too, I would need to switch the Parquet data provider to a better TS/JS Parquet library.

RandomFractals commented 1 year ago

More info at: https://github.com/RandomFractals/vscode-data-preview/issues/316#issuecomment-1277766785