dvirtz / vscode-parquet-viewer

A VS Code extension to view Apache Parquet files as JSON
MIT License
parquet-viewer

Views Apache Parquet files as text (JSON or CSV).

Features

When a Parquet file is opened, a textual presentation of the file opens automatically.

After closing the textual view, it is possible to reopen it by clicking on the link in the parquet view.

Backends

The extension supports different backends for parsing the files:

arrow

This is the default backend. It is a thin wrapper around the Apache Arrow C++ implementation, so it should support the latest Parquet features.

parquet-wasm

This backend uses the parquet-wasm library, which is built on the "official" Rust implementations of Arrow and Parquet.

It supports most compression algorithms except LZ4; see https://kylebarron.dev/parquet-wasm/index.html#md:compression-support for details.

parquets

This backend uses the parquets TypeScript library, which is a fork of the unmaintained kbajalc/parquets library with some bug fixes.

It only supports Parquet version 1.0.0 with Snappy compression.

parquet-tools

This is a legacy Java backend that uses parquet-tools. To use it, set parquet-viewer.backend to parquet-tools and make sure parquet-tools is either in your PATH or pointed to by the parquet-viewer.parquetToolsPath setting.

Format

The textual output can be either JSON or CSV based on the parquet-viewer.format setting.
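
To make the difference between the two formats concrete, here is a minimal sketch of how the same records could be rendered either way. It is not the extension's code: the `Row` type and the `toJsonLines` and `toCsv` helpers are hypothetical, and the sketch assumes one JSON object per record, a CSV header row taken from the first record, and no quoting or escaping of values.

```typescript
// Hypothetical row type: one plain object per Parquet record.
type Row = Record<string, unknown>;

// Assumption: JSON output is one compact JSON object per line.
function toJsonLines(rows: Row[]): string {
  return rows.map((row) => JSON.stringify(row)).join("\n");
}

// Naive CSV: header taken from the first row's keys, no quoting or escaping.
function toCsv(rows: Row[], separator = ", "): string {
  if (rows.length === 0) {
    return "";
  }
  const columns = Object.keys(rows[0]);
  const header = columns.join(separator);
  const lines = rows.map((row) =>
    columns.map((column) => String(row[column])).join(separator)
  );
  return [header, ...lines].join("\n");
}

const sampleRows: Row[] = [
  { id: 1, name: "a" },
  { id: 2, name: "b" },
];

console.log(toJsonLines(sampleRows));
console.log(toCsv(sampleRows));
```

The JSON form keeps each record self-describing, while the CSV form is more compact and tends to suit table-oriented viewers.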

A richer view

After getting the textual representation, it's possible to use other extensions like JSON Table Viewer or Edit csv to get a richer view of the data (e.g. in a table).

Settings

The following setting options are available:

| name | default | description |
| --- | --- | --- |
| parquet-viewer.backend | parquets | Which backend to use for reading the files |
| parquet-viewer.format | json | Textual output format |
| parquet-viewer.logging.panel | false | Whether to write diagnostic logs to an output panel |
| parquet-viewer.logging.folder | empty | Write diagnostic logs under the given directory |
| parquet-viewer.logging.level | info | Diagnostic log level. Choose between: off, fatal, error, warn, info, debug or trace |
| parquet-viewer.parquetToolsPath | parquet-tools | The name of the parquet-tools executable or a path to the parquet-tools jar |
| parquet-viewer.json.space | 0 | JSON indentation space, passed to JSON.stringify as is, see mdn for details |
| parquet-viewer.json.asArray | false | Whether to format output JSON as one big array |
| parquet-viewer.csv.separator | ', ' | CSV separator |
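
The effect of parquet-viewer.json.space and parquet-viewer.json.asArray can be illustrated with a small sketch built on the standard `JSON.stringify(value, replacer, space)` call. The `renderJson` helper and the `JsonRow` type are hypothetical names used only for illustration, and the sketch assumes that, when asArray is false, each record is written as its own JSON document.

```typescript
// Hypothetical row type for illustration.
type JsonRow = Record<string, unknown>;

// `space` is forwarded unchanged to JSON.stringify; 0 produces compact output.
function renderJson(
  rows: JsonRow[],
  space: number | string,
  asArray: boolean
): string {
  if (asArray) {
    // One big JSON array containing every record.
    return JSON.stringify(rows, null, space);
  }
  // Assumption: otherwise each record is serialized separately, one after another.
  return rows.map((row) => JSON.stringify(row, null, space)).join("\n");
}

const demoRows: JsonRow[] = [{ id: 1 }, { id: 2 }];

console.log(renderJson(demoRows, 0, false)); // compact, one record per line
console.log(renderJson(demoRows, 0, true)); // compact, single array
console.log(renderJson(demoRows, 2, false)); // each record indented by 2 spaces
```

A non-zero space value makes individual records easier to read at the cost of a larger output, which matters given the size limit described under Notes.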

Notes

Size limit

VS Code only allows extensions to work on files smaller than 50MB. If the data is larger, it will be truncated and a message indicating that will be appended to the output. See https://github.com/microsoft/vscode/issues/31078 for details.
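
The truncation behaviour can be pictured with the following sketch. It is an illustration under stated assumptions, not the extension's implementation: the `truncateOutput` helper and the notice text are hypothetical, and the limit is counted in characters rather than bytes to keep the example short.

```typescript
// Keep whole lines until the budget is exhausted, then append a notice.
// Assumption: the limit is measured in characters (a stand-in for bytes).
function truncateOutput(
  lines: string[],
  limit: number,
  notice: string
): string {
  const kept: string[] = [];
  let used = 0;
  for (const line of lines) {
    const size = line.length + 1; // +1 for the newline separator
    if (used + size > limit) {
      kept.push(notice); // hypothetical message marking the cut
      break;
    }
    kept.push(line);
    used += size;
  }
  return kept.join("\n");
}

console.log(truncateOutput(["aaaa", "bbbb", "cccc"], 10, "[truncated]"));
```

Cutting at line boundaries keeps every emitted record complete, so the visible part of the output is still valid on its own.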

What's new

See CHANGELOG.md