finos / perspective

A data visualization and analytics component, especially well-suited for large and/or streaming datasets.
https://perspective.finos.org/
Apache License 2.0

Perspective Virtual API (JavaScript) #2615

Closed texodus closed 1 month ago

texodus commented 1 month ago

Summary

This PR introduces a radical new design for Perspective's client/server API. This work lays the foundation for the next version of the Perspective UX to directly support virtual access, with streaming where supported, to non-Perspective databases such as [DuckDB](), SQLite and [Postgres](), without copying the entire dataset into either the browser or perspective-python, using an efficient Virtual API.

The ad-hoc, JSON-based wire format of the 2.x series has been rewritten as a set of [Protobuf]() messages. This makes the message protocol easier to port to new host languages, decreases message handling overhead on the server, and makes improved multi-thread utilization possible on platforms that support multi-threading. For example, in Python, where messages were previously dispatched with the GIL acquired, they can now be parsed and handled entirely on an internal thread pool.

In 2.x, Perspective's client, session manager, message processing loop, multiplexing, etc. were implemented in the domain language itself, and the native C++ API resembled a lower-level version of this public API. This resulted in a lot of duplicate (and subtly buggy) code and inconsistencies in implementation and performance, and it made new features difficult to add, as they had to be custom-embedded in both Python and JavaScript. We had exported [over 100]() symbols from the Emscripten+Embind JavaScript API in C++, and had over 1,000 LoC of C++ in Python for PyBind. It also limited concurrent throughput in languages like Python that have a GIL associated with interpreted evaluation.

In 3.0, the Server API is implemented entirely in C++ and exports only 2 methods, both of which take only [uint8_t] arguments (the serialized Protobuf engine messages), and it subsumes session management, client IDs, multiplexing and the lot. The new Client API is written purely in Rust, and need only emit and consume this duplex binary message stream in order to communicate with a Perspective Server over any transport. As the Rust ecosystem has exceptional Python (PyO3) and JavaScript (wasm-bindgen) bindings, we can mostly get away with transparently wrapping this common Client library for these languages. This drastically decreases the volume of code we must write to expose Perspective's API to new languages, simplifying the maintenance of these language bindings.
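To make the shape of this design concrete, here is a minimal Python sketch of a bytes-only server entry point and a transport-agnostic client. It is not Perspective's actual API: `ToyServer`, `ToyClient`, `connect`, the `make_table` and `size` operations, and the message fields are all hypothetical names invented for illustration, and JSON stands in for Protobuf only to keep the example dependency-free.

```python
import json
from typing import Callable, Dict, List


class ToyServer:
    """Hypothetical stand-in for the engine: its whole surface is bytes in, bytes out."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[dict]] = {}

    def handle_request(self, request: bytes) -> bytes:
        # Real messages would be serialized Protobuf; JSON is an assumption
        # made here so the sketch needs no third-party packages.
        msg = json.loads(request.decode("utf-8"))
        if msg["op"] == "make_table":
            self._tables[msg["name"]] = list(msg["rows"])
            result = {"ok": True}
        elif msg["op"] == "size":
            result = {"size": len(self._tables[msg["name"]])}
        else:
            result = {"error": f"unknown op {msg['op']!r}"}
        # Echo the message id so the client can match responses to requests.
        return json.dumps({"id": msg["id"], "result": result}).encode("utf-8")


class ToyClient:
    """Knows nothing about the server except how to send and receive bytes."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send
        self._next_id = 0
        self._responses: Dict[int, dict] = {}

    def handle_response(self, response: bytes) -> None:
        # Called by the transport whenever bytes arrive from the server.
        msg = json.loads(response.decode("utf-8"))
        self._responses[msg["id"]] = msg["result"]

    def request(self, op: str, **fields) -> dict:
        self._next_id += 1
        msg_id = self._next_id
        payload = json.dumps({"id": msg_id, "op": op, **fields})
        self._send(payload.encode("utf-8"))
        # The in-process transport used below is synchronous, so the
        # response has already been delivered when send() returns.
        return self._responses.pop(msg_id)


def connect(server: ToyServer) -> ToyClient:
    """In-process 'transport': hand request bytes to the server, response bytes back."""
    client = ToyClient(
        send=lambda data: client.handle_response(server.handle_request(data))
    )
    return client


client = connect(ToyServer())
client.request("make_table", name="trades", rows=[{"px": 1.0}, {"px": 2.0}])
size = client.request("size", name="trades")["size"]
```

Because the client only ever sees a `send` callable and a `handle_response` hook, swapping the in-process lambda for a WebSocket or worker message channel would not change either class, which is the property the paragraph above relies on for wrapping one client library in several languages.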

The following new crates are introduced:

API Changes

There are a number of API changes. In JavaScript:

Performance

Linear performance is not the point of this change; nevertheless, the benchmarks show a ~15% improvement for the JavaScript (linear) suite.


Other project improvements

The documentation content, build process, format, and publication platform have also been updated. Previously, Perspective's docs were built to Markdown via an arcane amalgam of [sphinx]() and [jsdoc]() and then built as [docusaurus]() artifacts. While much of this content has been preserved, it has mostly been moved into the Rustdoc annotations in the API code itself, which allows us to use cargo doc to build the docs site. While this eliminates the language-specific API docs in favor of one unified cross-language (but Rust-centric) doc, the new API is much more consistent between languages.

Perspective's benchmarks have been updated to take advantage of the new API modularity as well. We can now run the same benchmark suite with the same client across multiple language and runtime implementations at once, giving us true apples-to-apples performance comparisons across feature, version, number of cores, size of dataset, and platform.