DenoDoc performance - Logbook

mcandeia commented 1 year ago

Logbook of Task: Dynamic Schema Generation and DecoHub Implementation

Task Overview: The goal is to improve dynamic schema generation, particularly to support the DecoHub feature, which lets users extend the components library with community-built components/blocks without a redeploy. Schema generation currently relies on running deno doc at development time, which is limiting because Deno Deploy restricts syscalls. Several alternatives have been explored; this logbook documents those attempts, their advantages and disadvantages, and future plans.

Current Schema Generation: We currently generate a schema.gen.json file at development time because:

Schema generation relies on deno doc, which produces a TypeScript "AST" (including comments) that we parse to generate the schema.
Deno Deploy lacks syscall support, which rules out running Deno CLI commands there and forces us to use the slower denodoc WASM build (compiled from Rust) for schema generation.
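To make the dependency on deno doc concrete, here is a minimal sketch of turning one interface node from `deno doc --json` output into a JSON Schema object. The node shape is a simplified assumption (real deno doc output carries more fields and has changed across Deno versions), and `interfaceToSchema` / `typeToSchema` are hypothetical helpers, not the actual schema.gen.json generator.

```typescript
// Simplified, ASSUMED subset of a `deno doc --json` type node.
interface TsTypeDef {
  repr: string;
  kind: string; // e.g. "keyword", "array", "typeRef", ...
  keyword?: string;
  array?: TsTypeDef;
}

// Simplified, ASSUMED subset of an interface property node.
interface PropertyDef {
  name: string;
  optional: boolean;
  tsType?: TsTypeDef;
  jsDoc?: { doc?: string };
}

// Simplified, ASSUMED subset of an interface node.
interface InterfaceNode {
  name: string;
  kind: "interface";
  jsDoc?: { doc?: string };
  interfaceDef: { properties: PropertyDef[] };
}

type JsonSchema = {
  type?: string;
  title?: string;
  description?: string;
  items?: JsonSchema;
  properties?: Record<string, JsonSchema>;
  required?: string[];
};

// Maps a handful of TypeScript types to JSON Schema; anything else is left open ({}).
function typeToSchema(t: TsTypeDef | undefined): JsonSchema {
  if (!t) return {};
  if (t.kind === "keyword" && t.keyword && ["string", "number", "boolean"].includes(t.keyword)) {
    return { type: t.keyword };
  }
  if (t.kind === "array" && t.array) {
    return { type: "array", items: typeToSchema(t.array) };
  }
  return {};
}

// Builds an object schema: one property per interface member, JSDoc text as description,
// non-optional members listed as required.
function interfaceToSchema(node: InterfaceNode): JsonSchema {
  const properties: Record<string, JsonSchema> = {};
  const required: string[] = [];
  for (const prop of node.interfaceDef.properties) {
    const schema = typeToSchema(prop.tsType);
    if (prop.jsDoc?.doc) schema.description = prop.jsDoc.doc;
    properties[prop.name] = schema;
    if (!prop.optional) required.push(prop.name);
  }
  const result: JsonSchema = { type: "object", title: node.name, properties, required };
  if (node.jsDoc?.doc) result.description = node.jsDoc.doc;
  return result;
}
```

Even in this reduced form, every schema depends on the full deno doc output for the module, which is why the expensive part is producing that output rather than the conversion.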

Goals and Their Impacts: There are multiple goals related to dynamic schema generation, but the most significant ones are:

Supporting Dynamic Imports: Allowing users to extend the components library with community-built components without requiring the component to exist initially. This is the foundation for DecoHub, where developers can publish code and users can install it without a redeploy.
Improving Dev Experience: Enhancing the development experience by addressing schema generation issues and reducing the time it takes.

Alternative Approaches Explored:

1. Switch to WASM with Deno KV Cache

Advantages:

No changes in dev mode.
No need for additional infrastructure.

Disadvantages:

Extremely slow, because there is no caching mechanism.
No "single flight" capability for schema generation, causing multiple isolates to start generating the schema simultaneously.
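The missing "single flight" behavior can be illustrated with a minimal in-isolate sketch: concurrent callers asking for the same key share one pending promise instead of each starting its own generation. This only deduplicates work inside a single isolate; coordinating several isolates would additionally need a shared lock (for example, an atomic check on a Deno KV key), which is not shown. `singleFlight` and the key names are hypothetical.

```typescript
// Pending generations in THIS isolate, keyed by what is being generated.
const inFlight = new Map<string, Promise<unknown>>();

// Runs `fn` at most once per key at a time; concurrent callers get the same promise.
// The entry is removed once the work settles, so a later call starts a fresh run.
function singleFlight<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;
  const pending = fn().finally(() => inFlight.delete(key));
  inFlight.set(key, pending);
  return pending;
}
```

For example, two requests hitting a cold isolate at the same moment would both call `singleFlight("schema", generateSchema)` (a hypothetical generator) and wait on a single run.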

2. Shared DenoDoc Server (Rust, gRPC/WebSockets)

Advantages:

Shared cache with Content Addressable Storage to avoid redundant denodoc requests.
Usable in other languages, enabling multi-threading.

Disadvantages:

Deno doesn't natively support gRPC, which complicates talking to the server from Deno.
Implementing a DenoDoc cache in Rust required reimplementation efforts.
Multi-threading issues in the denodoc crate hindered progress.
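The Content Addressable Storage idea behind this approach can be sketched in a few lines: the cache key is a SHA-256 hash of the module source, so identical content never triggers a second denodoc run, whichever client asks. This is a minimal in-memory sketch assuming Web Crypto (`crypto.subtle`) is available, as it is in Deno; `DocCache`, `contentKey`, and the `generate` callback (standing in for a denodoc call) are hypothetical. Since deno doc output can also depend on a module's imports, a real key may need to cover dependencies too, and this sketch does not merge two concurrent misses for the same content.

```typescript
// Hex-encoded SHA-256 of the module source: identical content -> identical key.
async function contentKey(source: string): Promise<string> {
  const bytes = new TextEncoder().encode(source);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}

// In-memory content-addressed cache in front of an expensive doc generator.
class DocCache<T> {
  private entries = new Map<string, T>();
  private generate: (source: string) => Promise<T>;
  misses = 0; // how many times the generator actually ran

  constructor(generate: (source: string) => Promise<T>) {
    this.generate = generate;
  }

  async get(source: string): Promise<T> {
    const key = await contentKey(source);
    if (this.entries.has(key)) return this.entries.get(key) as T;
    this.misses++;
    const value = await this.generate(source);
    this.entries.set(key, value);
    return value;
  }
}
```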

3. Deno-based Implementation of Approach 2

Advantages:

Same approach as the Rust server but implemented in Deno.

Disadvantages:

Still slow and memory-intensive due to Deno's single-threaded nature.

4. Go-based Implementation of Approach 2

Advantages:

Familiarity with Go allowed a successful implementation.

Disadvantages:

Still slow, because deno doc itself is time-consuming.

Revised Approach (Enabling Deco Hub): Given the challenges of the previous attempts, the focus shifts to enabling Deco Hub. Instead of generating the entire JSON schema, we will generate the denodoc cache and save it as a ZSTD-compressed file to minimize its size.

Advantages:

Faster, because only the difference between the current cache and the "published lib" needs to be generated.
No additional infrastructure required.

Disadvantages:

Dev experience still not optimal.
Runtime generation still needed.
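The "only generate the difference" step can be sketched as follows, assuming both the published cache and the current project are indexed by module specifier and content hash (the `HashIndex` format and `modulesToGenerate` are hypothetical; the real cache layout is not described here). Only modules that are new or whose hash changed need a deno doc run; ZSTD compression of the resulting cache would be handled by an external library and is not shown.

```typescript
// Hypothetical index format: module specifier -> content hash of that module.
type HashIndex = Map<string, string>;

// Returns the specifiers that must go through deno doc: modules missing from the
// published cache, or whose content hash differs from the published entry.
function modulesToGenerate(published: HashIndex, current: HashIndex): string[] {
  const pending: string[] = [];
  for (const [specifier, hash] of current) {
    if (published.get(specifier) !== hash) pending.push(specifier);
  }
  return pending.sort();
}
```

When a site only adds a couple of blocks on top of a published library, this list stays short, which is where the speedup comes from; everything else is read from the shipped cache.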

Future Plans: One potential solution to the dynamic schema generation and Deco Hub challenges is to set up separate infrastructure dedicated to generating schemas. This infrastructure could handle a smaller portion of the overall traffic, perhaps around 10%. That would limit the performance impact on the main production system and let us focus on optimizing schema generation on the specialized infrastructure.

To implement this, we could deploy the schema generation service in a Kubernetes cluster with appropriate resource allocation and scaling. To keep results consistent and avoid redundant computation, session stickiness would direct related requests to the same "server" within the cluster, so its cache stays fresh and reusable.
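One way to combine the ~10% slice with sticky routing is sketched below, under stated assumptions: requests are keyed by something stable such as a site name (hypothetical), a cheap FNV-1a hash selects a deterministic slice of keys, and rendezvous (highest-random-weight) hashing picks the server. Rendezvous hashing is a swapped-in technique, not something the logbook specifies; Kubernetes also offers `sessionAffinity: ClientIP` on a Service, which sticks by client IP rather than by key. All function names are hypothetical.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: cheap and deterministic, not cryptographic.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// True if this key belongs to the dedicated slice (e.g. percent = 10 for ~10% of keys).
function inSchemaSlice(key: string, percent: number): boolean {
  return fnv1a32(`slice|${key}`) % 100 < percent;
}

// Rendezvous hashing: each key sticks to the server with the highest score for it,
// and removing a server only remaps the keys that were on that server.
function pickServer(key: string, servers: string[]): string {
  if (servers.length === 0) throw new Error("no schema servers available");
  let best = servers[0];
  let bestScore = -1;
  for (const server of servers) {
    const score = fnv1a32(`${server}|${key}`);
    if (score > bestScore) {
      best = server;
      bestScore = score;
    }
  }
  return best;
}

// Returns the schema server for this key, or null to keep it on the main path.
function routeSchemaRequest(key: string, servers: string[], percent = 10): string | null {
  return inSchemaSlice(key, percent) ? pickServer(key, servers) : null;
}
```

Because a given key always lands on the same pod, that pod's denodoc cache stays warm for it, and adding a pod only moves the keys the new pod wins.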

This dedicated schema generation infrastructure would provide a controlled environment in which we can experiment with different caching mechanisms, optimizations, and multi-threading techniques without impacting the primary production environment. We can then keep fine-tuning the schema generation process to improve efficiency, reduce latency, and improve the overall development experience.

mcandeia commented 1 year ago

Go implementation: https://github.com/mcandeia/denodoc/tree/main
Deno implementation: https://github.com/deco-cx/denodoc

mcandeia commented 1 year ago

The story never ends: https://github.com/deco-cx/deco/pull/377

deco-cx / deco

DenoDoc performance - Logbook #357