golemcloud / golem

Golem is an open source durable computing platform that makes it easy to build and deploy highly reliable distributed systems.
https://learn.golem.cloud/
Apache License 2.0
530 stars 59 forks source link

Implement wasm-rdbms interface for standardized database access #1016

Open jdegoes opened 1 month ago

jdegoes commented 1 month ago

WASM RDBMS

WASM components do not have access to databases natively. Rather, they must obtain all functionality from their host, through the component model types.

Although there exists a wasi-sql project, which intends to provide a WASI standard for database access, it is still in its infancy and shows no signs of active development.

Rather than await the completion of wasi-sql, WASM startup Fermyon introduced a series of WIT interfaces postgres, sqlite, and mysql in their WASM-native framework Spin, which use a related series of database types. These interfaces are supported by the Spin host environment.

It is not clear if Fermyon’s interfaces will become standard or not, or if work will be resumed on the earlier (but far less complete) wasi-sql. Rather than wait for the lengthy process of standardization to produce something useful, Golem needs to support some level of database access now, so that developers using databases have options.

This specification calls for the development of a new wasm-rdbms interface, which is designed to support features common to mainstream databases, as well as a corresponding set of implementations in Golem, powered by a low-level Rust library.

Overview

The basic approach of this language-agnostic interface to databases is to have a generalized core upon which all specific database interfaces can depend.

This allows new databases to be introduced cheaply, while still respecting the fact that developers must necessarily program against specific databases, due to differences in column types, SQL syntax, and other features.

Common Elements

Common elements belong to the same generic package and have generic interface names.

They include the following, as well as related supporting types:

Unique Elements

Unique elements belong to database-specific packages, with database-specific interface names. They represent aspects of a database interface that benefit from tying to one specific database.

WIT Sketch

Common Elements

Something like the following should work for most databases (other than SQLite):

type node-index = u32;
type table-name = string;

/// Node in the db-data-type structure
variant db-column-type-primitive{
    integer(option<u8>),
    decimal(u8, u8),
    float,
    boolean,
    datetime,
    interval,
    string(option<u32>),
    text,
    binary(option<u32>),
    blob,
    enum(list<string>),
    json,
    xml,
    uuid,
    spatial,
}

variant db-column-type {
    primitive(db-column-type-primitive),
    array(list<option<u32>>, db-column-type),
}

flags db-column-type-flags {
    // Key constraints
    primary-key,     // Single or part of composite primary key
    foreign-key,     // References another table's key
    unique,          // Unique constraint

    // Core properties
    nullable,        // If not present, column is NOT NULL
    generated,       // Value is computed rather than stored directly
    auto-increment,  // MySQL AUTO_INCREMENT/PostgreSQL SERIAL/SQL Server IDENTITY

    // Common behaviors
    default-value,   // Has a default value defined
    indexed,         // Participates in an index
}

/// Data types for database columns
record db-column-type-meta {
    db-type:     db-column-type,
    name:        string,
    flags:       db-column-type-flags,
    foreign-key: option<table-name>,
}

/// Value descriptor for a single database value
variant db-value-primitive {
    integer(s64),
    decimal(string),
    float(float64),
    boolean(bool),
    datetime(u64),
    interval(u64),
    string(string),
    text(string),
    binary(list<u8>),
    blob(list<u8>),
    enum(u32),
    json(string),
    xml(string),
    uuid(u128),
    spatial(list<float64>),
    other(string, list<u8>),
    db-null,
}

variant db-value {
    primitive(db-value-primitive),
    array(list<db-value-primitive>), // Flatteneed
}

/// A single row of values
record db-row {
    values: list<db-value>,
}

/// A potentially very large and lazy stream of rows:
resource db-result-set {
    get-column-metadata: func() -> list<db-column-type-meta>;
    get-next: func() -> option<list<db-row>>;
}

Unique Elements

Something like the following should work for databases that are not SQLite:

interface postgres {
  resource db-connection {
    open: static func(address: string) -> result<db-connection, error>;

    query: func(statement: string, params: list<db-value>) -> result<db-result-set, error>;

    execute: func(statement: string, params: list<db-value>) -> result<u64, error>;
  }
}

interface mysql {
  resource db-connection {
    open: static func(address: string) -> result<db-connection, error>;

    query: func(statement: string, params: list<db-value>) -> result<db-result-set, error>;

    execute: func(statement: string, params: list<db-value>) -> result<u64, error>;
  }
}

SQLite will require its own interface, perhaps inspired by the one in Spin.

Host Implementation

The host implementation will live in the worker-executor service and will use best-in-class async Rust libraries to provide implementations of the specified interfaces.

One slight complication is that a connection in the guest must not map to a connection in the host. The reason for this is that on a single worker-executor node, there may exist thousands of workers, and these workers should not maintain their own independent connections to the database.

Therefore, in order to improve performance, it is necessary to perform connection pooling in the host, so that if only 20 concurrent connections are needed across all workers, then only 20 concurrent connections are created and maintained.

Note that if the wasm-rdbms interface improves in power to allow interleaving database updates with side-effects as part of a single database transaction, then pooling becomes much more difficult. However, with the current interface, the only two things a worker can do with a connection are query and execute, which atomically return results; and if those results are small enough to fit in memory, then there is no need to keep an underlying connection dedicated to the worker (instead, it can be reused for another worker).

jdegoes commented 2 weeks ago

/bounty $5000

algora-pbc[bot] commented 2 weeks ago

💎 $5,000 bounty • Golem Cloud

Steps to solve:

  1. Start working: Comment /attempt #1016 with your implementation plan
  2. Submit work: Create a pull request including /claim #1016 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to golemcloud/golem!

Add a bountyShare on socials

Attempt Started (GMT+0) Solution
🟢 @justcoon Nov 6, 2024, 3:01:20 PM WIP
justcoon commented 2 weeks ago

/attempt #1016

palash25 commented 2 weeks ago

The host implementation will live in the worker-executor service

I am assuming you mean golem-worker-executor-base directory because golem-worker-executor simply starts up the server with all the services defined in the former directory

and will the .wit files reside in this repo https://github.com/golemcloud/golem-wit/blob/main/wit ?

jdegoes commented 2 weeks ago

@palash25 Please don't work on this one, it's already taken. I will have new bounties up soon.

debaa98 commented 1 week ago

hey @justcoon are you working on it ?

justcoon commented 1 week ago

hi @debaa98 , yes I am working on it

volfcan commented 1 week ago

@palash25 Please don't work on this one, it's already taken. I will have new bounties up soon.

only one person can attempt it at the same time?

jdegoes commented 1 week ago

@volfcan Yes, please! But be on the look out for many more bounties in the next 2 months.