GlareDB / glaredb

GlareDB: An analytics DBMS for distributed data
https://glaredb.com
GNU Affero General Public License v3.0

Add support for additional object storages like R2 or minio #1818

Open legout opened 1 year ago

legout commented 1 year ago

Description

There are many (cheaper) alternatives to AWS S3, such as Cloudflare R2, Backblaze, or self-hosted MinIO. Support for additional object stores would be great. For example, DuckDB and Python's fsspec both provide a parameter for setting the endpoint_url of the object store.
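
To illustrate what an endpoint override looks like in practice, here is a minimal sketch that uses the AWS CLI's `--endpoint-url` option rather than DuckDB or fsspec. It is not a GlareDB feature. The helper names (`r2_endpoint`, `minio_endpoint`, `list_bucket`) are hypothetical, and the MinIO helper assumes the server runs on its default port 9000 over plain HTTP.

```shell
#!/usr/bin/env bash
# Sketch only: these function names are hypothetical helpers for illustration.

# Cloudflare R2 exposes its S3-compatible API at a per-account hostname.
r2_endpoint() {
  local account_id="$1"
  printf 'https://%s.r2.cloudflarestorage.com\n' "$account_id"
}

# Assumption: a self-hosted MinIO server on its default port (9000), no TLS.
minio_endpoint() {
  local host="${1:-localhost}"
  printf 'http://%s:9000\n' "$host"
}

# List a bucket on any S3-compatible store by overriding the endpoint.
# Requires the AWS CLI and credentials in the environment; it is only
# defined here, not invoked.
list_bucket() {
  local endpoint="$1" bucket="$2"
  aws --endpoint-url "$endpoint" s3 ls "s3://${bucket}"
}
```

With credentials configured, something like `list_bucket "$(minio_endpoint)" my-bucket` would list a local MinIO bucket; `my-bucket` is a placeholder name.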

djouallah commented 1 year ago

Yes please, especially R2!

eitsupi commented 1 year ago

Hi, I am trying to run a GlareDB server on CI. It would be great if we could also connect to a GCP emulator like https://github.com/fsouza/fake-gcs-server to make integration testing easier.

YuriyGavrilov commented 1 year ago

+1. This could be useful with Storj, a cheap decentralized storage service.

gruuya commented 1 year ago

> It would be great if we could also connect to a GCP emulator like https://github.com/fsouza/fake-gcs-server to make integration testing easier.

@eitsupi this is actually somewhat new and not yet documented adequately, but you should be able to do it if you build from source.

Here's how:

  1. Spin up the fake GCS server; you can use a working example from here. This should produce a fake credentials file, /tmp/fake-gcs-creds.json, and a bucket (glaredb-test-bucket).
  2. Spin up the server with an RPC bind, and configure it to use the fake GCS server via the location (-l) and option (-o) arguments:
    cargo run --bin glaredb -- server --disable-rpc-auth --rpc-bind 0.0.0.0:6555 -l gs://glaredb-test-bucket -o service_account_path=/tmp/fake-gcs-creds.json
  3. To test it, run a local glaredb instance and connect to the server:
    $ cargo run --bin glaredb -- --ignore-rpc-auth
    Finished dev [unoptimized + debuginfo] target(s) in 0.69s
     Running `target/debug/glaredb --ignore-rpc-auth`
    GlareDB (v0.5.1)
    Using in-memory catalog
    Type \help for help.
    > \open http://localhost:6555
    Connected to remote GlareDB server: http://localhost:6555/
    > create table test as values (1, 'one'), (2, 'two');
    Table created
    > select * from test;
    ┌─────────┬─────────┐
    │ column1 │ column2 │
    │      ── │ ──      │
    │   Int64 │ Utf8    │
    ╞═════════╪═════════╡
    │       1 │ one     │
    │       2 │ two     │
    └─────────┴─────────┘
  4. Verify the contents of the bucket:
    $ curl -s --insecure http://0.0.0.0:4443/storage/v1/b/glaredb-test-bucket/o | jq .
    {
      "kind": "storage#objects",
      "items": [
        {
          "kind": "storage#object",
          "name": "databases/00000000-0000-0000-0000-000000000000/tables/20000/_delta_log/00000000000000000000.json",
          "id": "glaredb-test-bucket/databases/00000000-0000-0000-0000-000000000000/tables/20000/_delta_log/00000000000000000000.json",
          "bucket": "glaredb-test-bucket",
          "size": "1206",
          "crc32c": "3VCvgQ==",
          "md5Hash": "5IB4pt04RVTwx/jdycJc7A==",
          "etag": "\"5IB4pt04RVTwx/jdycJc7A==\"",
          "timeCreated": "2023-10-18T07:48:23.437901Z",
          "updated": "2023-10-18T07:48:23.437943Z",
          "generation": "1697615303438180"
        },
        ...
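
For CI use, the check in step 4 could be scripted. The following is a minimal sketch, assuming the fake GCS server from step 1 listens on port 4443 and that a successful write leaves at least one `_delta_log` commit file in the bucket, as in the listing above. The helper names (`object_names`, `has_delta_log`, `check_bucket`) are hypothetical, and the name extraction is a grep/sed approximation rather than a real JSON parser.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of GlareDB.

# Assumed values, taken from the steps above.
BUCKET="glaredb-test-bucket"
FAKE_GCS_URL="http://0.0.0.0:4443"

# Print the "name" field of every object in a listing read from stdin.
# Uses grep/sed instead of jq so it also runs where jq is not installed.
object_names() {
  grep -o '"name": *"[^"]*"' | sed 's/^"name": *"//; s/"$//'
}

# Succeed if at least one Delta log commit file appears in the listing on stdin.
has_delta_log() {
  object_names | grep -q '/_delta_log/[0-9]*\.json$'
}

# Fetch the listing from the fake GCS server and run the check.
# Only defined here, not invoked: it needs the server from step 1 running.
check_bucket() {
  curl -s "${FAKE_GCS_URL}/storage/v1/b/${BUCKET}/o" | has_delta_log
}
```

In a CI job, `check_bucket || exit 1` after the `create table` statement would fail the build if nothing was written to the bucket.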

tychoish commented 7 months ago

@scsmithr just wanted to see where we stand on this one. As I recall, only documentation issues remain.