GreptimeTeam / greptimedb

An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0
4.29k stars 308 forks source link

Implement an RDS kvbackend #4404

Open WenyXu opened 3 months ago

WenyXu commented 3 months ago

What problem does the new feature solve?

Currently, all metadata is stored in etcd. If we implement an RDS key-value backend, we can allow users to deploy a GreptimeDB cluster without relying on etcd.

What does the feature do?

Implement an kvbackend using RDS(e.g., MySQL, Postgres) .

Implementation challenges

It's a good idea to start by implementing an RDS-based KvBackend:

https://github.com/GreptimeTeam/greptimedb/blob/2ae2a6674e8a780cb93723de9b749b65ea28562e/src/common/meta/src/kv_backend.rs#L40-L155

killme2008 commented 3 months ago

@WenyXu Needs more detailed info for developers. For example, the key component or code segment etc.

lyang24 commented 3 months ago

This issue is very interesting may i work on this one? may i use this client https://crates.io/crates/aws-sdk-rds

WenyXu commented 3 months ago

This issue is very interesting may i work on this one?

Of course, would you like simply describe your idea or plan?

may i use this client https://crates.io/crates/aws-sdk-rds

I think it's better not rely on a particular RDS client.

lyang24 commented 3 months ago

This issue is very interesting may i work on this one?

Of course, would you like simply describe your idea or plan?

My idea is very naive, lets use postgres impl as an example: I want to take connection parameters from the MetasrvOptions and implement the kvBackEnd interface by translating the calls into postgres CRUD sql queries backed by a postgres table.

WenyXu commented 3 months ago

My idea is very naive, lets use postgres impl as an example: I want to take connection parameters from the MetasrvOptions and implement the kvBackEnd interface by translating the calls into postgres CRUD sql queries backed by a postgres table.

Cool. BTW, this feature could be an epic; you may need to create multiple PRs to implement it and a tracking issue to track it. 👀

lyang24 commented 3 months ago

I realized we have to implement DistLock trait as well, I wonder how often are lock and unlock getting called i have two options for implement locking on RDS which one do you prefer?

normalized

key string, value bytes, lock owner string, lock expire timestamp

denormalized kv table

key string, value bytes

lock table

key string, lock owner string, lock expire timestamp

fengjiachun commented 3 months ago

Currently, DistLock is not being used in practice. I think we can hold off on implementing it for now. What do you think? @WenyXu

WenyXu commented 3 months ago

Currently, DistLock is not being used in practice. I think we can hold off on implementing it for now. What do you think? @WenyXu

Yes, maybe we can implement an RDS kvbackend first.

lyang24 commented 3 months ago

Currently, DistLock is not being used in practice. I think we can hold off on implementing it for now. What do you think? @WenyXu

Yes, maybe we can implement an RDS kvbackend first.

Thank you for confirming. I wrote some early skeleton code. If this is the approach you wanted to see I can continue if not let me know i can adjust. https://github.com/GreptimeTeam/greptimedb/pull/4421

WenyXu commented 3 months ago

Thank you for confirming. I wrote some early skeleton code. If this is the approach you wanted to see I can continue if not let me know i can adjust.

LGTM, let's keep going on 🚀

lyang24 commented 3 months ago

Hey @WenyXu i have a postgres impl ready. Do you know what would be a good place to add integration tests? Also I wonder do we have benchmarks for the metasrv? I would like to modify that to include postgres as well. https://github.com/GreptimeTeam/greptimedb/pull/4421

WenyXu commented 3 months ago

Hey @WenyXu i have a postgres impl ready. Do you know what would be a good place to add integration tests?

We can set up a Postgres in GitHub Actions. BTW, For kv backend's integration tests, we have some test suites to ensure that all implementation has the same behaviors. https://github.com/GreptimeTeam/greptimedb/blob/b298b35b3bd7d06379030bdb773dcb3f9a26d7f6/src/common/meta/src/kv_backend/etcd.rs#L568-L575

Also I wonder do we have benchmarks for the metasrv? I would like to modify that to include postgres as well. #4421

We don't have the benchmarks for the metasrv for now.

lyang24 commented 3 months ago

Hey the postgres kvBackEnd is very close to complete https://github.com/GreptimeTeam/greptimedb/pull/4421 i just have two small questions:

  1. the \0 in the kv encodings in test for an example we concat '\0' in here when prefix is not null this will cause the rust postgres driver to error out see. Is is ok for me to process key value with removing the 0 bytes in vec ? or this is uncommon and i can refactor this test?

  2. for range delete if key is b"\0" and range_end is also b"\0" does that means delete everything from the table?

WenyXu commented 3 months ago

Is is ok for me to process key value with removing the 0 bytes in vec?

Yes, I agree with you, For now, we can carefully remove the tailing \0. However, in the future, we'd be better off refactoring the RangeRequest to something like

struct RangeRequest {
    key: Option<Vec<u8>>,
    range_end: Option<Vec<u8>>,
    ...
}

or

enum RangeRequest{ 
    KeyRange(...)
    SIngleKey(...)
}

cc @fengjiachun

  1. for range delete if key is b"\0" and range_end is also b"\0" does that means delete everything from the table?

Yes, we are following the etcd's behaviors https://etcd.io/docs/v3.3/learning/api/#key-value-pair

lyang24 commented 2 months ago

Is is ok for me to process key value with removing the 0 bytes in vec?

Yes, I agree with you, For now, we can carefully remove the tailing \0. However, in the future, we'd be better off refactoring the RangeRequest to something like

struct RangeRequest {
    key: Option<Vec<u8>>,
    range_end: Option<Vec<u8>>,
    ...
}

or

enum RangeRequest{ 
    KeyRange(...)
    SIngleKey(...)
}

cc @fengjiachun

  1. for range delete if key is b"\0" and range_end is also b"\0" does that means delete everything from the table?

Yes, we are following the etcd's behaviors https://etcd.io/docs/v3.3/learning/api/#key-value-pair

Thank you, I fixed the second case on range delete. I do agree with refactoring RangeRequest. However I think the real problem i am facing is this test condition when the test is called with a actual prefix like here this will produce b"range/20" when i inject that into sql query it will look like where k >= 'range/20' in string format but the sql byte array will contain a 0 which will throw off the postgres driver. Refactoring is great but probably wont help this case, i implement trimming of nulls at the end to bypass this case. All tests passed locally pr is ready for feedback :)