dapr / components-contrib

Community driven, reusable components for distributed apps
Apache License 2.0
542 stars 469 forks source link

Support YugabyteDB as a State Store #2529

Open ashetkar opened 1 year ago

ashetkar commented 1 year ago

Describe the feature

Support YugabyteDB as a State Store.

About YugabyteDB: YugabyteDB is a transactional distributed SQL database for cloud-native applications. It is wire-compatible with Postgresql. More details here.

Release Note

RELEASE NOTE: ADD support for YugabyteDB as a state store in Dapr.

berndverst commented 1 year ago

If it is wire compatible with Postgresql, can you not simply use the Postgres state store and point it at YugabyteDB? That would be preferred @ashetkar

berndverst commented 1 year ago

According to the docs the GO library lib/pq works to connect. We use jackc/pgx which supports the same required authentication method.

For this reason I do not think a dedicated Dapr component is required to connect to YugabyteDB and we also do not need to update our documentation.

Please just use the Postgres state store.

ashetkar commented 1 year ago

@berndverst Thank you for the response.

The Postgresql component of Dapr makes use of xmin column for etags and YugabyteDB does not support xmin, hence it cannot use the Postgresql component to connect. And there may be other incompatibilities existing. Also, we may be able to add YugabyteDB-specific optimisations like using our fork of jackc/pgx driver, etc.

Even though this issue was filed to request the support, it was to be followed up with a PR from us that we have already been working on. And we would like to work with Dapr team to complete this support.

Given this, please consider retaining this GH Issue. Do let me know if you need more details, thanks.

artursouza commented 1 year ago

I am reopening the issue for further discussion.

How do you solve the etag in the new component? What are the other incompatibilities? I am wondering if there is a way to make the existing Postgres component configurable and work for YugabyteDB - I think the community can benefit from that - we could even add YugabyteDB to our conformance tests just like we use multiple brokers for MQTT conformance tests.

Now, if you have a fork of the jackc/pgx driver that you would like to use, have you considered making this component as a pluggable component instead? We do have a go-sdk that make this simple if the component is written with the same interface as components-contrib: http://github.com/dapr-sandbox/components-go-sdk

berndverst commented 1 year ago

@ashetkar I would prefer if we add whatever capabilities are needed to the existing Postgres state store.

@ItalyPaleAle created the TTL implementation. We can certainly change it, or conditionally enable / disable it to create compatibility.

Duplicating the Postgres libraries in our code means more memory utilization for something which isn't strictly necessary. What optimizations are there which Dapr would realistically leverage?

If you must create a new component I strongly suggest using the version of pgx we are already importing.

berndverst commented 1 year ago

I agree with @artursouza if you must use the fork of your driver then this should be a pluggable component (a binary external to the sidecar which communicates with the sidecar).

Importing a fork of pgx specifically for Yugabyte is a dealbreaker for inclusion in the binary directly. But a pluggable component would solve this.

I still lean towards just making the existing Postgres component more flexible and not introducing a new component here. After all the goal is to be able to connect to Yugabyte. The component name does not matter.

ItalyPaleAle commented 1 year ago

xmin is a built-in column in Postgres that indicates the version, and it's used for etags in Dapr: https://www.postgresql.org/docs/current/ddl-system-columns.html

I'm happy to consider a switch that uses a different column for etags for YugabyteDB. You may need to manually generate a new etag (e.g. a UUID) before inserting or updating a column, but it would work (please make sure that Postgres can continue using the buillt-in xmin).

Agree with @berndverst that importing a fork of pgx is not an option I'm afraid. pgx is very actively-maintained, and we don't want to depend on a fork which will lag behind.

ashetkar commented 1 year ago

Thank you @artursouza for reopening the ticket for discussion and everyone for insightful comments.

Let me take some time to understand the suggestions (like pluggable component among others) and get back to you with answers/ further comments.

berndverst commented 1 year ago

@ashetkar maybe you can contribute the performance improvements you need directly to pgx and then we can just use the latest pgx version. That benefits everyone!

ItalyPaleAle commented 1 year ago

@ashetkar can you please try with the CockroachDB component and see if that works with YugabyteDB too? I have a feeling it might (and we're thinking of merging that into the Postgres component)

ItalyPaleAle commented 1 year ago

@ashetkar were you able to check if the CockroachDB component works with Yugabyte? If so, we can just update the docs to indicate that :)

ashetkar commented 1 year ago

Thank you everyone for the suggestions and comments.

Using the CockroachDB component to connect to YugabyteDB works fine as far as the xmin issue is concerned. But having a separate component for YugabyteDB will help for below reasons:

And we'd like to maintain the YugabyteDB component for Dapr as we do the fork of pgx driver.

ItalyPaleAle commented 1 year ago

@ashetkar thanks for sharing the links.

I understand that YugabyteDB and CockroachDB have some differences, but the links there seem to show that they are more about internals and less about things that would require code changes in Dapr.

As for the pgx driver, what are the code diffs? Would we potentially be able to use your forked pgx driver for Postgres and CockroachDB too? Reason I'm asking is because pgx is a fairly large library, and adding a second "version" of that will increase the size and memory requirements of Dapr probably a lot. This has bitten us badly when we were releasing Dapr 1.10, see: https://github.com/dapr/dapr/issues/6066

ashetkar commented 1 year ago

@ItalyPaleAle I don't have enough understanding of Dapr to comment on which features of DB it uses and how. For example, the assumptions it has about transactions from the underlying database may differ from one DB to another.

Agree that another pgx fork may add to Dapr's size. It was brought up as an example of what customisations the YugabyteDB component can have.

ItalyPaleAle commented 1 year ago

@ashetkar we really don't do much with databases besides basic CRUD and some transactions. Case in point: we have a SQLite component that is pretty much identical to Postgres besides the library used.

How we guarantee that the components work across multiple variants is through our extensive tests, such as certification and conformance. If you want to enable those for YugabyteDB, we should be able to tell if something is missing (I doubt it).

ashetkar commented 1 year ago

@ItalyPaleAle I was able to run conformance tests and those ran fine. How do we enable/run the other extensive tests for YugabyteDB?

Would we potentially be able to use your forked pgx driver for Postgres and CockroachDB too?

The changes are specific to YugabyteDB but are incorporated based on some flags, so it could be possible.

ItalyPaleAle commented 1 year ago

Thanks for testing this @ashetkar !

Let me sync offline with @berndverst and the other maintainers on how to best expose this to end users, if there's something we can do to still have the name "YugabyteDB" show up in Dapr (even without adding code), as I'm sure that's important for you :) I'll have an answer for you tomorrow!

As for conformance/certification test, how that is actually implements depends a bit on how the component is exposed. I'll help you when I have clarity on this.

rahulpoddar-fyndna commented 1 year ago

Hello @ItalyPaleAle , Can you please help as what is the next step from our end.

ItalyPaleAle commented 1 year ago

@ashetkar @rahulpoddar-fyndna sorry for the delay, following up on this now.

To recap the current state of affairs:

ashetkar commented 1 year ago

However you are saying that it would be best to use a fork of pgx for YugabyteDB

@ItalyPaleAle My apology if I was not clear earlier. I had mentioned the use of our fork of pgx only as an example of how YugabyteDB as a Dapr state store could be further differentiated. It is no way a pre-requisite for the state store implementation as of today.

You mentioned that you were upstreaming your changes to the official pgx. What's the status of that?

Yes, I had stated that it could be possible to upstream the YugabyteDB-specific changes to official pgx. But we have no plans for that as of today since it involves collaborating with pgx maintainers and also, not all changes - including the ones which might come in future - may be incorporated in the upstream pgx.

berndverst commented 12 months ago

@ashetkar @rahulpoddar-fyndna can you please test whether the Dapr CockroachDB component we have today works with YugaByteDB as is? Don't worry about the name for now - we can add the YugaByteDB name in such a way that both names will load the same underlying component implementation.

What we want to know from you (and are asking you to verify by trying this out) whether the Dapr CockroachDB component works with YugaByteDB out of the box.

ItalyPaleAle commented 9 months ago

We will have Yugabyte as alias of PostgreSQL in 1.13. Will be possible with the v2 of the component: #3250

However, would appreciate if you could help us set up a conformance test for this!

ashetkar commented 9 months ago

Thanks @ItalyPaleAle!

However, would appreciate if you could help us set up a conformance test for this!

Happy to help, please let us know what is needed from us

ItalyPaleAle commented 9 months ago

@ashetkar is there a Docker compose file we can use to run Yugabyte and a sample Component YAML? See examples for CockroachDB: