cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com
Other
29.51k stars 3.7k forks source link

sql: support postgresql foreign data wrapper #20249

Open jensneuse opened 6 years ago

jensneuse commented 6 years ago

We'd like to build a large scale CMS with postgresql as source of truth + cockroachdb as denormalized data store for public access.

To keep both systems in sync via transactions I'd like to use a foreign data wrapper.

As per the documentation of postgresql foreign data wrapper module postgresql executes a few commands on client connection startup such as 'set timezone' etc.. (see documentation: https://www.postgresql.org/docs/9.4/static/postgres-fdw.html => F.32.5.)

These sql commands are currently not supported by cockroachdb. A possible workaround could be to use pg_notify to inform an external application to sync changes. However we'd lose transactions this way which is a blocker.

Jira issue: CRDB-5937

gz#20998

dianasaur323 commented 6 years ago

@jensneuse Thanks for the feature request. We are currently planning for the next release, so I'll update this issue once we decide how to prioritize this.

That being said, is there a reason why you don't want to just use CockroachDB for both of these use cases? Secondly, what is your opinion on supporting some form of streaming replication set up?

jensneuse commented 6 years ago

@dianasaur323 Thanks for getting back. I'm looking forward to hearing from you about your release planning.

In regard to your questions, yes there are reasons why we don't want to use crdb as source of truth. Our CMS requires some complex querying (e.g. full text search) on a few million rows. We try to not rely on non transactional db systems like Elasticsearch. I saw that someone mentioned fulltext search support could be implemented using bleve but I guess this features is not prioritised currently.

Secondly, I don't think streaming replication would help in this particular scenario. We're going to heavily denormalise data before storing in crdb. Crdb seems perfect for this use case because we can update "denormalised published content" across many rows in transactions. That being said, I'd be super interested in some form of "change replication to message queue" in a distributed fashion. So to say, as crdb is masterless, every node could stream its changes to a message queue like NATS to achieve near realtime updates without relying on pg_notify e.g. .

dianasaur323 commented 6 years ago

@jensneuse Got it, thanks for your feedback on foreign data wrappers. I haven't spent too much time looking into it yet, although I've been told by others that this could also help with users trying to migrate from other databases to CockroachDB as well. I'll look into it further. For my knowledge, how large do you anticipate your transactions to be when updating CRDB with denormalized data (how many rows do you expect to insert / update)?

jordanlewis commented 6 years ago

@jensneuse we already support timezone and datestyle - we're missing only intervalstyle. It should be pretty straightforward to add this, but I wouldn't be surprised if there's more work to do after that is added.

jensneuse commented 6 years ago

@jordanlewis You are right but this doesn't resolve the problem. The problem is the syntax. Crdb expects 'time zone' whereas postgres fdw tries to connect using 'timezone'. Maybe you're right and the complex part begins when these sql statements go through.

[2017-11-26 16:04:27] [XX000] ERROR: unknown variable: "timezone" [2017-11-26 16:04:27] Where: Remote SQL command: SET timezone = 'UTC'

@dianasaur323 I expect to batch update a few hundred rows.

jordanlewis commented 6 years ago

We'll probably have to write our own foreign data wrapper. The default Postgres one uses PL/pgSQL cursors (DECLARE CURSOR FOR) which we won't support anytime soon.

jordanlewis commented 6 years ago

Although @jensneuse if you don't need SELECT on the remote server it should work okay.

dianasaur323 commented 6 years ago

@jensneuse Gotcha, that size of update shouldn't be a problem. Thanks!

lgo commented 6 years ago

Any opposition to adding the capability to do SET timezone = ...?

This comes from yet another slight incompatibility I've found while testing ORMs. While it's fine to fix the ORM adapters, this will come up elsewhere where have yet to test or support so I think it would be better to add it.

lgo commented 6 years ago

Ah, disregard that. It looks like the current alpha already supports SET timezone = <...>. In 1.1.x only SET "time zone" = <...> or SET TIME ZONE <...> are supported.

knz commented 5 years ago

I think that FDWs are not the proper approach for this. Instead we probably want to look at #26780.

xvaara commented 3 years ago

Is there any plans for this? I think this requires declare cursor to work. I'd also be insterested using postgres_fdw to talk to cockroachdb.

hermanbanken commented 1 year ago

Are transactions truly a blocker? When editing the data some more latency might be acceptable, while with Change Data Capture feeds you get quite reliable global replication and including the RESOLVED feature, you can be sure the data is up to date up to a certain timestamp.

Google Zanzibar also uses something similar, in which each document stores a timestamp of last edit of the permission model, which allows for local consistency between the document and the rules that hold for that document, without requiring headaches for global replication. Wouldn't this work for a CMS too?

github-actions[bot] commented 3 months ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!