MaterializeInc / materialize

The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.
https://materialize.com

integrations: support Hasura #4150

Open JLDLaughlin opened 4 years ago

JLDLaughlin commented 4 years ago

Background

Hasura is a no-code tool that automatically generates secure GraphQL APIs for user data. To generate these APIs from a Postgres database (or a Postgres-compatible database, like Materialize), Hasura requires a set of Postgres features and functions to be implemented.

This is intended to be a tracking issue for all things related to Hasura support, and will therefore change over time.

Blockers

To investigate the features and functions we are currently lacking, I did the following:

The following items missing from Materialize are required just to start Hasura; I didn't dig into anything additional in the UI. An example of each missing feature can be found in the above gist (most are on line 14).

Data types

| Status | Title | Details |
| --- | --- | --- |
| | Support `BIGSERIAL` data type | Example: `( id BIGSERIAL PRIMARY KEY, ...` |
| | Support Postgres OIDs | Example: `JOIN pg_catalog.pg_namespace cftn ON cft.relnamespace = cftn.oid` |

Postgres catalog and schema support

| Status | Title | Details |
| --- | --- | --- |
| | Support `information_schema.schemata` | Example: `SELECT 1 FROM information_schema.schemata WHERE schema_name = $1` |
| | Support `information_schema.key_column_usage` | Example: `JOIN information_schema.key_column_usage AS kcu` |
| | Support `pg_catalog.pg_available_extensions` | Example: `SELECT 1 FROM pg_catalog.pg_available_extensions WHERE name = $1` |
| | Support `pg_catalog.pg_constraint` | Example: `JOIN pg_catalog.pg_namespace cftn ON cft.relnamespace = cftn.oid WHERE r.contype = 'f'` |
| | Support `pg_catalog.pg_class` | Example: `JOIN pg_catalog.pg_class ct ON r.conrelid = ct.oid` |
| | Support `pg_catalog.pg_namespace` | Example: `JOIN pg_catalog.pg_namespace ctn ON ct.relnamespace = ctn.oid` |
| | Support `pg_catalog.pg_attribute` | Example: `JOIN pg_catalog.pg_attribute ac ON q.column_id = ac.attnum AND q.table_id = ac.attrelid` |
| | Support `pg_catalog.pg_relation_is_updatable` | Example: `pg_catalog.pg_relation_is_updatable("table".oid, true)` |
| | Support Postgres table schema | Example: `GROUP BY q.table_schema, q.table_name, q.constraint_name` |
| | Support Postgres table constraints schema | Example: `SELECT tc.table_name, tc.constraint_schema AS table_schema, tc.constraint_name as constraint_name` |
| | Support Postgres `pg_depend` | Example: `AND (d.refobjid = r.oid) AND (d.refobjsubid = a.attnum)` |
| | Support Postgres `pg_type` | Example: `SELECT pt.typname AS "name",` |
| | Support Postgres `pg_proc` | Example: `FROM unnest( COALESCE(p.proallargtypes, (p.proargtypes) :: oid []) )` |
| | Support Postgres `pg_aggregate` | Example: `FROM unnest( COALESCE(p.proallargtypes, (p.proargtypes) :: oid []) )` |
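
To make the shape of these requirements concrete, the catalog fragments above can be assembled into a representative query. This is illustrative only, not Hasura's exact SQL: the aliases (`r`, `ct`, `ctn`) come from the fragments in the table, and the column list is a plausible reconstruction of a foreign-key introspection query of the kind Hasura issues on startup.

```sql
-- Illustrative: a foreign-key introspection query built from the
-- pg_catalog pieces listed above. All of these catalog objects and
-- functions would need to exist in Materialize for it to run.
SELECT ctn.nspname AS table_schema,
       ct.relname  AS table_name,
       r.conname   AS constraint_name,
       pg_catalog.pg_get_constraintdef(r.oid, true) AS definition
FROM pg_catalog.pg_constraint r
JOIN pg_catalog.pg_class ct      ON r.conrelid = ct.oid
JOIN pg_catalog.pg_namespace ctn ON ct.relnamespace = ctn.oid
WHERE r.contype = 'f';
```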

Postgres command support

| Status | Title | Details |
| --- | --- | --- |
| | Support stored procedures | Example: `EXECUTE PROCEDURE hdb_catalog.hdb_schema_update_event_notifier();` |
| | Support `pg_notify` | Example: `PERFORM pg_notify('hasura_schema_update', json_build_object( ...` |
| | Support Postgres table functions | Example: `... WITH ORDINALITY pat(oid, ordinality)` |
| | Support Postgres simple loops | Example: `LOOP EXECUTE format('ALTER VIEW %I.%I ALTER COLUMN ...` |
| | Support `SERIALIZABLE READ WRITE` isolation level | Example: `BEGIN ISOLATION LEVEL SERIALIZABLE READ WRITE` |
| | Support `CREATE TABLE` with primary keys | Required on start up and probably elsewhere in the GUI. |
| | Support `CREATE TABLE` with `CHECK` constraint | Example: `CONSTRAINT valid_status CHECK (status IN ('scheduled' ...` |
| | Support `CREATE TABLE` with `ON UPDATE CASCADE` | Example: `... REFERENCES hdb_catalog.hdb_table(table_schema, table_name) ON UPDATE CASCADE );` |
| | Support `CREATE UNIQUE INDEX` | Example: `CREATE UNIQUE INDEX hdb_schema_update_event_one_row ON ...` |
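
Several of these DDL blockers can be combined into a single sketch. The statement below is illustrative only: the table name, column names, and `CHECK` values are hypothetical, chosen to resemble the truncated `hdb_catalog` fragments above rather than reproduce Hasura's actual schema.

```sql
-- Illustrative: the kinds of DDL Hasura runs on start up, combining
-- several blockers above (BIGSERIAL primary key, CHECK constraint,
-- composite foreign key with ON UPDATE CASCADE, unique index).
CREATE TABLE hdb_catalog.example_events (
    id BIGSERIAL PRIMARY KEY,
    status TEXT,
    table_schema TEXT,
    table_name TEXT,
    CONSTRAINT valid_status CHECK (status IN ('scheduled', 'delivered')),
    FOREIGN KEY (table_schema, table_name)
        REFERENCES hdb_catalog.hdb_table (table_schema, table_name)
        ON UPDATE CASCADE
);

CREATE UNIQUE INDEX example_events_idx
    ON hdb_catalog.example_events (table_schema, table_name);
```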

Postgres function support

| Status | Title | Details |
| --- | --- | --- |
| | Support `pg_get_constraintdef` | A Postgres system table function. Example: `pg_catalog.pg_get_constraintdef(r.oid, true) as check FROM pg_catalog.pg_constraint` |
| | Support `pg_get_functiondef` | A Postgres system table function. Example: `pg_get_functiondef(p.oid) AS function_definition` |
| | Support `json_build_object` | A Postgres JSON function. Example: `SELECT COALESCE(json_agg( json_build_object('schema', q."schema", 'name', q."name", 'type', q."type" ) ), '[]')` |
| | Support Postgres' `now()` | Example: `occurred_at timestamptz NOT NULL DEFAULT NOW()`. This diverges from how we handle `now()` currently. |
| | Support custom SQL functions | Allows users to query "computed fields" (Hasura details, PostgreSQL details). Also required on start up. Example: `CREATE FUNCTION hdb_catalog.hdb_schema_update_event_notifier() ...` |

Miscellaneous Postgres support

| Status | Title | Details |
| --- | --- | --- |
| | Support `SET client_encoding` | Example: `SET client_encoding = 'UTF8';` |
| | Support `SET client_min_messages` | Example: `SET client_min_messages TO WARNING;` |
| | Support Postgres extension pgcrypto | Example: `CREATE TABLE hdb_catalog.event_invocation_logs ( id TEXT DEFAULT gen_random_uuid() PRIMARY KEY, ...` |
| | Support PL/pgSQL | Example: `$function$ LANGUAGE plpgsql; CREATE TRIGGER hdb_schema_update_event_notifier AFTER INSERT OR UPDATE ON ...` |

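
The stored-procedure, `pg_notify`, and PL/pgSQL fragments above all come from one standard Postgres pattern: a trigger function that notifies listeners on writes. The sketch below is a hypothetical reconstruction of that pattern (the function body and payload are assumptions; only the object names follow the fragments), to show what "support PL/pgSQL" actually entails here.

```sql
-- Illustrative reconstruction of the trigger + pg_notify pattern the
-- fragments above are drawn from. The body and payload are assumed.
CREATE FUNCTION hdb_catalog.hdb_schema_update_event_notifier() RETURNS trigger AS
$function$
BEGIN
    -- Broadcast the change to any session that ran LISTEN hasura_schema_update.
    PERFORM pg_notify('hasura_schema_update',
                      json_build_object('occurred_at', NEW.occurred_at)::text);
    RETURN NEW;
END;
$function$ LANGUAGE plpgsql;

CREATE TRIGGER hdb_schema_update_event_notifier
AFTER INSERT OR UPDATE ON hdb_catalog.hdb_schema_update_event
FOR EACH ROW EXECUTE PROCEDURE hdb_catalog.hdb_schema_update_event_notifier();
```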
JLDLaughlin commented 4 years ago

I'm not sure what the current timeline is for 0.6, but there's too much work here for it to be completed very soon. @cuongdo @awang

rrjanbiah commented 4 years ago

@JLDLaughlin

> I'm not sure what the current timeline is for 0.6, but there's too much work here for it to be completed very soon. @cuongdo @awang

I don't have the necessary skillset to comment on the complexity, but I believe it is worth the effort in business terms. Hasura is getting popular, but plain Postgres is not suitable for all cases (#3496).

awang commented 4 years ago

Thanks so much for this @JLDLaughlin!

I'm wondering if some of these features can be supported more quickly via a workaround -- for instance:

  1. Can `SET client_encoding` be a noop?
  2. Support of `BIGSERIAL`: can we support a subset of this -- i.e., would it be easier to create an autoincrementing integer type, but not validate that primary keys are unique?
  3. Stored procedures: rather than support arbitrary stored procedures, can we implement those procedures directly ourselves, and treat the `CREATE FUNCTION` as a switch to enable our hardcoded function?
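
On the `BIGSERIAL` point, it may help that PostgreSQL itself treats `BIGSERIAL` as shorthand rather than a real type, which suggests the minimal subset a workaround would need to cover (table and column names here are arbitrary):

```sql
-- In PostgreSQL, this:
CREATE TABLE t (id BIGSERIAL PRIMARY KEY);

-- desugars to roughly this:
CREATE SEQUENCE t_id_seq;
CREATE TABLE t (
    id bigint NOT NULL DEFAULT nextval('t_id_seq') PRIMARY KEY
);
ALTER SEQUENCE t_id_seq OWNED BY t.id;
```

So a subset would need a sequence-like counter and column defaults, but could defer the uniqueness enforcement of `PRIMARY KEY`.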

For other features (such as ON UPDATE CASCADE), I could also see it making sense to invest in a proper implementation, rather than a one-off workaround for Hasura.

To identify next steps, I suggest we take a pass at this list to identify level of effort to implement vs workaround (or perhaps, a level of "hackiness")?

benesch commented 3 years ago
> 1. Can `SET client_encoding` be a noop?

Yes, this is less than an hour of work provided the client is trying to set the encoding to UTF-8, as they are here. That's easy enough!

> 2. Support of `BIGSERIAL`: can we support a subset of this -- i.e., would it be easier to create an autoincrementing integer type, but not validate that primary keys are unique?
> 3. Stored procedures: rather than support arbitrary stored procedures, can we implement those procedures directly ourselves, and treat the `CREATE FUNCTION` as a switch to enable our hardcoded function?
>
> For other features (such as `ON UPDATE CASCADE`), I could also see it making sense to invest in a proper implementation, rather than a one-off workaround for Hasura.
>
> To identify next steps, I suggest we take a pass at this list to identify level of effort to implement vs workaround (or perhaps, a level of "hackiness")?

The more I look at Hasura, the more I'm worried about this approach. There are a few different parts to Hasura. There's the query layer, which maps a GraphQL query to a SQL query. That should be the most straightforward to support. Then there is... everything else, like mutations, subscriptions, and event triggers. Mutations we don't even support right now. Subscriptions and event triggers rely heavily on esoteric PostgreSQL features like triggers and PL/pgSQL, but triggers and PL/pgSQL are workarounds for the lack of an incremental materialized view engine. To get Hasura subscriptions/event triggers to work with Materialize, we almost certainly want to build a Materialize-specific Hasura adapter that can take advantage of Materialize sinks, rather than trying to support triggers and PL/pgSQL.
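
For contrast with the trigger-based approach, a Materialize-native adapter might look something like the following sketch. This is a hypothetical illustration (the view and source names are invented), assuming Materialize's `TAIL` command for streaming changes out of an incrementally maintained view:

```sql
-- Hypothetical: instead of triggers + pg_notify, a Materialize-specific
-- Hasura adapter could maintain an incrementally updated view...
CREATE MATERIALIZED VIEW user_unread AS
    SELECT user_id, count(*) AS unread
    FROM notifications
    WHERE read = false
    GROUP BY user_id;

-- ...and map each GraphQL subscription to a change stream over it,
-- receiving diffs as they happen rather than polling or re-running SQL.
TAIL user_unread;
```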

I think someone needs to sit down with Hasura, connect it to Materialize, and start poking at its code and understanding its architecture to truly scope this work—trying to scope these individual PostgreSQL features in a vacuum is unlikely to be productive. (Like, trying to support triggers and PL/pgSQL would be literal years of work.)

Anyway, I think the immediate next step is https://github.com/MaterializeInc/materialize/issues/3727. I don't think we should attempt to go any farther with Hasura support until we've made a bit more progress on #3727 and have the first few pg_catalog views in place. At that point we might be able to actually hook up a Hasura instance to Materialize, make some headway, and start parallelizing the work items.

rrjanbiah commented 3 years ago

@benesch FWIW, the Hasura team is quite active on GitHub and Twitter. You can easily coordinate with them.

CC @0x777 @coco98

benesch commented 3 years ago

So I looked into this more today. As @JLDLaughlin documented above, Hasura uses approximately every inch of PostgreSQL. I think the strategy of "pretend to be PostgreSQL" is a pretty doomed one for Hasura support.

But! Hasura is building out support for MySQL: https://github.com/hasura/graphql-engine/pull/5655. The insight here is that this entails all the same problems as building out support for Materialize. Hasura needs to figure out how to map MySQLisms into its PostgreSQL-centric world.

Their plan is a split architecture. When using Hasura with MySQL, Hasura will still need access to a PostgreSQL instance to store its own data. That vastly limits the scope of what needs to be translated from MySQL to PostgreSQL.

I assume a similar approach would work for Materialize. You'd have Hasura continue to store its own metadata in a PostgreSQL server somewhere, but learn to interop with Materialize as an external data source.

That said, this will still be a ton of work, and require intimate knowledge of Hasura's internals (and therefore Haskell). I don't think this will happen anytime soon unless we get some dedicated time from Hasura engineers.

krishmanoh2 commented 3 years ago

Remote schemas were mentioned as a potential method of integrating mz with Hasura.

https://hasura.io/docs/1.0/graphql/core/remote-schemas/index.html

benesch commented 3 years ago

> Remote schemas were mentioned as a potential method of integrating mz with Hasura.
>
> https://hasura.io/docs/1.0/graphql/core/remote-schemas/index.html

Not sure how much good that's going to do us, sadly.

> Hasura has the ability to merge remote GraphQL schemas and provide a unified GraphQL API. Think of it like automated schema stitching. All you need to do is build your own GraphQL service and then provide the HTTP endpoint to Hasura. Your GraphQL service can be written in any language or framework.

(emphasis mine)

rrjanbiah commented 3 years ago

FWIW...

I humbly suggest that the relevant devs speak directly to the Hasura team. AFAIK, they're quite responsive, so it shouldn't be an issue.

Also, I think the Yugabyte team made some progress with a similar approach.

benesch commented 3 years ago

Yugabyte is presently much more PostgreSQL-compatible than we are, so they interoperate with Hasura by emulating all of PostgreSQL. As I mentioned above, I'm afraid that strategy is not going to work for us. At least not for a while.

coco98 commented 3 years ago

Hey folks! Thanks for the active conversation here.

With our new 1.4 release we've generalized our query compilation pipeline, so adding new databases should be much easier and not coupled to Postgres's exact syntax.

We'll drop a note here once the release is out in a week or so. We're planning to write up a developer/contributor guide as well; it should boil down to implementing a few interfaces in easy-to-grok Haskell, so that extending to more databases can become more collaborative :)

@benesch we also have a shared Slack channel between Hasura and Materialize, so we'll drop you an update there and start to spec this out better.

benesch commented 3 years ago

Thanks very much, @coco98! I'll keep an eye out for it.

> We're planning to write up a developer/contributor guide as well; it should boil down to implementing a few interfaces in easy-to-grok Haskell, so that extending to more databases can become more collaborative :)

This sounds fantastic.

rrjanbiah commented 3 years ago

FWIW,

AFAIK, the user path will look like this:

```
 -------
| MySQL |
 -------
    |  Experiences crash or looks for other open source database
 ------------
| PostgreSQL |
 ------------
      | Looking for RAD API tool to speed up dev
 --------
| Hasura |
 --------
    | Perhaps gets into edge cases or wants a distributed system
 -------------        ------------
| Materialize |  or  | YugabyteDB |
 -------------        ------------
    | But user still prefers Hasura for the API
 ----------------------
| Materialize + Hasura |
 ----------------------
```

Meaning, the 'Materialize + Hasura' use case will be at the bottom of the funnel. So, IMHO, it makes sense to support only minimal features. It may also make it easier to reach the Hasura ecosystem (from a business perspective).

ajbouh commented 3 years ago

I'm not sure it's correct to say that someone looking to use Materialize with Hasura is only getting into edge cases or distributed systems, nor that someone necessarily starts with a choice of database and then uses Hasura because of its compatibility with that database.

I personally started with Hasura because I wanted to have an edge-accessible backend without writing a lot of manual boilerplate. My choice was either AWS AppSync or Hasura. I'm only using postgres because Hasura uses it.

As for Materialize, I want live-updated values visible to my users in-browser. Hasura's built-in subscription implementation is fine for simple queries but not for complex views.

My driving interest in Materialize is efficient subscription support for complex (and incrementally computable) views.

benesch commented 3 years ago

Very much agreed, @ajbouh! Thanks for writing up your thoughts.

rrjanbiah commented 3 years ago

@ajbouh Thanks for your comments. I've replied to you in discussion https://github.com/MaterializeInc/materialize/discussions/5764 to avoid polluting this ticket.

jbadeau commented 2 years ago

Has anyone looked at how https://www.prisma.io/apollo works? They seem to support an ORM for multiple DBs. I will try it out next week, as I'm also looking for a GraphQL layer over Materialize.

morsapaes commented 7 months ago

As an update: there is no clear timeline for Hasura (and Hasura Cloud) support, but we're in touch with their team. They're currently working on stabilizing Hasura v3, which will make connector development much easier than in previous versions.