JLDLaughlin opened this issue 4 years ago
I'm not sure what the current timeline is for 0.6, but this has too much work to be completed very soon. @cuongdo @awang
I don't have the necessary skill set to comment on the complexity, but I believe it is worth the effort in business terms. Hasura is getting popular, but plain Postgres is not suitable for all cases (#3496).
Thanks so much for this @JLDLaughlin!
I'm wondering if some of these features can be supported more quickly via a workaround -- for instance:

- Can `SET client_encoding` be a noop?
- `BIGSERIAL`: can we support a subset of this -- i.e., would it be easier to create an autoincrementing integer type, but not validate that primary keys are unique?
- Stored procedures: rather than support arbitrary stored procedures, can we implement those procedures directly ourselves, and treat the `CREATE FUNCTION` as a switch to enable our hardcoded function?

For other features (such as `ON UPDATE CASCADE`), I could also see it making sense to invest in a proper implementation, rather than a one-off workaround for Hasura.

To identify next steps, I suggest we take a pass at this list to identify level of effort to implement vs. workaround (or perhaps, a level of "hackiness")?
> Can `SET client_encoding` be a noop?

Yes, this is less than an hour of work, provided the client is trying to set the encoding to UTF-8, as they are here. That's easy enough!

> `BIGSERIAL`: can we support a subset of this -- i.e., would it be easier to create an autoincrementing integer type, but not validate that primary keys are unique?

> Stored procedures: rather than support arbitrary stored procedures, can we implement those procedures directly ourselves, and treat the `CREATE FUNCTION` as a switch to enable our hardcoded function?

> For other features (such as `ON UPDATE CASCADE`), I could also see it making sense to invest in a proper implementation, rather than a one-off workaround for Hasura. To identify next steps, I suggest we take a pass at this list to identify level of effort to implement vs. workaround (or perhaps, a level of "hackiness")?
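For context on the `BIGSERIAL` subset idea: in PostgreSQL, `BIGSERIAL` is essentially shorthand for a sequence plus a column default, so a workaround could implement that part while skipping uniqueness enforcement. A sketch of the desugaring (this is standard PostgreSQL, not Materialize syntax; the table and sequence names are illustrative):

```sql
-- `id BIGSERIAL PRIMARY KEY` desugars to roughly the following.
-- A subset implementation could support the sequence and DEFAULT,
-- but omit the PRIMARY KEY (i.e., not validate uniqueness).
CREATE SEQUENCE t_id_seq;
CREATE TABLE t (
    id bigint NOT NULL DEFAULT nextval('t_id_seq')
);
ALTER SEQUENCE t_id_seq OWNED BY t.id;
```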
The more I look at Hasura, the more I'm worried about this approach. There are a few different parts to Hasura. There's the query layer, which maps a GraphQL query to a SQL query. That should be the most straightforward to support. Then there is... everything else, like mutations, subscriptions, and event triggers. Mutations we don't even support right now. Subscriptions and event triggers rely heavily on esoteric PostgreSQL features like triggers and PL/pgSQL, but triggers and PL/pgSQL are workarounds for the lack of an incremental materialized view engine. To get Hasura subscriptions/event triggers to work with Materialize, we almost certainly want to build a Materialize-specific Hasura adapter that can take advantage of Materialize sinks, rather than trying to support triggers and PL/pgSQL.
I think someone needs to sit down with Hasura, connect it to Materialize, and start poking at its code and understanding its architecture to truly scope this work—trying to scope these individual PostgreSQL features in a vacuum is unlikely to be productive. (Like, trying to support triggers and PL/pgSQL would be literal years of work.)
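As a rough illustration of the sink-based direction, assuming an adapter that consumes Materialize change feeds (the relevant command was `TAIL` at the time; exact syntax may differ):

```sql
-- Instead of emulating triggers + PL/pgSQL, a Materialize-specific
-- Hasura adapter could maintain the subscribed query as a
-- materialized view...
CREATE MATERIALIZED VIEW order_totals AS
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id;

-- ...and stream its incremental updates (timestamp, diff, row)
-- to push changes to GraphQL subscribers.
TAIL order_totals;
```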
Anyway, I think the immediate next step is https://github.com/MaterializeInc/materialize/issues/3727. I don't think we should attempt to go any farther with Hasura support until we've made a bit more progress on #3727 and have the first few `pg_catalog` views in place. At that point we might be able to actually hook up a Hasura instance to Materialize, make some headway, and start parallelizing the work items.
@benesch FWIW, the Hasura team is quite active on GitHub and Twitter. You can easily coordinate with them.
CC @0x777 @coco98
So I looked into this more today. As @JLDLaughlin documented above Hasura uses approximately every inch of PostgreSQL. I think the strategy of "pretend to be PostgreSQL" is a pretty doomed one for Hasura support.
But! Hasura is building out support for MySQL: https://github.com/hasura/graphql-engine/pull/5655. The insight here is that this entails all the same problems as building out support for Materialize. Hasura needs to figure out how to map MySQLisms into its PostgreSQL-centric world.
Their plan is to do a split architecture. When using Hasura with MySQL, Hasura will still need access to a PostgreSQL server to store its own data. That vastly limits the scope of what needs to be translated from MySQL to PostgreSQL.
I assume a similar approach would work for Materialize. You'd have Hasura continue to store its own metadata in a PostgreSQL server somewhere, but learn to interop with Materialize as an external data source.
That said, this will still be a ton of work, and require intimate knowledge of Hasura's internals (and therefore Haskell). I don't think this will happen anytime soon unless we get some dedicated time from Hasura engineers.
Remote schemas were mentioned as a potential method of integrating mz with Hasura:
https://hasura.io/docs/1.0/graphql/core/remote-schemas/index.html
Not sure how much good that's going to do us, sadly.
> Hasura has the ability to merge remote GraphQL schemas and provide a unified GraphQL API. Think of it like automated schema stitching. All you need to do is build your own GraphQL service and then provide the HTTP endpoint to Hasura. Your GraphQL service can be written in any language or framework.

(emphasis mine)
FWIW, I humbly suggest that the relevant devs speak directly to the Hasura team. AFAIK, they're quite responsive, so it may not be an issue. Also, I think the Yugabyte team made some progress with a similar approach.
Yugabyte is presently much more PostgreSQL compatible than we are, so they interoperate with Hasura by emulating all of PostgreSQL. Like I mentioned above I'm afraid that strategy is not going to work for us. At least not for a while.
Hey folks! Thanks for the active conversation here.
With our new 1.4 release we've generalized our query compilation pipeline so adding new databases should be very easy and not coupled to Postgres's exact syntax.
We'll drop a note here once the release is out in a week or so. We're also planning to write up a developer/contributor guide; it should essentially boil down to implementing a few interfaces in easy-to-grok Haskell, so that extending to more databases can become more collaborative :)
@benesch we have a shared Slack channel between hasura/materialize, so we'll drop you an update there as well and start to spec this out better.
Thanks very much, @coco98! I'll keep an eye out for it.
> We're also planning to write up a developer/contributor guide; it should essentially boil down to implementing a few interfaces in easy-to-grok Haskell, so that extending to more databases can become more collaborative :)

This sounds fantastic.
FWIW, AFAIK, the user path will look like this:

```
MySQL
  | experiences a crash or looks for another open source database
  v
PostgreSQL
  | looks for a RAD API tool to speed up dev
  v
Hasura
  | perhaps gets into edge cases or wants a distributed system
  v
Materialize  or  YugabyteDB
  | but the user still prefers Hasura for the API
  v
Materialize + Hasura
```
Meaning, the "Materialize + Hasura" use case will be at the bottom of the funnel. So, IMHO, it makes sense to support only minimal features. Also, it may help us reach the Hasura ecosystem easily (from a business perspective).
I'm not sure it's correct to say that someone looking to use Materialize with Hasura is only getting into edge cases or distributed systems, nor that someone might start with a choice of database and then use Hasura because of its compatibility with that database.
I personally started with Hasura because I wanted to have an edge-accessible backend without writing a lot of manual boilerplate. My choice was either AWS AppSync or Hasura. I'm only using postgres because Hasura uses it.
As for Materialize, I want to have live-updated values visible to my users in-browser. Hasura's built-in subscription implementation is fine for simple queries but not complex views.
My driving interest in Materialize is to have efficient subscription support for complex (and incrementally computable) views.
Very much agreed, @ajbouh! Thanks for writing up your thoughts.
@ajbouh Thanks for your comments. I've replied to you in discussion https://github.com/MaterializeInc/materialize/discussions/5764 to avoid polluting this ticket.
Has anyone looked at how https://www.prisma.io/apollo works? They seem to support an ORM for multiple DBs. I'll try it out next week, as I'm also looking for GraphQL over Materialize.
As an update: there is no clear timeline for Hasura (and Hasura Cloud) support, but we're in touch with their team. They're currently working on stabilizing Hasura v3, which will make connector development much easier than in previous versions.
Background
Hasura is a no-code tool that automatically generates secure GraphQL APIs for user data. To generate these APIs from a Postgres database (or Postgres-compliant database, like Materialize), Hasura requires some set of Postgres features and functions to be implemented.
This is intended to be a tracking issue for all things related to Hasura support, and will therefore change over time.
Blockers
To investigate the features and functions we are currently lacking, I ran Hasura against a local Materialize instance (connecting via the `host.docker.internal` trick). The following items missing from Materialize are required just to start Hasura; I didn't dig into anything additional in the UI. An example of each of the missing features can be found in the above gist; most are on line 14.
Data types

- `BIGSERIAL` data type: `( id BIGSERIAL PRIMARY KEY, ...`
- `JOIN pg_catalog.pg_namespace cftn ON cft.relnamespace = cftn.oid`
Postgres catalog and schema support

- `information_schema.schemata`: `SELECT 1 FROM information_schema.schemata WHERE schema_name = $1`
- `information_schema.key_column_usage`: `JOIN information_schema.key_column_usage AS kcu`
- `pg_catalog.pg_available_extensions`: `SELECT 1 FROM pg_catalog.pg_available_extensions WHERE name = $1`
- `pg_catalog.pg_constraint`: `JOIN pg_catalog.pg_namespace cftn ON cft.relnamespace = cftn.oid WHERE r.contype = 'f'`
- `pg_catalog.pg_class`: `JOIN pg_catalog.pg_class ct ON r.conrelid = ct.oid`
- `pg_catalog.pg_namespace`: `JOIN pg_catalog.pg_namespace ctn ON ct.relnamespace = ctn.oid`
- `pg_catalog.pg_attribute`: `JOIN pg_catalog.pg_attribute ac ON q.column_id = ac.attnum AND q.table_id = ac.attrelid`
- `pg_catalog.pg_relation_is_updatable`: `pg_catalog.pg_relation_is_updatable("table".oid, true)`
- `GROUP BY q.table_schema, q.table_name, q.constraint_name`
- `SELECT tc.table_name, tc.constraint_schema AS table_schema, tc.constraint_name as constraint_name`
- `AND (d.refobjid = r.oid) AND (d.refobjsubid = a.attnum)`
- `SELECT pt.typname AS "name",`
- `FROM unnest( COALESCE(p.proallargtypes, (p.proargtypes) :: oid []) )`
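Many of the catalog items above are existence probes, so they could plausibly start life as always-empty stubs. A hypothetical sketch (Materialize would ship these as builtin catalog views rather than user DDL; the column list follows PostgreSQL's `pg_available_extensions`):

```sql
-- Satisfies probes like
--   SELECT 1 FROM pg_catalog.pg_available_extensions WHERE name = $1
-- by always returning zero rows.
CREATE VIEW pg_catalog.pg_available_extensions
    (name, default_version, installed_version, comment) AS
SELECT ''::text, ''::text, ''::text, ''::text
WHERE false;
```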
Postgres command support

- `EXECUTE PROCEDURE hdb_catalog.hdb_schema_update_event_notifier();`
- `pg_notify`: `PERFORM pg_notify('hasura_schema_update', json_build_object( ...`
- `... WITH ORDINALITY pat(oid, ordinality)`
- `LOOP EXECUTE format('ALTER VIEW %I.%I ALTER COLUMN`
- `SERIALIZABLE READ WRITE` isolation level: `BEGIN ISOLATION LEVEL SERIALIZABLE READ WRITE`
- `CREATE TABLE` with primary keys
- `CREATE TABLE` with `CHECK` constraint: `CONSTRAINT valid_status CHECK (status IN ('scheduled' ...`
- `CREATE TABLE` with `ON UPDATE CASCADE`: `... REFERENCES hdb_catalog.hdb_table(table_schema, table_name) ON UPDATE CASCADE );`
- `CREATE UNIQUE INDEX`: `CREATE UNIQUE INDEX hdb_schema_update_event_one_row ON ...`
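Pulling the `CREATE TABLE` items above together, the flavor of DDL Hasura's bootstrap issues looks roughly like this (an illustrative reconstruction from the fragments above; the table name and any columns not shown in the fragments are made up):

```sql
-- Combines primary keys, a CHECK constraint, and a foreign key
-- with ON UPDATE CASCADE, as seen in the snippets above.
CREATE TABLE hdb_catalog.example_log (
    id BIGSERIAL PRIMARY KEY,
    status TEXT,
    table_schema TEXT,
    table_name TEXT,
    CONSTRAINT valid_status CHECK (status IN ('scheduled', 'delivered')),
    FOREIGN KEY (table_schema, table_name)
        REFERENCES hdb_catalog.hdb_table (table_schema, table_name)
        ON UPDATE CASCADE
);

-- Hypothetical unique index in the same spirit.
CREATE UNIQUE INDEX example_log_idx
    ON hdb_catalog.example_log (table_schema, table_name);
```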
Postgres function support

- `pg_get_constraintdef`: `pg_catalog.pg_get_constraintdef(r.oid, true) as check FROM pg_catalog.pg_constraint`
- `pg_get_functiondef`: `pg_get_functiondef(p.oid) AS function_definition`
- `json_build_object`: `SELECT COALESCE(json_agg( json_build_object('schema', q."schema", 'name', q."name", 'type', q."type" ) ), '[]')`
- `now()`: `occurred_at timestamptz NOT NULL DEFAULT NOW()`. This diverges from how we handle `now()` currently.
- `CREATE FUNCTION hdb_catalog.hdb_schema_update_event_notifier() ...`
Miscellaneous Postgres support

- `SET client_encoding`: `SET client_encoding = 'UTF8';`
- `SET client_min_messages`: `SET client_min_messages TO WARNING;`
- `pgcrypto`: `CREATE TABLE hdb_catalog.event_invocation_logs ( id TEXT DEFAULT gen_random_uuid() PRIMARY KEY,`
- PL/pgSQL: `$function$ LANGUAGE plpgsql; CREATE TRIGGER hdb_schema_update_event_notifier AFTER INSERT OR UPDATE ON ...`