Cache the schema and relationships discovered from the database

krish7919 commented 2 years ago

Summary:

I stumbled on this project via https://www.percona.com/resources/videos/automagical-graphql-sql-compiler and I have a use-case which would suit perfectly with this project. However, this might need the addition of a feature.

The current problem statement is to accept a graphql request via AWS Lambda (or any serverless framework), spin up the lambda, convert the graphql to plain sql, run the query, fetch/insert the result, return the reponse. I use Postgresql.

I am currently researching and have almost finalised doing this via the Postgraphile library in Javascript, making use of its --write-cache CLI option to write the schema and relationship data discovered in a storage medium (or package it in the Lambda), and then use the --read-schema CLI option to fetch this cached data, validate the graphql query against it and return the result to caller. (I am not using CLIs anymore, but the functions that are being called by the CLI, but that is just an implementation detail for now).

What would you like to be added: Option/method to save the discovered data to a file or some cache.

Why is this needed:

I would like to use graphjin to do something as explained above, however, I am unable to find any documentation to save the discovered data. In serverless framework, one does not need to discover this data for every API call that is made, and hence, this will be a very useful feature, and will allow for efficiency gain for use-cases like mine, performance, and better adoption of the library in general.

krish7919 commented 2 years ago

I see that @dosco mentions in his talk at https://www.youtube.com/watch?v=gzAiAbsCMVA&t=1396s that this runs on a serverless platform. Is there a doc on how this is made performant and set up? Is all we need to do is list out all the allowed queries and that's it? How does the list help on a serverless platform as the stored procedures need to be created accordingly?

dosco commented 2 years ago

GraphJin is used in severless environments like Google Cloud Run, Cloud Functions and App Engine all of which are scale to zero services. GraphJin should work fine in any functions as a service environment since database discovery is a single very fast query and GraphJin is designed to be up and running in under 20ms which ads almost zero overhead to any function call.

krish7919 commented 2 years ago

Is there anyway to avoid the initial single fast query. This might be an issue for bigger schemas; and might avoid the one query to the db for making APIs respond faster.

gedw99 commented 2 years ago

GraphJin is used in severless environments like Google Cloud Run, Cloud Functions and App Engine all of which are scale to zero services. GraphJin should work fine in any functions as a service environment since database discovery is a single very fast query and GraphJin is designed to be up and running in under 20ms which ads almost zero overhead to any function call.

@dosco is there only docs on this setup ? I really want to try this out

dosco commented 2 years ago

I need to check the wiki but every new GraphJin project contains a cloudbuild.yml to build and deploy GraphJin on CloudRun. Its just a matter or deploying the Docker image with the right environment vars set. GJ_DATABASE_HOST, GJ_DATABASE_USER and GJ_DATABASE_PASSWORD

dosco commented 2 years ago

@krish7919 Not at the moment all of GraphJin works off this database discovery. Unless your using cloud function that are rarely called (run from cold start every time) it won't be much of an issue.

gedw99 commented 2 years ago

thanks for the tips !! This is really cool

krish7919 commented 1 year ago

@dosco

I tried to write a small PoC using graphjin library, and pointed it to my currrent tables. It fails with the error panic: postgres: foreign key column not found: X.Y when calling NewDBSchema in the _initSchema() method. So not quite a drop in for the the current postgraphile solution we use.

dosco commented 1 year ago

I tried to write a small PoC using graphjin library, and pointed it to my currrent tables. It fails with the error panic: postgres: foreign key column not found: X.Y when calling NewDBSchema in the _initSchema() method. So not quite a drop in the the current postgraphile solution we use.

Definitly want to fix this issue for you. Can you give me the following details.

Are you using multiple schemas?
Is this a foreign key relationship between tables in two different schemas?
Is there anything special about this foreign key column (some special postgres feature)?

dosco commented 1 year ago

Added a fix for this 01d54ca2edb1f0051fd7dd8e324239e773e75054 @krish7919 can you please test and confirm.

krish7919 commented 1 year ago

To answer your question, @dosco:

Are you using multiple schemas? Is this a foreign key relationship between tables in two different schemas?

No, just a single one. As a matter of fact, only the public schema.

Is there anything special about this foreign key column (some special postgres feature)?

Nothing special here.

Actually, we can get rid of foreign key checks in Graphjin if there was an option to, and move ahead, but it doesn't seem to be the case today (or I couldn't find it).

dosco commented 1 year ago

@krish7919 can you confirm if the above fix worked for you?

krish7919 commented 1 year ago

@dosco I think you are confusing one issue with another. The error I have mentioned above still exists, its a Graphjin library logic error. The fix you have done performs an early return; I had already made that fix locally and the error message mentioned above is what happens with my database schema. And it works perfectly fine with postgraphile.

The early return on error is fixed and it is fine to close that issue; this issue should not be as Graphjin fails in scenarios where postgraphile doesn't.

dosco commented 1 year ago

The commit has other changes as well in addition to including that err check I had previously missed. The schema discovery now has ordering. The previous random processing order of columns might have also been a reason for this issue. I have also updated the test which helped me recreate your error and the fix makes the test pass. If you could try the latest code and see if the issue still exists for you that would be helpful. GraphJin has nothing to do with postgrphile these are entirely different codebases.

krish7919 commented 1 year ago

My go.mod looks like so now:

go 1.19

require (
    github.com/dosco/graphjin v0.20.33-0.20221007075649-01d54ca2edb1
    github.com/jmoiron/sqlx v1.3.5
    github.com/lib/pq v1.10.7
)

And the same error persists, although the ordering persists.

I do not know much about our db schemas at present. However, considering the schema is consistent and postgraphile doesn't complain, I find it odd that Graphjin complains about the FK error mentioned above.

dosco commented 1 year ago

Just to be double sure you are using the right commit id I pushed a new version update could you retry but first go get the latest

go get github.com/dosco/graphjin@latest

krish7919 commented 1 year ago


$ head -n10 go.mod
module /path/...

go 1.19

require (
    github.com/dosco/graphjin v0.20.33
    github.com/jmoiron/sqlx v1.3.5
    github.com/lib/pq v1.10.7
)

$ go run main.go
panic: postgres: foreign key column not found: X.Y

goroutine 1 [running]:
main.main()
    /path/x/y/z/main.go:93 +0x235
exit status 2

krish7919 commented 1 year ago

If it matters, we are at Postgres v10

krish7919 commented 1 year ago

I dumped the columns in c := t.Columns[i] and foreign table (?) in ft := s.tables[v.nodeID] and I can confirm that the columns c is present in ft but the getColumn does not return it.

EDIT 1:

This happens because DBTable.colMap does not have the column that we are looking for.

EDIT 2:

And this is because the columns have different casing. For example, I have the column in DBTable.colMap that is traitid, but the actual column name is traitId. Where does this case diff come from?

EDIT 3:

It seems to come from this line in code at https://github.com/dosco/graphjin/blob/master/core/internal/sdata/tables.go#L169:

ti.colMap[strings.ToLower(c.Name)] = i

EDIT 4:

I removed the ToLower condition and it moves ahead and NewGraphJin() completes fine. Next issue is when I call GraphQL(), it is looking for a table for the query I run, but not at the Postgres function created for the corresponding Graphql query.

dosco commented 1 year ago

Thanks for this I need to add a test with mixed-case names. I'll have a fix out.

krish7919 commented 1 year ago

As a summary, I am trying to comb through the library now to resolve why my query does not find the correct DB function to call.. :)

I am trying to understand a bit, but it spews the error table not found: from Find() instead of trying to find the function name.

krish7919 commented 1 year ago

It doesn't seem that SQL Functions are handled in graphjin as is; I see some commented and an early exit with error when a table name corresponding to the query is not found; rather than continuing and checking for function names (or vice versa, check for functions first and then query).

dosco commented 1 year ago

but not at the Postgres function created for the corresponding Graphql query.

Can you explain this further. We don't currently support functions in GraphQL selectors only tables. Functions can only be used for aggregation. For example to count users by the id field. Internally this converts to count(id). I'm open to adding support for your usecase.

users {
  count_id
}

krish7919 commented 1 year ago

Yes, this should have been documented, or if it has been, I missed it.

We use functions exclusively so that we can refactor them in case the table schema cannot be immediately updated/changed.

Thanks for considering adding support for this.

dosco commented 1 year ago

Can you share an example of a function that returns a table it's not a postgres feature that I have used personally. I'm guessing it wont be too hard to add.

krish7919 commented 1 year ago

Every function can return a table, ref https://www.postgresql.org/docs/current/sql-createfunction.html

EDIT:

Some of our functions also return a boolean, or a set or other return types, as mentioned in https://www.postgresql.org/docs/current/xfunc-sql.html

dosco commented 1 year ago

@krish7919 cool thanks i'll look into this today. I hope the latest update fixes the column not found issue for you. There were several places where things were lowercased remenants from when we were attempting to make things case insenitive. Its all case-sensitive now I guess your mixed case column names helped fix it.

krish7919 commented 1 year ago

The column issue is fixed. Thanks for looking into the implementing support for functions. 👍

I still think the naming convention is a tad off compared to postgraphile library. For example, we have something like x_v_1_2_y_z which gets converted to x_v_12_y_x in grpahjin (based on the EnableCamelCase option), but is converted to something differnet in postgraphile. This should be a minor though and I will list all the cases here once I can get something to run with the functions.

dosco commented 1 year ago

Can you share the exact camel case for x_v_1_2_y_z in postgraphile. I'm happy to update our function to match the behaviour.

krish7919 commented 1 year ago

As 1 example, the graphql quey getVersionV100StatsForAllDatapoints is mapped to the SQL function get_version_v1_0_0_stats_for_all_datapoints in Postgres. However, there are many others and I could not enumerate all of them right now. However, most are of the form mentioned above.

I do know that there are some corner cases (for example something like iphone_6e would change to iphone6E, instead of iphone6e), but that's a remnant of moving from postgraphql to postgraphile.

dosco commented 1 year ago

The only difference I'm seeing here is that numbers (digits) are being split into underscore seperated values while I think we kep numbers together. We could provider a config to enable splitting numbers. iphone6E does not look like a corner case to me it follows the same pattern that V1 does in the previous getVersionV10.. example you provided.

dosco commented 1 year ago

Lots of major fixes and updates since this thread was last commented on most of these bugs have been fixed also we now generate a schema file schema.graphql automatically from the db in dev mode and can use it in production mode to prevent any discovery queries in prod and make GraphJin a much nicer option for serverless environments like lambda functions

dosco / graphjin

Cache the schema and relationships discovered from the database #312