edgedb / edgedb-rust

The official Rust binding for EdgeDB
https://edgedb.com
Apache License 2.0

How can I enable DDL queries in the EdgeDB Rust client? #183

Closed brassel closed 1 year ago

brassel commented 1 year ago

Hello EdgeDB Team,

First of all, many thanks for this great project!

With regard to using DDL, it seems to me that disallowing DDL queries is hardcoded in the Rust client here:

https://github.com/edgedb/edgedb-rust/blob/48624f2a7fb0b672393998a4a3af9b17ee48c742/edgedb-tokio/src/client.rs#L85

But how can I then, e.g., create a type with the Rust client?
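
For illustration, here is a minimal sketch of the kind of thing I am trying to run (only a sketch: the type and property names are just examples, and execute may need to be one of the query* methods depending on the bindings version); with the current hardcoded capabilities a statement like this is rejected:

use edgedb_tokio::create_client;

#[tokio::main]
async fn main() -> Result<(), edgedb_tokio::Error> {
    let client = create_client().await?;
    // A DDL statement; with DDL excluded from the allowed capabilities,
    // this is refused for connections from the pool.
    client
        .execute(
            "create type DynamicItem { create property name -> str; }",
            &(),
        )
        .await?;
    Ok(())
}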

elprans commented 1 year ago

@tailhook, I think we can safely allow DDL here.

brassel commented 1 year ago

Maybe this could become my first tiny contribution to the project! I have changed the four lines for query and query_json (single result and general result) to

allow_capabilities: Capabilities::ALL,

My commands to, e.g., create a type run without issue now. However, before creating a pull request I wanted to run the tests, and

cargo test --workspace --exclude edgedb-tokio

gives me errors which do not seem to be related to my changes, like:

error[E0428]: the name `spi_async_socket_impl_delegate` is defined multiple times

tailhook commented 1 year ago

Once #184 lands, the tests will work (and will be easier to run).

But we can't enable all capabilities for connection pool queries, because transaction statements and session config settings are incompatible with how the connection pool works.

DDLs can be enabled, though.

tailhook commented 1 year ago

I'm also curious what's your use case for running DDL from Rust bindings? We usually discourage running DDLs directly and use them only as the internal implementation of migrations.

brassel commented 1 year ago

But we can't enable all capabilities for connection pool queries, because transaction statements and session config settings are incompatible with how the connection pool works. DDLs can be enabled, though.

Ah I see, so it would be MODIFICATIONS + DDL then, right?

I'm also curious what's your use case for running DDL from Rust bindings?

We do have a big system which has been running for years now and which allows the dynamic creation of data models and of user interfaces that make use of that dynamically defined data. However, I was never really satisfied with the way it was implemented, and now I am experimenting with using EdgeDB instead of our old implementation. I can tell you more about it if you want to know.

tailhook commented 1 year ago

But we can't enable all capabilities for connection pool queries, because transaction statements and session config settings are incompatible with how the connection pool works. DDLs can be enabled, though.

Ah I see, so it would be MODIFICATIONS + DDL then, right?

Yes, I think so.

I'm also curious what's your use case for running DDL from Rust bindings?

We do have a big system which has been running for years now and which allows the dynamic creation of data models and of user interfaces that make use of that dynamically defined data. However, I was never really satisfied with the way it was implemented, and now I am experimenting with using EdgeDB instead of our old implementation. I can tell you more about it if you want to know.

That's interesting. This is not our primary use case for now, so I'm not sure how well EdgeDB works for that. I do not have any specific questions at the moment, but feel free to share your experiences. I think there is a lot of room for improvement in this aspect of the database.

elprans commented 1 year ago

We usually discourage running DDLs directly and use them only as the internal implementation of migrations.

This is true, but the binding shouldn't actually prevent you from running DDL (other language bindings don't).

brassel commented 1 year ago

Ah I see, so it would be MODIFICATIONS + DDL then, right?

Yes, I think so.

I have now changed this to

allow_capabilities: Capabilities::MODIFICATIONS | Capabilities::DDL,

and tested it, at least with the few examples I have written so far. If you are OK with this, shall I create a pull request?
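
For reference, the kind of check I ran looks roughly like this (only a sketch; as above, the type and property names are just examples and method names may differ by bindings version):

use edgedb_tokio::create_client;

#[tokio::main]
async fn main() -> Result<(), edgedb_tokio::Error> {
    let client = create_client().await?;
    // With MODIFICATIONS | DDL, both schema changes and data modifications
    // go through the pooled client; transaction and session-config statements
    // remain excluded, as discussed above.
    client
        .execute("create type DynamicItem { create property name -> str; }", &())
        .await?;
    client
        .execute("insert DynamicItem { name := 'example' }", &())
        .await?;
    let names: Vec<String> = client.query("select DynamicItem.name", &()).await?;
    assert_eq!(names, vec!["example".to_string()]);
    Ok(())
}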

tailhook commented 1 year ago

shall I create a pull request?

Sure.

brassel commented 1 year ago

feel free to share your experiences

I guess that this issue is not the best place to share, but anyway, here is my experience so far.

The basic deep dive into what EdgeDB has to offer was very satisfying. I am really impressed by what you guys are setting up here. EdgeDB has everything I would need to base the new version of our platform on, and much, much more.

The first serious work was a bit of a downer, though. I aimed to import the state of our data from about a year ago. The main issue with this import is the performance of the schema creation. As I described in this issue, I use DDL to dynamically create the schema. It took me some iterations to get everything right, but yesterday the schema import was finalised (I think). I started the import around midnight last night and it is still running more than 11 hours(!) later on my Apple M1 Max with 64 GB, which is only a few months old. It takes a full 10 seconds on average to add a link or property, sometimes more. Extrapolating, it should be done within the hour. All in all, I think my import is not that big: there are 890 types with about 3500 links/properties. The current version of our data will be much bigger, I guess.

I thought that using DDL might be the problem, so I tried to use migrations. But with a schema description of all 890 types but only a fraction of the links/properties and their settings, it took 1 hour and 21 minutes for edgedb migration create. This is a bit alarming, since the algorithmic complexity of this on an empty database should not be much more than that of a topological sort, or should it? In addition, the result was not usable because of a bug I reported in https://github.com/edgedb/edgedb/issues/4673, which is already known and being worked on. Thumbs up for that, and hopefully you can fix it soon!

This is the experience so far and the next phase of my test project will be importing the actual data. I am sure that this will be much more satisfying and I will keep you posted.

tailhook commented 1 year ago

It takes a full 10 seconds on average to add a link or property, sometimes more.

Yes, we mostly did no optimization on migrations, as 10 seconds is usually fine if you do one migration per software release.

Have you tried batching DDLs, either inside a transaction or using CREATE MIGRATION? Batching might help, as there is some per-statement overhead, although I'm not sure to what extent.

brassel commented 1 year ago

Yes, we mostly did no optimization on migrations, as 10 seconds is usually fine if you do one migration per software release.

So, it was 10 seconds for a single property. The complete migration took 12h51m.

Have you tried batching DDLs, either inside a transaction or using CREATE MIGRATION?

That is a good idea even if only to compare the run times. I will try that, thank you!

After the migration finally finished, I wanted to create a backup, naturally. However, I got

edgedb error: InternalServerError: error receiving dump header: maximum recursion depth exceeded while calling a Python object
  Hint: This is most likely a bug in EdgeDB. Please consider opening an issue ticket at https://github.com/edgedb/edgedb/issues/new?template=bug_report.md

Do you think it would have merit for you if I reported a bug for this?

brassel commented 1 year ago

Ok, adding the data does not work at all.

query led to error: insert `varycon.Henkel.Adhesive Experience`::`ARMasterConfig` { id := <uuid>"dc29dd8b-6a44-47de-9048-a2d58629a193" }
Error: InternalServerError: could not serialize result in worker subprocess
  Hint: This is most likely a bug in EdgeDB. Please consider opening an issue ticket at https://github.com/edgedb/edgedb/issues/new?template=bug_report.md

Do you want a bug ticket?

brassel commented 1 year ago

The last bug has been reported under https://github.com/edgedb/edgedb/issues/4677.

brassel commented 1 year ago

Have you tried batching DDLs, either inside a transaction or using CREATE MIGRATION? Batching might help, as there is some per-statement overhead, although I'm not sure to what extent.

After the model was imported once successfully, I used describe schema and then I tried to import it again by migration in a completely new project with its own new instance.

Unfortunately, I am not at liberty to provide the data model as it is, because it would reveal customer-related data. However, if you think this would be really useful, I could create a version with randomised names.

tailhook commented 1 year ago

Have you tried batching DDLs, either inside a transaction or using CREATE MIGRATION? Batching might help, as there is some per-statement overhead, although I'm not sure to what extent.

After the model was imported once successfully, I used describe schema and then I tried to import it again by migration in a completely new project with its own new instance.

Just curious, what is the size of the schema file in bytes?

This is a good test, but it is not what I meant. I rather meant that instead of running individual statements:

create type X { create property y -> z; };
create type Y { create property y -> z; };

you run them as a single migration:

create migration {
  create type X { create property y -> z; };
  create type Y { create property y -> z; };
};

Grouping those statements into batches of ten or a few tens could make the overall time shorter.
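
From the Rust bindings, such a batch could be sent roughly like this (just a sketch: the helper is illustrative, not part of the bindings, and it assumes the DDL strings are valid and that the enabled capabilities admit a create migration block):

use edgedb_tokio::{create_client, Client, Error};

// Illustrative helper: send a group of DDL statements as one
// `create migration { ... }` block instead of one statement per query.
async fn apply_ddl_batch(client: &Client, ddl: &[&str]) -> Result<(), Error> {
    let migration = format!("create migration {{\n{}\n}};", ddl.join("\n"));
    client.execute(&migration, &()).await
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    let client = create_client().await?;
    apply_ddl_batch(&client, &[
        "create type X { create property y -> str; };",
        "create type Y { create property y -> str; };",
    ])
    .await
}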

But if you've already made the import, it might not be needed anymore.

Since you're exploring the bounds of our system, does dump/restore work fine on such a database?

brassel commented 1 year ago

Grouping those statements into batches of ten or a few tens could make the overall time shorter.

Ah yes, I see what you mean now. Thanks!

Just curious, what is the size of the schema file in bytes?

It is at 679 KB.

But if you've already made the import it might not be needed anymore.

Yes, your idea is very welcome. I am only making a test run right now with a smaller state from a year back. I think that our actual current state will be 3-5 times bigger, and maybe grouping it the way you suggest will make the migration feasible after all.

brassel commented 1 year ago

Since you're exploring the bounds of our system, does dump/restore work fine on such a database?

No, it does not, unfortunately. As I reported above, there is already a problem creating a dump:

After the migration finally finished, I wanted to create a backup, naturally. However, I got

edgedb error: InternalServerError: error receiving dump header: maximum recursion depth exceeded while calling a Python object
Hint: This is most likely a bug in EdgeDB. Please consider opening an issue ticket at https://github.com/edgedb/edgedb/issues/new?template=bug_report.md

Do you think it would have merit for you if I reported a bug for this?

tailhook commented 1 year ago

Since you're exploring the bounds of our system, does dump/restore work fine on such a database?

No, it does not, unfortunately. As I reported above, there is already a problem creating a dump:

Oh, sorry. Yes, please report an issue to the https://github.com/edgedb/edgedb repository.

tailhook commented 1 year ago

I'm closing this as there is nothing left to do on the Rust side. Feel free to open more issues for EdgeDB problems.