Closed nsidnev closed 4 years ago
Great questions all around, and thank you for being interested in Edgemorph! Before I dive into answering them, I would like to tell you a little bit about how I arrived at Edgemorph's architectural design and why it makes sense to me.
Originally, I wanted to make a simple query builder in EdgeDB: the kind where I write some code in either Python or Rust, and the library spits out the appropriate query statements to run on the database. I cooked up a prototype using a Flask server via Jinja templates, then parsed out the weird UTF-8 characters using Regex. And it worked!
At that point I thought I was done, and I would publish it as a library package called `deltalib`. But then something happened. EdgeDB released new language features and I liked what they accomplished, so I took a backup of my database and upgraded to the latest version, then tried to commit all of my modules to the new database -- naively expecting everything to just work with some minor tweaks.
That didn't happen. Instead, I lost all of my data because I was too stubborn to drop the version, and I wasn't smart enough to figure out how to serialize all of the data in the dump format, much less put it into something useful like JSON from that dump file.
It wasn't until 4-5 weeks ago that I had an epiphany that changed the way I look at this data hierarchy problem. The realization was:
EdgeQL is going to continue to evolve, so why try to fight it? Even if I'm extraordinarily successful and a future iteration of deltalib takes off, there's a chance that Yury and Elvis will have to work even harder just to maintain ORM compatibility because they don't want to break the community's code; and that jeopardizes the quality of EdgeQL -- my favorite part of the EdgeDB product.
I realized that there is a fundamental incompatibility between an object-relational mapper and a powerful query language like EdgeQL. Not only do they both try to enforce structure over the same source using different patterns, but the ORM is always limited to producing a monotonically decreasing subset of the actual query language! As a result, doing things the old-fashioned way with a standard ORM is only going to limit EdgeDB. Going forward, it would be necessary to rethink my strategy.
If an ORM is fundamentally wrong for EdgeDB, then what's right?
Well, EdgeQL feels right. How do I do more EdgeQL, but still keep the good things that an ORM does?
This puts us at a decent place to begin discussing your questions.
Questions:
Why do we need separate types such as `NamedType`, `HasAddressType` or `UserType` (and BTW `.edm_user` module)?
Great question!
`.edm_<module name>` is the shared object target that CPython can use to bind Rust code. If you're curious about how this works, try poking around rust-to-python for a more interactive explanation. Using Rust builder macros, Edgemorph generates custom library code based upon the EdgeQL structures we've parsed from `user.edgeql` and templates them into statically-typed Rust `struct`s. Each of the custom EdgeQL types (`Named`, `HasAddress`, and `User`) will derive implementations in Rust-land for calling the correctly-named database queries. Each of these prepared statements is dynamically formed when running `edm make install`.
`edm` is the binary executable written in Rust that is responsible for reading EdgeQL module files, committing their SDL to the database, and, as part of that process, injecting `edgemorph` methods into the module for object insertion, filter via select, object updates, etc.
We need separate names such as `NamedType`, `HasAddressType`, and `UserType` for reaching these Rust bindings. Additionally, I think it's a good idea to not name them `Named`, `HasAddress`, and `User` because it disambiguates them from the names called in client code. This may come as a surprise but `NamedType`, `HasAddressType`, and `UserType` are not actually types at all! Each of these is a name for communicating with a type implementation of the things we want `Named`, `HasAddress`, and `User` to be able to do in a client's Python or Rust code. I went with `Type` as a suffix instead of `Impl` because it's more readable for me.
Why use decorators instead of extending classes like in other ORMs? This will add autocomplete to IDEs for the parent fields and it looks more pythonic(IMO). I understand that in Rust, you can simply expand all necessary fields through a macro, but for Python this method seems to me not suitable.
Good point. I like the design of Python's dataclasses (available since Python 3.7). They look more like EdgeQL types than Python classes, which was my goal. You're completely right, though. That implementation is more Pythonic, and probably makes more sense to implement.
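For comparison, here is a minimal sketch of the inheritance-based layout suggested in the question, written with plain standard-library dataclasses. The field names (`name`, `address`, `email`) are assumptions for illustration only, and nothing here is Edgemorph's actual API.

```python
from dataclasses import dataclass, fields


@dataclass
class Named:
    name: str


@dataclass
class HasAddress:
    address: str


@dataclass
class User(Named, HasAddress):
    # Hypothetical extra field; the parent fields are inherited automatically.
    email: str


# Dataclass fields are collected in reverse MRO order, so the fields of
# HasAddress come first, then those of Named, then User's own.
field_names = [f.name for f in fields(User)]
user = User(address="1 Main St", name="Alice", email="alice@example.com")
```

Because `User` is an ordinary subclass, an IDE can offer `user.name` and `user.address` without any knowledge of a decorator.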
Why not move all the metadata such as index and annotation definitions into something like the `Meta` class, like Django or Tortoise-ORM does. Or use descriptors for some field definitions (as done in Pydantic and SQLAlchemy) and do something like this:

    @edgetype(extending=(Named, HasAddress), edb=UserType)
    class User:
        ...
        class Meta:
            indices = ("name",)

or

    @edgetype(extending=(Named, HasAddress), edb=UserType)
    class User:
        ...
        name: Property[str] = Property(index=True)

Another good point! I struggled on a design specification for index and annotation definitions. While I'm not fond of nested Meta classes, the default property example looks quite elegant.
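To make the descriptor variant concrete, here is a rough sketch of how a `Property` descriptor could carry index metadata. The `Property` class and the `indexed_fields` helper are hypothetical and written only for this illustration; they are not taken from Edgemorph, Pydantic, or SQLAlchemy.

```python
from typing import Any, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


class Property(Generic[T]):
    """Hypothetical field descriptor that records schema metadata."""

    def __init__(self, index: bool = False, default: Optional[T] = None) -> None:
        self.index = index
        self.default = default
        self.name = ""

    def __set_name__(self, owner: type, name: str) -> None:
        # Called by Python when the owning class is created.
        self.name = name

    def __get__(self, obj: Any, objtype: Optional[type] = None) -> Any:
        if obj is None:
            return self  # class-level access exposes the metadata
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj: Any, value: T) -> None:
        obj.__dict__[self.name] = value


def indexed_fields(cls: type) -> Tuple[str, ...]:
    """Collect the names of all properties declared with index=True."""
    return tuple(
        name for name, attr in vars(cls).items()
        if isinstance(attr, Property) and attr.index
    )


class User:
    name: Property[str] = Property(index=True)
    email: Property[str] = Property()


user = User()
user.name = "Alice"
```

With this layout the index information lives next to the field it describes, and a schema generator could call `indexed_fields(User)` instead of reading a nested `Meta` class.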
What about link properties definition? I had a couple of problems with them in my internal tool.
Agreed; these require some serious black magic since creating a `TypeVar` doesn't do representational justice when printing `ClassType.__annotations__`. I have a temporary shim in place that aliases `link` with an optional union, but this cannot stay.
Will this project also be a query builder for EdgeDB? It would be great if so, but then there are also a couple more questions about types definition and general usage.
Does my explanation for your first question cover this well enough?
If this project is going to be a query builder, then it is not obvious how to use types as type hints for the values returned by the driver (in fact, it is not obvious to me even now). For example:
Welp. The short answer is that `edgedb.<prefix>Connection`s just won't work for us. It means that we will have to closely study the edgedb-protocol Rust crate, and integrate it with the process I described in the first answer.
Do you need any help with it? I'll be glad to help :)
Of course! To anyone who wants to join the adventure, please send your GitHub username to david@dgolembiowski.com and I'll add you to the project.
Thanks for the explanation! Most of my questions have been answered, but I still don't fully understand some things:
Welp. The short answer is that `edgedb.<prefix>Connection`s just won't work for us. It means that we will have to closely study the edgedb-protocol Rust crate, and integrate it with the process I described in the first answer.
Um... does this mean that `edgemorph` will use its own connections under the hood? And reimplement them for Python and other supported languages in the future?
You're welcome!
- Could you show an example of using this library (framework?) with Python? How you want it to be used when the project is ready.
Sure thing! It'll generally go like this:
# Python usage
user@computer $ python -m venv app && cd app
user@computer $ edm init .
user@computer $ source bin/activate
user@computer $ pip install wheel edgemorph edm && pip freeze > requirements.txt
user@computer $ cd edb_modules
user@computer $ # Use an editor to add SDL to `./mod_app.edgeql`
user@computer $ # Verify the output path for `edgemorph_output.python` in `../edgemorph.toml`
See the edm specs and control flow graph and feel free to point out any weaknesses you can spot. Submitting a PR for the weaknesses awards you extra credit.
user@computer $ # Run `edm make` to see if all the EdgeQL passes a grammar check
user@computer $ edm make mod_app.edgeql
At this point, `edm` calls in the compiler submodule from `edgemorph` and begins generating the AST from the supplied EdgeQL source file(s). If the grammar is correct, `edm` will serialize the EdgeQL module and write an intermediate file in the modules directory with instructions for `edm make install` to deserialize.
user@computer $ # If everything's correct in the client's module file, then update `edgemorph.toml` with the appropriate DSN
user@computer $ cd .. && edm make install
At this point `edm make install` has no knowledge of whether `edm make` just ran, so it has to check the modules folder indicated in `edgemorph.toml` for hints about pre-serialized EdgeQL targets. If the supplied (or implied) EdgeQL target does not have a serialized file in the `edb_modules` directory, it will automatically run `edm make <that module file>`. Otherwise, it will proceed and pull in the `edgedb-protocol` crate to connect with the server indicated in the toml file. `edm` will use `edgedb_protocol::client_message::ClientMessage` to start a client handshake, authenticate, and encode the following sequence: start a transaction, create a migration, commit the migration, end the transaction. It then sends the encoded message to the server and waits for the response. Once the response is received, it prints everything important to STDOUT.
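The decision flow described above can be sketched roughly as follows. This is only an illustration of the ordering: the `.edmser` suffix for the intermediate file is an assumption (the thread does not name one), and the step strings are labels, not real commands or protocol messages.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical suffix for the intermediate file written by `edm make`.
SERIALIZED_SUFFIX = ".edmser"


def plan_install(modules_dir: Path, module_file: str) -> List[str]:
    """Return the ordered steps `edm make install` would take for one module."""
    steps: List[str] = []
    serialized = modules_dir / (Path(module_file).stem + SERIALIZED_SUFFIX)
    if not serialized.exists():
        # No hint that `edm make` already ran, so run it first.
        steps.append(f"edm make {module_file}")
    steps += [
        "connect and authenticate",
        "start transaction",
        "create migration",
        "commit migration",
        "end transaction",
        "print server response",
    ]
    return steps


with tempfile.TemporaryDirectory() as tmp:
    modules = Path(tmp)
    fresh = plan_install(modules, "mod_app.edgeql")
    # Simulate a previous `edm make` run by creating the serialized file.
    (modules / ("mod_app" + SERIALIZED_SUFFIX)).touch()
    cached = plan_install(modules, "mod_app.edgeql")
```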
Next, `edm make install` will deserialize the module and pull in the `edgemorph` library builder, so that it can eject a client library target: `edm_<edgeql mod>.so`, and also eject `<edgeql mod>.py`.
Remark: `<edgeql mod>.py` can be modified directly by the user because it technically will be a fully-functioning Python file, but it's really just there for exposing callable names, class names, etc. from `edm_<edgeql mod>.so` to the neighboring Python files.
Welp. The short answer is that `edgedb.<prefix>Connection`s just won't work for us. It means that we will have to closely study the edgedb-protocol Rust crate, and integrate it with the process I described in the first answer.
Um... does this mean that `edgemorph` will use its own connections under the hood? And reimplement them for Python and other supported languages in the future?
Yes, at least for the schema migration part. ~~Until `edgedb-rust` supports tokio, the semi-equivalent of MagicStack's uvloop for Python, edgemorph will have to only support Blocking IO connections under the hood. This includes all of the ORM-like bindings, too. :broken_heart:~~ We have the technology. It's time to learn the async-std library and write database connection pools (or just borrow @tailhook's work).
But in the future, I'm hopeful it will be possible to expose SQLAlchemy-like sessions using PyO3's `python! { ... }` macro, which can acquire a Python GIL and leverage inline `edgedb-python` connections.
Yes, at least for the schema migration part. Until `edgedb-rust` supports tokio, the semi-equivalent of MagicStack's uvloop for Python, edgemorph will have to only support Blocking IO connections under the hood. This includes all of the ORM-like bindings, too. :broken_heart:
Why do you need tokio specifically? edgedb-rust/edgedb-client works on top of async_std, which is an async framework quite similar to tokio but having more stdlib-like APIs.
@tailhook, Yeah... you've got me there. After looking into the latest gadgets available in async-std, I've realized that I'm completely wrong.
@nsidnev, @tailhook, @1st1, @CodesInChaos could I have feedback on the designs for the client Rust API? I've spent considerable time designing a compact and thoughtful representation for querying and mutating data. For clients using this framework, preparing code follows this general process:
"What type am I operating upon?"
"Which of the type's fields are relevant to this database call?"
"What operation(s) am I performing?"
"Is the operation blocking or non-blocking?"
For example, a demo using `.update()`:
"What type am I operating upon?" `User`
"Which of the type's fields are relevant to this database call?" `name`, where `name = "Alice"` but it needs to be changed to `"Alice in Wonderland"`
"What operation(s) am I performing?" `UPDATE` and `SET`
"Is the operation blocking or non-blocking?"
So, the optimal client code resembles:

    use crate::edm_app::*;

    let mut user = User::edb()
        .name("Alice")
        .update()
        .set(vec![("name", "Alice in Wonderland")])
        .no_block().await?;

From my design research, I think a `.filter()` method is mostly unnecessary since `FILTER` operations can be inferred from the values passed into the `edb` builder fields, but I'm not ready to close off the idea of using `filter()` since it may be necessary in some corner cases.
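To show how a `FILTER` clause could be inferred from builder fields, here is a small Python sketch of the same call shape that renders EdgeQL text. `UserQuery` and its methods are hypothetical, every property is assumed to be a `str`, and the `filter_`/`set_` parameter prefixes are invented for the illustration.

```python
from typing import Dict, Tuple


class UserQuery:
    """Hypothetical builder; not Edgemorph's real API."""

    def __init__(self) -> None:
        self._filters: Dict[str, str] = {}
        self._updates: Dict[str, str] = {}

    def name(self, value: str) -> "UserQuery":
        # A value passed to a field method becomes a FILTER condition.
        self._filters["name"] = value
        return self

    def update(self) -> "UserQuery":
        # Marks the operation; kept only to mirror the chained call shape.
        return self

    def set(self, **fields: str) -> "UserQuery":
        self._updates.update(fields)
        return self

    def build(self) -> Tuple[str, Dict[str, str]]:
        """Render EdgeQL text and its arguments (all fields assumed str)."""
        where = " AND ".join(f".{k} = <str>$filter_{k}" for k in self._filters)
        assign = ", ".join(f"{k} := <str>$set_{k}" for k in self._updates)
        query = f"UPDATE User FILTER {where} SET {{ {assign} }};"
        args = {f"filter_{k}": v for k, v in self._filters.items()}
        args.update({f"set_{k}": v for k, v in self._updates.items()})
        return query, args


query, args = (
    UserQuery().name("Alice").update().set(name="Alice in Wonderland").build()
)
```

Separate parameter prefixes are used because the same property (`name`) appears both as a filter value and as a new value in this example.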
In the Rust wiki, I'm building up a document with a large collection of examples to promote API-first design. As always, your questions are welcome.
Oh, sorry I've missed that. My quick input (although I may be wrong on some things):
- `no_block().await?`. You can have `Future` impl for all query builder structures, so `.await` is enough. (Unless you want to specify connection where to fetch from). See `surf` crate for an example of doing that.
- `set` with a vector is unlikely to work, because of property types. `.update()` should probably return some `UpdateUser` object that has `set_name` and other `set_*` methods.
- `User::update(User::filter().name("Alice")).set_name("Alice in Wonderland")`. This gives more stress on operation being done (`update`) while keeping it composable (`filter` part is usable for `SELECT` and `DELETE` queries too).

We should publish some work in progress on typescript query builder soon, that may give more ideas on how to structure query builders.
@tailhook it was interesting, thanks. Maybe you have some time to look at #9 and the tutorial and give your opinion on the Python API there? Especially about the possibility of integrating this API with Rust.
and the tutorial and give your opinion on the Python API there?
Don't take it as a comprehensive review, as I'm limited in time. But here are few thoughts:
- `./app/repositories/edgedb/esdl/`, take a look at RFC1000 it describes what is supported by command-line tool. If you can adhere to this standard, it would be nice. If you can't, feel free to open an issue in the CLI repo, so that we have at least all the needed options to support this use case.
- `email: Property[Optional[str]]` -- do you have to have `Property` type here? I think it's possible to differentiate scalars from links by the type. And also `User(..).email` should return a string, not any kind of property object.
- `Property(default='xxx')` works. When object comes from EdgeDB it already contains the value. When you write object to EdgeDB you're supposed to skip that property and it will be filled by EdgeDB server-side. So I don't think it's needed in generated schema.
- `User.shape(User.id)`. I think it's better to have declarative shapes as data classes:

        class HomePageUser(edm.Shape, source=User):
            name: str
            email: str

  `session.query(HomePageUser.filter_by(name="Guido"))` returns strongly typed `list(HomePageUser)` (I'm not sure if python's generics are strong enough to infer the type by mypy, though)
- `session.create(photo)` -- it's probably better to keep terms the same: `session.insert(photo)`, as the object is basically already created in memory. And we use `CREATE` term for DDL (which isn't a part of the query builder, so isn't super-important though)

Especially about the possibility of integrating this API with Rust.
Not sure what you mean. Do you mean implement the whole query builder in Rust and expose to Python? It's possible, but quite a bit of tedious work and maintenance.
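As an illustration of the declarative-shape idea from the review above, a base class can capture the `source=` keyword through `__init_subclass__` and turn the subclass annotations into a `SELECT` shape. The `Shape` base class here is a hypothetical stand-in for `edm.Shape`, and `User` is an empty placeholder for a generated type.

```python
class User:
    """Placeholder for a generated EdgeDB object type (hypothetical)."""


class Shape:
    """Hypothetical base class: subclasses declare which fields to fetch."""

    source_name = ""

    def __init_subclass__(cls, source: type, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Remember which object type this shape selects from.
        cls.source_name = source.__name__

    @classmethod
    def select(cls) -> str:
        # The annotations declared on the subclass form the shape.
        body = ", ".join(cls.__annotations__)
        return f"SELECT {cls.source_name} {{ {body} }};"


class HomePageUser(Shape, source=User):
    name: str
    email: str


shape_query = HomePageUser.select()
```

Since `HomePageUser` is a normal class with typed attributes, rows decoded into it would be visible to IDEs and type checkers as `name: str` and `email: str`.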
Thank you for taking the time to do this.
- `edgedb.toml` in the root of the project, not sure).
- `Property[T]` and placing just real type looks better. So the only custom type-hint as I see it for now will be required for multi links.
- `create` is not right word and also not sure it should be a part of session. Probably it should be moved out from session as a separate function and then passed to something like `session.execute`.
Hi! Cool project. I did something similar but only for python a few weeks ago (but haven't uploaded it yet and not sure if I will now), but my goal was to provide the correct type hints for conveniently using the return values from the driver with IDEs.
I have a few questions about what and how this project will do. Most of my questions are about Python (since this is my primary language).
NOTE: I'm not a native speaker, so if my questions sound rude - sorry, they shouldn't be, I'm just really interested in the answers.
Consider the following Python snippet from the README:
Questions:
Why do we need separate types such as `NamedType`, `HasAddressType` or `UserType` (and BTW `.edm_user` module)?
Why use decorators instead of extending classes like in other ORMs? This will add autocomplete to IDEs for the parent fields and it looks more pythonic (IMO). I understand that in Rust, you can simply expand all necessary fields through a macro, but for Python this method seems to me not suitable. Something like that:
...
class User(Named, HasAddress): ...
or
What about link properties definition? I had a couple of problems with them in my internal tool.
Will this project also be a query builder for EdgeDB? It would be great if so, but then there are also a couple more questions about types definition and general usage.
If this project is going to be a query builder, then it is not obvious how to use types as type hints for the values returned by the driver (in fact, it is not obvious to me even now). For example:

    def my_func(conn: edgedb.BlockingIOConnection) -> None:
        user = select_user_by_id(conn, id=uuid.uuid4())