A few questions about future

Hi! Cool project. I did something similar a few weeks ago, but only for Python (I haven't uploaded it yet and I'm not sure if I will now). My goal was to provide correct type hints so that the values returned by the driver are convenient to use with IDEs.

I have a few questions about what this project will do and how. Most of my questions are about Python (since this is my primary language).

NOTE: I'm not a native speaker, so if my questions sound rude - sorry, they shouldn't be, I'm just really interested in the answers.

Consider the following Python snippet from the README:

```python3
from edgemorph import ( edgetype, property, link, multi )
from .edm_user import ( NamedType, HasAddressType, UserType )

@edgetype(abstract=True, edb=NamedType)
class Named:
    name: property[str]

@edgetype(abstract=True, edb=HasAddressType)
class HasAddress:
    address: property[str]

@edgetype(extending=(Named, HasAddress), edb=UserType)
class User:
    friends: multi[ link[__qualname__] ]
    index:   {
        "name": lambda title : "User name index"
    }
```

Questions:

1. Why do we need separate types such as `NamedType`, `HasAddressType` or `UserType` (and, BTW, the `.edm_user` module)?
2. Why use decorators instead of extending classes like in other ORMs? Extending classes adds IDE autocomplete for the parent fields and looks more Pythonic (IMO). I understand that in Rust you can simply expand all necessary fields through a macro, but this method does not seem suitable for Python. Something like this:
```python3
class Named(edgemorph.Type):
    __abstract__ = True  # or something like "class Meta" from Django
    ...

...

class User(Named, HasAddress): ...
```


3. Why not move all the metadata, such as index and annotation definitions, into something like a `Meta` class, as Django or Tortoise-ORM do? Or use descriptors for some field definitions (as done in Pydantic and SQLAlchemy) and do something like this:
```python3
@edgetype(extending=(Named, HasAddress), edb=UserType)
class User:
    ...

    class Meta:
        indices = ("name",)
```
or
```python3
@edgetype(extending=(Named, HasAddress), edb=UserType)
class User:
    ...
    name: Property[str] = Property(index=True)
```
4. What about link properties definition? I had a couple of problems with them in my internal tool.
5. Will this project also be a query builder for EdgeDB? It would be great if so, but then there are a couple more questions about type definitions and general usage. If this project is going to be a query builder, then it is not obvious how to use types as type hints for the values returned by the driver (in fact, it is not obvious to me even now). For example:
```python3
import uuid

import edgedb

def select_user_by_id(conn, *, id) -> User:
    ...

def my_func(conn: edgedb.BlockingIOConnection) -> None:
    user = select_user_by_id(conn, id=uuid.uuid4())

    # not sure if linters and IDEs will understand that the .name property is str
    print("username starts with 'test': {}".format(user.name.startswith("test")))
```

6. Do you need any help with it? I'll be glad to help :)

Great questions all around, and thank you for being interested in Edgemorph! Before I dive into answering them, I would like to tell you a little bit about how I arrived at Edgemorph's architectural design and why it makes sense to me.

Originally, I wanted to make a simple query builder in EdgeDB: the kind where I write some code in either Python or Rust, and the library spits out the appropriate query statements to run on the database. I cooked up a prototype using a Flask server via Jinja templates, then parsed out the weird UTF-8 characters using Regex. And it worked!

At that point I thought I was done, and I would publish it as a library package called deltalib. But then something happened. EdgeDB released new language features and I liked what they accomplished, so I took a backup of my database and upgraded to the latest version, then tried to commit all of my modules to the new database -- naively expecting everything to just work with some minor tweaks.

That didn't happen. Instead, I lost all of my data because I was too stubborn to drop the version, and I wasn't smart enough to figure out how to serialize all of the data in the dump format, much less put it into something useful like JSON from that dump file.

It wasn't until 4-5 weeks ago that I had an epiphany that changed the way I look at this data hierarchy problem. And that realization was:

EdgeQL is going to continue to evolve, so why try to fight it? Even if I'm extraordinarily successful and a future iteration of deltalib takes off, there's a chance that Yury and Elvis will have to work even harder just to maintain ORM compatibility because they don't want to break the community's code; and that jeopardizes the quality of EdgeQL -- my favorite part of the EdgeDB product.

I realized that there is a fundamental incompatibility between an object-relational mapper and a powerful query language like EdgeQL. Not only do they both try to enforce structure over the same source using different patterns, but the ORM is always limited to producing a monotonically decreasing subset of the actual query language! As a result, doing things the old-fashioned way with a standard ORM is only going to limit EdgeDB. Going forward, it would be necessary to rethink my strategy.

If an ORM is fundamentally wrong for EdgeDB, then what's right?

Well, EdgeQL feels right. How do I do more EdgeQL, but still keep the good things that an ORM does?

This puts us at a decent place to begin discussing your questions.

> Questions:
>
> Why do we need separate types such as NamedType,HasAddressType or UserType (and BTW .edm_user module)?

Great question! `.edm_<module name>` is the shared object target that CPython can use to bind Rust code. If you're curious about how this works, try poking around rust-to-python for a more interactive explanation. Using Rust builder macros, Edgemorph generates custom library code from the EdgeQL structures parsed out of `user.edgeql`, templating them into statically typed Rust structs. Each of the custom EdgeQL types (`Named`, `HasAddress`, and `User`) will derive implementations in Rust-land for calling the correctly named database queries. Each of these prepared statements is dynamically formed when running `edm make install`. `edm` is the binary executable written in Rust that is responsible for reading EdgeQL module files, committing their SDL to the database, and, as part of that process, injecting edgemorph methods into the module for object insertion, filtering via select, object updates, etc.

We need separate names such as `NamedType`, `HasAddressType`, and `UserType` for reaching these Rust bindings. Additionally, I think it's a good idea to not name them `Named`, `HasAddress`, and `User` because it disambiguates them from the names called in client code. This may come as a surprise, but `NamedType`, `HasAddressType`, and `UserType` are not actually types at all! Each of these is a name for communicating with a type implementation of the things we want `Named`, `HasAddress`, and `User` to be able to do in a client's Python or Rust code. I went with `Type` as a suffix instead of `Impl` because it's more readable for me.

> Why use decorators instead of extending classes like in other ORMs? This will add autocomplete to IDEs for the parent fields and it looks more pythonic(IMO). I understand that in Rust, you can simply expand all necessary fields through a macro, but for Python this method seems to me not suitable.

Good point. I like the design of Python 3.7's dataclasses. They look more like EdgeQL types than Python classes, which was my goal. You're completely right, though. That implementation is more Pythonic, and probably makes more sense to implement.
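
For illustration, the inheritance-based declaration from the question could be sketched roughly as follows. `Type`, `__abstract__`, and `__fields__` are hypothetical names picked for this sketch and are not Edgemorph's actual API; the only point is that plain subclassing lets both the runtime and IDEs see the inherited fields.

```python
class Type:
    """Hypothetical base class for this sketch; not Edgemorph's real API."""

    __abstract__ = False
    __fields__ = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # The abstract flag applies only to the class that declares it.
        cls.__abstract__ = cls.__dict__.get("__abstract__", False)
        # Walk the MRO from the most generic base to the most specific class
        # so that inherited annotations are collected alongside local ones.
        fields = {}
        for base in reversed(cls.__mro__):
            fields.update(getattr(base, "__annotations__", {}))
        cls.__fields__ = fields


class Named(Type):
    __abstract__ = True
    name: str


class HasAddress(Type):
    __abstract__ = True
    address: str


class User(Named, HasAddress):
    email: str
```

Here `User.__fields__` contains `name`, `address`, and `email`, while only `Named` and `HasAddress` are flagged as abstract.
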

> Why not move all the metadata such as index and annotation definitions into something like the Meta class, like Django or Tortoise-ORM does. Or use descriptors for some field definitions (as done in Pydantic and SQLAlchemy) and do something like this:
> ```
> @edgetype(extending=(Named, HasAddress), edb=UserType)
> class User:
>     ...
>
>     class Meta:
>         indices = ("name",)
> ```
> or
> ```
> @edgetype(extending=(Named, HasAddress), edb=UserType)
> class User:
>     ...
>     name: Property[str] = Property(index=True)
> ```

Another good point! I struggled on a design specification for index and annotation definitions. While I'm not fond of nested Meta classes, the default property example looks quite elegant.
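
A minimal sketch of how the default-property style could carry index metadata, assuming a descriptor-based `Property` class. Both `Property` and `indexed_fields` are hypothetical helpers written for this example, not something Edgemorph is known to ship.

```python
from typing import Generic, Tuple, TypeVar

T = TypeVar("T")


class Property(Generic[T]):
    """Hypothetical descriptor that stores a value and remembers index metadata."""

    def __init__(self, index: bool = False) -> None:
        self.index = index
        self.name = ""

    def __set_name__(self, owner, name):
        # Called once when the owning class body is evaluated.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access exposes the descriptor itself
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


def indexed_fields(model) -> Tuple[str, ...]:
    """Collect the names of all properties declared with index=True."""
    return tuple(
        name
        for name, attr in vars(model).items()
        if isinstance(attr, Property) and attr.index
    )


class User:
    name: Property[str] = Property(index=True)
    email: Property[str] = Property()
```

With this layout, `indexed_fields(User)` yields only `name`, so a schema generator could derive index definitions without a nested `Meta` class.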

> What about link properties definition? I had a couple of problems with them in my internal tool.

Agreed; these require some serious black magic since creating a `TypeVar` doesn't do representational justice when printing `ClassType.__annotations__`. I have a temporary shim in place that aliases `link` with an optional union, but this cannot stay.
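
One possible alternative to the shim, sketched under the assumption that `link` and `multi` can be plain generic marker classes (this is not how Edgemorph currently defines them): the nesting then survives in `__annotations__` and can be unwrapped with `typing.get_origin` and `typing.get_args` (Python 3.8+).

```python
from typing import Generic, TypeVar, get_args, get_origin

T = TypeVar("T")


class link(Generic[T]):
    """Hypothetical marker for a link to another object type."""


class multi(Generic[T]):
    """Hypothetical marker for a multi (set-valued) pointer."""


class User:
    # A string is used for the self-reference; typing wraps it in a ForwardRef.
    friends: multi[link["User"]]


def describe(annotation):
    """Unwrap nested markers and return (marker names, innermost target)."""
    markers = []
    while get_origin(annotation) in (link, multi):
        markers.append(get_origin(annotation).__name__)
        (annotation,) = get_args(annotation)
    return markers, annotation


markers, target = describe(User.__annotations__["friends"])
```

After this runs, `markers` lists the wrappers from the outside in, and `target` is the forward reference to `User`.
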

> Will this project also be a query builder for EdgeDB? It would be great if so, but then there are also a couple more questions about types definition and general usage.

Does my explanation for your first question cover this well enough?

> If this project is going to be a query builder, then it is not obvious how to use types as type hints for the values returned by the driver (in fact, it is not obvious to me even now). For example:

Welp. The short answer is that `edgedb.<prefix>Connection`s just won't work for us. It means that we will have to closely study the edgedb-protocol Rust crate, and integrate it with the process I described in the first answer.
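
Independently of which connection layer ends up underneath, the type-hint concern from the question can be illustrated with a structural type. This is only a sketch: `UserRow`, `FakeConn`, and its `fetch_one` method are hypothetical stand-ins, not the edgedb-python or Edgemorph API.

```python
from types import SimpleNamespace
from typing import Any, Protocol, cast


class UserRow(Protocol):
    """Structural description of the fields the query is expected to return."""

    name: str


class FakeConn:
    """Hypothetical stand-in for a driver connection."""

    def fetch_one(self, query: str, **params: Any) -> Any:
        return SimpleNamespace(name="test_user")


def select_user_by_id(conn: Any, *, id: str) -> UserRow:
    row = conn.fetch_one("SELECT User { name } FILTER .id = <uuid>$id", id=id)
    # cast() does nothing at runtime; it only tells the type checker
    # which attributes the returned object is expected to have.
    return cast(UserRow, row)


user = select_user_by_id(FakeConn(), id="00000000-0000-0000-0000-000000000000")
```

With this annotation, linters and IDEs can treat `user.name` as `str`, so `user.name.startswith("test")` autocompletes and type-checks.
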

> Do you need any help with it? I'll be glad to help :)

Of course! To anyone who wants to join the adventure, please send your GitHub username to david@dgolembiowski.com and I'll add you to the project.

Thanks for the explanation! Most of my questions have been answered, but I still don't fully understand some things:

Could you show an example of using this library (framework?) with Python, the way you want it to be used when the project is ready?

> Welp. The short answer is that `edgedb.<prefix>Connection`s just won't work for us. It means that we will have to closely study the edgedb-protocol Rust crate, and integrate it with the process I described in the first answer.

Um... does this mean that edgemorph will use its own connections under the hood? And reimplement them for Python and other supported languages in the future?

You're welcome!

> Could you show an example of using this library (framework?) with Python? How you want it to be used when the project is ready.

Sure thing! It'll generally go like this:

```
# Python usage
user@computer $ python -m venv app && cd app
user@computer $ edm init .
user@computer $ source bin/activate
user@computer $ pip install wheel edgemorph edm && pip freeze > requirements.txt
user@computer $ cd edb_modules
user@computer $ # Use an editor to add SDL to `./mod_app.edgeql`
user@computer $ # Verify the output path for `edgemorph_output.python` in `../edgemorph.toml`
```

See the edm specs and control flow graph, and feel free to point out any weaknesses you can spot. Submitting a PR for the weaknesses earns you extra credit.

```
user@computer $ # Run `edm make` to see if all the EdgeQL passes a grammar check
user@computer $ edm make mod_app.edgeql
```

At this point, `edm` calls in the compiler submodule from edgemorph and begins generating the AST from the supplied EdgeQL source file(s). If the grammar is correct, `edm` will serialize the EdgeQL module and write an intermediate file in the modules directory with instructions for `edm make install` to deserialize.

```
user@computer $ # If everything's correct in the client's module file, then update `edgemorph.toml` with the appropriate DSN
user@computer $ cd .. && edm make install
```

At this point `edm make install` has no knowledge of whether `edm make` just ran, so it has to check the modules folder indicated in `edgemorph.toml` for hints about pre-serialized EdgeQL targets. If the supplied (or implied) EdgeQL target does not have a serialized file in the `edb_modules` directory, it will automatically run `edm make <that module file>`. Otherwise, it will proceed and pull in the edgedb-protocol crate to connect with the server indicated in the toml file. `edm` will use `edgedb_protocol::client_message::ClientMessage` to start a client handshake, authenticate, and encode the following sequence: start a transaction, create a migration, commit the migration, end the transaction. It then sends the encoded message to the server and waits for the response. Once the response is received, it prints everything important to STDOUT. Next, `edm make install` will deserialize the module and pull in the edgemorph library builder, so that it can eject a client library target, `edm_<edgeql mod>.so`, and also eject `<edgeql mod>.py`.

Remark: <edgeql mod>.py can be modified directly by the user because it technically will be a fully-functioning Python file, but it's really just there for exposing callable names, class names, etc. from edm_<edgeql mod>.so to the neighboring Python files.
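
The control flow described above can be summarized in a small Python sketch. `edm` itself is written in Rust; the function name `plan_install`, the step labels, and the idea of passing in the set of already-serialized module names are all assumptions made for illustration.

```python
from typing import List, Set


def plan_install(module_file: str, serialized_modules: Set[str]) -> List[str]:
    """Return the ordered steps that `edm make install` is described as taking.

    `serialized_modules` holds the module names that already have an
    intermediate file in the modules directory (how that file is named
    on disk is not specified in the thread).
    """
    stem = module_file.rsplit(".", 1)[0]
    steps = []
    if stem not in serialized_modules:
        # No serialized target found, so the grammar check runs first.
        steps.append(f"edm make {module_file}")
    steps += [
        "connect to the server named in edgemorph.toml",
        "start transaction",
        "create migration",
        "commit migration",
        "end transaction",
        f"emit edm_{stem}.so",
        f"emit {stem}.py",
    ]
    return steps


fresh = plan_install("mod_app.edgeql", set())
cached = plan_install("mod_app.edgeql", {"mod_app"})
```

The only difference between `fresh` and `cached` is the leading `edm make mod_app.edgeql` step.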

> > Welp. The short answer is that `edgedb.<prefix>Connection`s just won't work for us. It means that we will have to closely study the edgedb-protocol Rust crate, and integrate it with the process I described in the first answer.
>
> Um... does this mean that edgemorph will use its own connections under the hood? And reimplement them for Python and other supported languages in the future?

Yes, at least for the schema migration part. Until edgedb-rust supports tokio, the semi-equivalent of MagicStack's uvloop for Python, edgemorph will have to only support blocking IO connections under the hood. This includes all of the ORM-like bindings, too. :broken_heart: We have the technology. It's time to learn the async-std library and write database connection pools (or just borrow @tailhook's work). But in the future, I'm hopeful it will be possible to expose SQLAlchemy-like sessions using PyO3's `python! { ... }` macro, which can acquire the Python GIL and leverage inline edgedb-python connections.

> Yes, at least for the schema migration part. Until edgedb-rust supports tokio, the semi-equivalent of MagicStack's uvloop for Python, edgemorph will have to only support Blocking IO connections under the hood. This includes all of the ORM-like bindings, too. :broken_heart:

Why do you need tokio specifically? edgedb-rust/edgedb-client works on top of async-std, which is an async framework quite similar to tokio but with more stdlib-like APIs.

@tailhook, Yeah... you've got me there. After looking into the latest gadgets available in async-std, I've realized that I'm completely wrong.

@nsidnev, @tailhook, @1st1, @CodesInChaos could I have feedback on the designs for the client Rust API? I've spent considerable time designing a compact and thoughtful representation for querying and mutating data. For clients using this framework, preparing code follows this general process:

"What type am I operating upon?"
"Which of the type's fields are relevant to this database call?"
"What operation(s) am I performing?"
"Is the operation blocking or non-blocking?"

For example, a demo using .update():

"What type am I operating upon?" User
"Which of the type's fields are relevant to this database call?"
- name, where name = "Alice" but it needs to be changed to "Alice in Wonderland"
"What operation(s) am I performing?"
- UPDATE
- SET
"Is the operation blocking or non-blocking?"
- non-blocking

So, the optimal client code resembles:

```rust
use crate::edm_app::*;

let mut user = User::edb()
        .name("Alice")
        .update()
        .set(vec![("name", "Alice in Wonderland")])
        .no_block().await?;
```

From my design research, I think a .filter() method is mostly unnecessary since FILTER operations can be inferred from the values passed into the edb builder fields, but I'm not ready to close off the idea of using filter() since it may be necessary in some corner cases.
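
To make the "FILTER inferred from builder fields" idea concrete, here is a rough Python analogue of the Rust builder above. `UpdateBuilder` is a hypothetical class written for this sketch; it assumes string-valued properties only and simply renders EdgeQL text plus a parameter dictionary.

```python
from typing import Dict, Tuple


class UpdateBuilder:
    """Hypothetical builder; handles only string properties for brevity."""

    def __init__(self, type_name: str, **filters: str) -> None:
        self.type_name = type_name
        # Values passed at construction time become the FILTER clause.
        self.filters = filters
        self.changes: Dict[str, str] = {}

    def set(self, **changes: str) -> "UpdateBuilder":
        self.changes.update(changes)
        return self

    def build(self) -> Tuple[str, Dict[str, str]]:
        args: Dict[str, str] = {}
        conditions = []
        for field, value in self.filters.items():
            args[f"filter_{field}"] = value
            conditions.append(f".{field} = <str>$filter_{field}")
        assignments = []
        for field, value in self.changes.items():
            args[f"set_{field}"] = value
            assignments.append(f"{field} := <str>$set_{field}")
        query = f"UPDATE {self.type_name}"
        if conditions:
            query += " FILTER " + " AND ".join(conditions)
        query += " SET { " + ", ".join(assignments) + " }"
        return query, args


query, args = UpdateBuilder("User", name="Alice").set(name="Alice in Wonderland").build()
```

The filter and the new value share a property name here, which is why the sketch prefixes the query parameters with `filter_` and `set_`.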

In the Rust wiki, I'm building up a document with a large collection of examples to promote API-first design. As always, your questions are welcome.

Oh, sorry I missed that. My quick input (although I may be wrong on some things):

1. You don't have to have `no_block().await?`. You can implement `Future` for all query builder structures, so `.await` is enough (unless you want to specify the connection to fetch from). See the surf crate for an example of doing that.
2. `set` with a vector is unlikely to work, because of property types. `.update()` should probably return some `UpdateUser` object that has `set_name` and other `set_*` methods.
3. I would probably structure it in reverse order: `User::update(User::filter().name("Alice")).set_name("Alice in Wonderland")`. This puts more stress on the operation being done (update) while keeping it composable (the filter part is usable for SELECT and DELETE queries too).
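
The three suggestions above translate fairly directly to Python, where implementing `__await__` plays the role that a `Future` impl plays in Rust. The sketch below is an analogue for illustration only: `UserFilter`, `UpdateUser`, and the static methods on `User` are hypothetical, and the "query" just returns what it would have sent.

```python
import asyncio
from typing import Dict


class UserFilter:
    """Reusable filter part; could equally feed SELECT or DELETE builders."""

    def __init__(self) -> None:
        self.conditions: Dict[str, str] = {}

    def name(self, value: str) -> "UserFilter":
        self.conditions["name"] = value
        return self


class UpdateUser:
    """Update builder with typed set_* methods instead of a generic set()."""

    def __init__(self, target: UserFilter) -> None:
        self.target = target
        self.changes: Dict[str, str] = {}

    def set_name(self, value: str) -> "UpdateUser":
        self.changes["name"] = value
        return self

    async def _execute(self):
        await asyncio.sleep(0)  # stand-in for the network round trip
        return self.target.conditions, self.changes

    def __await__(self):
        # Makes the builder itself awaitable, so no extra terminal call is needed.
        return self._execute().__await__()


class User:
    @staticmethod
    def filter() -> UserFilter:
        return UserFilter()

    @staticmethod
    def update(target: UserFilter) -> UpdateUser:
        return UpdateUser(target)


async def demo():
    return await User.update(User.filter().name("Alice")).set_name("Alice in Wonderland")


result = asyncio.run(demo())
```

`result` holds the filter conditions and the pending changes as two separate dictionaries, which mirrors the split between the composable filter and the operation applied to it.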

We should publish some work in progress on typescript query builder soon, that may give more ideas on how to structure query builders.

@tailhook it was interesting, thanks. Maybe you have some time to look at #9 and the tutorial and give your opinion on the Python API there? Especially about the possibility of integrating this API with Rust.

> and the tutorial and give your opinion on the Python API there?

Don't take it as a comprehensive review, as I'm limited in time. But here are a few thoughts:

1. Directory structure: `./app/repositories/edgedb/esdl/`. Take a look at RFC 1000, which describes what is supported by the command-line tool. If you can adhere to this standard, it would be nice. If you can't, feel free to open an issue in the CLI repo, so that we have at least all the needed options to support this use case.
2. Model declaration: `email: Property[Optional[str]]`. Do you have to have the `Property` type here? I think it's possible to differentiate scalars from links by the type. Also, `User(..).email` should return a string, not any kind of property object.
3. Not sure how `Property(default='xxx')` works. When an object comes from EdgeDB it already contains the value. When you write an object to EdgeDB you're supposed to skip that property, and it will be filled in by EdgeDB server-side. So I don't think it's needed in the generated schema.
4. `User.shape(User.id)`. I think it's better to have declarative shapes as data classes:
   ```
   class HomePageUser(edm.Shape, source=User):
      name: str
      email: str
   ```
   `session.query(HomePageUser.filter_by(name="Guido"))` returns a strongly typed `list(HomePageUser)` (I'm not sure if Python's generics are strong enough for mypy to infer the type, though).
5. `session.create(photo)`: it's probably better to keep the terms the same, i.e. `session.insert(photo)`, as the object is basically already created in memory. And we use the CREATE term for DDL (which isn't part of the query builder, so it isn't super important).
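
A rough sketch of how such a declarative shape could turn into query text, assuming a hypothetical `Shape` base class (standing in for the `edm.Shape` mentioned above) that receives its source type through a class keyword argument. Only string filters are handled.

```python
from typing import Dict, Tuple


class Shape:
    """Hypothetical base class; the source type arrives as a class keyword."""

    __source__ = None

    def __init_subclass__(cls, source=None, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.__source__ = source

    @classmethod
    def select_query(cls) -> str:
        # The annotated attribute names become the requested shape.
        fields = ", ".join(cls.__annotations__)
        return f"SELECT {cls.__source__.__name__} {{ {fields} }}"

    @classmethod
    def filter_by(cls, **conditions: str) -> Tuple[str, Dict[str, str]]:
        clauses = " AND ".join(f".{name} = <str>${name}" for name in conditions)
        return f"{cls.select_query()} FILTER {clauses}", dict(conditions)


class User:
    """Stand-in for the model class generated from the schema."""


class HomePageUser(Shape, source=User):
    name: str
    email: str


query, args = HomePageUser.filter_by(name="Guido")
```

Because the shape is an ordinary class, the same declaration can double as the static type of each returned row.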

> Especially about the possibility of integrating this API with Rust.

Not sure what you mean. Do you mean implement the whole query builder in Rust and expose to Python? It's possible, but quite a bit of tedious work and maintenance.

Thank you for taking the time to do this.

1. It would be nice to allow changing the defaults, perhaps through some kind of config file, as is done in Alembic for SQLAlchemy. I'll add a new issue with some thoughts (something like `edgedb.toml` in the root of the project, not sure).
2. Oh, yeah, you're right! Omitting `Property[T]` and writing just the real type looks better. So the only custom type hint I can see being required for now is the one for multi links.
3. Hmm, you're probably right again. As EdgeDB's schema contains all the required information and is readable enough, it's not required.
4. Nice approach! But I'm not sure that it fits all cases, for example where the shape is generated dynamically. Still, IMO it's a good approach for writing strongly typed queries (implementing the typing for it shouldn't be very hard; a small POC for this case was easily created in 30 lines of code).
5. Yeah... I also think that `create` is not the right word, and I'm also not sure it should be part of the session. It should probably be moved out of the session into a separate function and then passed to something like `session.execute`.
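
As a sketch of the typing side (not the POC mentioned above, which isn't shown in this thread), a generic loader parameterized on the shape class is enough for mypy to infer the element type of the result. `load` and the example rows are hypothetical.

```python
from dataclasses import dataclass, make_dataclass
from typing import Any, Dict, Iterable, List, Type, TypeVar

S = TypeVar("S")


def load(shape: Type[S], rows: Iterable[Dict[str, Any]]) -> List[S]:
    """Build one shape instance per row; type checkers infer List[S] from `shape`."""
    return [shape(**row) for row in rows]


@dataclass
class HomePageUser:
    name: str
    email: str


users = load(HomePageUser, [{"name": "Guido", "email": "guido@example.com"}])

# A shape built at runtime still works with the same loader, but a static
# type checker can no longer know its fields.
DynamicShape = make_dataclass("DynamicShape", [("name", str)])
dynamic_rows = load(DynamicShape, [{"name": "Guido"}])
```

In the static case `users` is inferred as `List[HomePageUser]`, so `users[0].name` is known to be `str`; the dynamic case trades that inference for flexibility.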

dmgolembiowski / edgemorph

A few questions about future #1