dmgolembiowski / edgemorph

EdgeDB manipulator of relational, polymorphic hierarchies
Other
16 stars 1 forks source link

Python ORM/FRM Design Standards #9

Closed dmgolembiowski closed 3 years ago

dmgolembiowski commented 3 years ago

@nsidnev, I want to entrust the Python user experience to you, and relinquish all decision-making on the Python API to you. I'm too distracted focusing on the Rust and Python API's, and it's becoming too much for me to think about on my own.

If you are feeling up to it, creating a Python-equivalent to the wiki's Rust document with your own ideas and opinions would be greatly appreciated.

P.S. The codeblocks script is an easy way to generate code in an HTML table, but not required by any means.

dmgolembiowski commented 3 years ago

Also, this is not ideal but if you want a way to view the AST as an in-memory string without so much noise (i.e. the <0x909229079> or TreeNodeChildid=... fields) there is a way to extract a less-dense AST tree in Python.

It takes some heavy regex to pull it off, but this provides a rough idea of the classes, namedtuples, maps, etc. that are needed behind the public API.

   with async_pool as pool:
        async_jobs = [ 
                pool.apply_async(compile, module_paths)
                for i in range(poolsize)
        ]
        try:
            async_res  = [ 
                    job.get(timeout=12)
                    for job in async_jobs
            ]
            results = copy(async_res)
        except TimeoutError:
            print("ERROR: Compilation workers timed out. Exiting.")
            timed_out = True
    if timed_out:
        sys.exit(1)
    print("varslist: async_pool, timed_out, results, async_jobs, async_res")
    serialized = [ 
            re.subn(r" at\s0x............", "",  str(serialize(tt, trim=True)))[0].replace(' ', '') 
            for tt
            in results ]
    print(serialized)
    # There are just too many fields to pay attention to.
    # This additional pass helps to clear away some of
    # the noise that only a machine could handle.

    pat = re.compile(r"(edb\.(([a-z]*(\.)+)*)|(TreeNodeChildid\=[a-zA-Z0-9]*,)|(TreeNodeid\=[0-9]*\,)|(Listid\=[0-9]*,))")
    reduced_ser = [ 
            re.subn(pat, "", tt)[0]
            for tt
            in serialized
    ]
    print(reduced_ser)
nsidnev commented 3 years ago

I want to entrust the Python user experience to you, and relinquish all decision-making on the Python API to you.

Thanks!

I've added an example tutorial on how I see Edgemorph use. This is different from what's shown in the current README.md file, so I'd like to hear your opinion on this.

Also in this "tutorial" there are some changes to the edgemorph.toml structure that IMO makes it easier to use EDM. Can you look at them too?

dmgolembiowski commented 3 years ago

Excellent; I'll look at it tonight!

dmgolembiowski commented 3 years ago

@nsidnev thanks again for preparing this. Here's my feedback if you're interested:

This will be helpful for storing revertible backups of the project schemas:

# autogenerated via EDM 0.1.7 at 2020-10-17 16:52:00 

I hadn't shared this with you yet, but as a future iteration of the edm executable, I want to incorporate Git-like commit and push subcommands so that it's possible to revert to a previous iteration of the database schema. While these would be better-suited to the official edgedb-cli, it doesn't hurt to take the initiative. :smile:


from ._app import (User as _User)
class User(_User):
    name: Property[Optional[str]] = Property(default="User")
    email: Property[Optional[str]]
    follows: MultiLink[User]

Regarding the User type, I like the way your Property implementation design supplies both __init__ and __annotations__ support.


@delegate(_get_longest_text)
def get_longest_text() -> Set[Text]:
    ...

This is a nice proposal. I have been struggling to come up with a way to idiomatically declare singletons functions in a given module. The get_longest_text() function's return type annotation reminds me that edgedb.Set exists in the official package, so we might be causing some name-conflicts, and unfortunately edgedb.Set does not support type annotations:

>>> from edgedb import Set
>>> class T:
...     @classmethod
...     def meth(cls, title: str) -> Set[str]:
...             return Set("T")
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in T
TypeError: 'type' object is not subscriptable

though the Python standard library does support this out of the box:

from typing import Set
from ._app import Text as _Text
class Text(_Text):
    title: Property[Optional[str]]
    body: Property[Optional[str]]

@delegate(_get_longest_text)
def get_longest_text() -> Set[Text]:
    ...

It doesn't matter right now, but eventually we'll want to plan for clients having two or three different Set class names in the global namespace. :smile:


from edgemorph import select, open_async_session
from starlette.requests import Request
from starlette.responses import JSONResponse

from app.repositories.edgedb.app import User

async def get_guido_users_route(_: Request) -> JSONResponse:
    statement = select(User.shape(User.id)).filter(User.name.ilike("Guido%"))
    async with open_async_session("instance-name") as session:
        result = await session.query(statement)

    return JSONResponse([user.id for user in result])

Looks great!


select(
    User.shape(
        User.name,
        User.email,
        follows=User.follows.shape(
            User.name,
            User.email,
        ),
    )
).filter(User.email == "alice@tutorial.com")

These kinds of ORM/FRM operations look great and make sense in the Python, but make me feel nauseous thinking about how to support them in Rust. It is challenging to write self-iterative data types (i.e. JSON objects) in Rust because of the compiler's strict borrowing rules. Supporting T.shape(...) will mean preparing more code like this and I don't enjoy it lol


from edgemorph.edgedb import len
        ...
        len_body=len(Text.body),  # behind the scenes it creates computable property
        ...

I think we should avoid overriding Python builtins at all costs, especially one as popular as len. Instead, I believe it is better to do behind the scenes overrides on the __len__ method so that it can appear on many different generic types resembling Text.body.


A couple of questions on:

photo = Photo(uri=uri, author=User())
await session.create(photo)

I like the simplicity here, but 1) do you think we can optionally support something like this instead:

photo = Photo(uri=uri, author=User())
fut = session.create(photo)
fut.add_done_callback(...)
await fut

and my other question: 2) At the atomic level, what assumptions are we making about this expression:

photo = Photo(uri=uri, author=User())

Is this client's code intentionally creating a new instance of the User type, or can we get away with simply passing in the User class alias? I can't reason about what User() is supposed to be doing.

dmgolembiowski commented 3 years ago

2) At the atomic level, what assumptions are we making about this expression:

photo = Photo(uri=uri, author=User())
await session.create(photo)

Is this client's code intentionally creating a new instance of the User type, or can we get away with simply passing in the User class alias? I can't reason about what User() is supposed to be doing.

As a follow up question, are we assuming that await session.create(photo) must recursively expand to something like:

_u = User(...)
photo = Photo(uri=uri, author=_u)
await session.create(photo, nested=True)

Or would we need to use a DETACHED statement that does the INSERT ahead of time using something like this:

_u = User(... , detached=True)
photo = Photo(uri=uri, author=_u)
await session.create(photo, nested=True)
nsidnev commented 3 years ago

but make me feel nauseous thinking about how to support them in Rust.

Yeah... well, I was thinking more about using in Python, not how difficult it would be to implement this in Rust. It will be necessary to think somehow more thoroughly how this can be implemented.


I think we should avoid overriding Python builtins at all costs, especially one as popular as len. Instead, I believe it is better to do behind the scenes overrides on the len method so that it can appear on many different generic types resembling Text.body.

Actually I was thinking about built-in EdgeQL functions here. This way they can all be included into Edgemorph and mapped to real functions. So, to use mean from math, you have to import edgemorph.edgedb.math, and Edgemorph will automatically resolve the call to math :: mean(<value>) when building the query.


await session.create(photo)

You know, now when I look at this query, I think it should be separate insert instead of session.create. this way all main verbs from EdgeQL(select, delete, insert, update) can be mapped correctly to functions from edgemorph.


do you think we can optionally support something like this instead:

photo = Photo(uri=uri, author=User())
fut = session.create(photo)
fut.add_done_callback(...)
await fut

I think it is possible, but why do we need it?


2) At the atomic level, what assumptions are we making about this expression:

Well, I was thinking about creating some instance of a model with fields with default values ​​(all required attributes must be passed, others will have missing or default values. Missing values ​​will then be handled by EdgeDB). So yes, I meant your first option, except I'm not sure what the nested keyword is supposed to do:

_u = User(...)
photo = Photo(uri=uri, author=_u)
await session.execute(insert(photo))
dmgolembiowski commented 3 years ago

@nsidnev just wanted to check in with you and let you know that I've been (and still am) busy with my day-job. Please reach out to me directly if you're encountering any blockers and need my immediate attention.