Closed dmgolembiowski closed 3 years ago
@tailhook, do you have any suggestions on this topic:
Implementation Considerations
It can be difficult to write a parser for any complex serialized AST where all types must have known sizes at compile time. While a "TT (token tree) Muncher" seems like a viable option, the practicality of a TT muncher at this scale is brutal. The sheer volume of terms and tokens to match (or discard) makes this difficult to maintain. A more suitable approach would be to write the deserializer with some formal grammar modularity. My preference leans toward PEG and Pest.
Regardless of the approach taken for the intermediate deserialization step, Edgemorph will either need to create the innermost leaf nodes and run to_owned() once inside their owner's ::new(...) method, or Edgemorph could adapt the builder methods in datastructures.rs to stitch distinct nodes together within their owners. Lastly, Edgemorph could operate upon the AST and match against a giant enum-like structure to allocate each of the codegen Rust types.
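The third option (matching against one large enum) could look roughly like the sketch below. The variant names are borrowed from the Declaration entry in the Type Specifications; the fields inside each variant and the emit function are hypothetical and only illustrate the dispatch, not Edgemorph's actual data structures.

```rust
// Hypothetical declaration kinds. Only the variant names come from the
// RFC's Declaration entry; the fields are assumptions for illustration.
#[allow(dead_code)]
#[derive(Debug)]
enum Declaration {
    CreateAlias { name: String },
    CreateObjectType { name: String, properties: Vec<(String, String)> },
    CreateFunction { name: String },
}

// Match once on the declaration kind and produce Rust source text for it.
fn emit(decl: &Declaration) -> String {
    match decl {
        Declaration::CreateObjectType { name, properties } => {
            let mut out = format!("pub struct {} {{\n", name);
            for (field, ty) in properties {
                out.push_str(&format!("    pub {}: {},\n", field, ty));
            }
            out.push_str("}\n");
            out
        }
        // The other kinds are only acknowledged in this sketch.
        Declaration::CreateAlias { name } => {
            format!("// alias `{}`: not generated in this sketch\n", name)
        }
        Declaration::CreateFunction { name } => {
            format!("// function `{}`: not generated in this sketch\n", name)
        }
    }
}

fn main() {
    let user = Declaration::CreateObjectType {
        name: "User".to_owned(),
        properties: vec![("name".to_owned(), "String".to_owned())],
    };
    print!("{}", emit(&user));
}
```

The appeal of this shape is that the compiler flags every match that misses a newly added variant; the cost is that the enum and its match arms grow with every AST node kind.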
I'd rather know ahead of time if I'm entering the danger zone.
ToDo: Revise pre-RFC to include multi-module validation using EdgeQL introspection. In particular, referencing outside user-defined SDL modules within another SDL module is not checked by the ql_parser.
@tailhook, do you have any suggestions on this topic:
I'm not sure I understand matters here. But have you considered using edgedb introspection for codegen?
I.e., you apply the schema to an EdgeDB instance and then execute queries. Here is how you can get all the properties of the User object:
SELECT schema::ObjectType {
    properties: {name}
}
FILTER .name = 'default::User';
More docs here: https://www.edgedb.com/docs/edgeql/introspection/objects
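To make the codegen side of that suggestion concrete, here is a minimal sketch of what could be done with the result of such a query once it is in memory. IntrospectedObject, short_name, and field_list_const are hypothetical names, the sample data is hand-written, and no EdgeDB client call is shown; how the result is fetched and decoded is left open.

```rust
// Hypothetical in-memory shape of one row returned by the introspection
// query above: the object's qualified name plus its property names.
#[derive(Debug)]
struct IntrospectedObject {
    name: String,
    properties: Vec<String>,
}

// "default::User" -> "User"; a name without a module prefix is kept as-is.
fn short_name(qualified: &str) -> &str {
    qualified.rsplit("::").next().unwrap_or(qualified)
}

// Emit one line of Rust source listing the object's property names.
fn field_list_const(obj: &IntrospectedObject) -> String {
    let quoted: Vec<String> = obj
        .properties
        .iter()
        .map(|p| format!("\"{}\"", p))
        .collect();
    format!(
        "pub const {}_FIELDS: &[&str] = &[{}];",
        short_name(&obj.name).to_uppercase(),
        quoted.join(", ")
    )
}

fn main() {
    // Hand-written sample data, not real query output.
    let user = IntrospectedObject {
        name: "default::User".to_owned(),
        properties: vec!["id".to_owned(), "name".to_owned()],
    };
    println!("{}", field_list_const(&user));
}
```

Because the query only selects property names, the sketch stops at a list of field names; generating typed struct fields would also require selecting each property's target type.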
We are going to rewrite the EdgeQL parser in Rust at some point, so tying your implementation to a specific Python AST now might introduce more churn than needed.
Excellent; Yury also shared some useful points with me in another conversation related to this pre-RFC. Introspection seems to be the way to go, so I'll table this issue until the Rust EdgeQL parser is stable. Thank you!
Have you considered applying the schema to a live EdgeDB instance and then using introspection to get the type info for codegen? Or do you use the edgedb submodule for different purposes?
No, I hadn't considered that. Thanks for the suggestion; it may come in handy once I begin the edm port to Rust. I chose this approach because back around alpha-1 or alpha-2, when I was just getting started, I realized it wasn't convenient enough to check syntax for validity unless you were already on the CLI. At those earlier versions, the error reporting was less refined than it is now, so debugging was a chore. But for compilation I'm using the edgedb.edb submodule to deserialize a user's module files into AST so that only valid schemas can be "installed". For the time being, it's a bit smelly to hack the non-public submodule into edgemorph, but it's only temporary.
Oh, I've seen this comment after commenting about the same in another issue. You may disregard that comment.
Originally posted by @tailhook in https://github.com/dmgolembiowski/edgemorph/pull/8#issuecomment-717112607
Type introspection should work in concert with markup serialization and EDB QL parsing.
Reopening this pre-RFC with the knowledge that QL parsing is already provided by the EdgeDB submodule via:
from edb.common.markup import _serialize as serialize
from edb.edgeql import parser as qlparser
(p.s. Yury, thank you for the Gist. I'll definitely incorporate it into this RFC.)
Abstract
We propose to use ESDL abstract syntax trees as the basis for code generation in the Edgemorph framework.
Motivation
To Edgemorph, EdgeDB's compiler is the root of all magic. By digesting an SDL module into AST tokens, we share a common tongue between any supported programming language. To this end, it is reasonable to use a strongly-typed programming language like Rust or TypeScript to build boilerplate code structures from a user's schema definitions.
Warning: Do not stand in front of the EdgeDB bullet train.
Since EdgeDB's capabilities continue to rapidly evolve, and since its SDL language continues to mature in ways that enrich each user's experience, it becomes imperative for Edgemorph to jump out of the way and trek behind, following the smokestacks. We do this by capturing abstract syntax structures during the edm make process and disk-caching them for interpretation during edm make install.
We believe this approach offers the greatest amount of backward and forward compatibility between successive EdgeDB version releases, because each tag (e.g. alpha-3, alpha-4, ...) will correspond to its own variant of AST deserialization requirements. Moreover, whenever EdgeDB announces a new release, e.g. version alpha-N, AST changes resulting from alpha-N's release will only need to be developed on a fork of the latest Edgemorph edition (the one corresponding to EdgeDB version alpha-(N-1)).
To be clear, this is a high-effort, high-maintenance approach but the tradeoff is guaranteed backwards compatibility with EdgeDB.
Type Specifications
The purpose of this RFC's Type Specifications section is to identify generic templates that will be coded in Rust. For example, the serialized abstract syntax below must have each of its fields meaningfully converted into a Rust type at compile time. (Note: The following list of types is not complete, but it does cover the most common AST token kinds.)
Example of a serialized module's abstract syntax tree
TreeNode
    id: <i32>
    name: NT such that NT ∈ T´ and T´ satisfies the size requirements for each of the following identifiers.
    children: CheckedList<TreeNodeChild, Markup>
TreeNodeChild
    id: Optional<i32>
    label: String s, such that s ∈ L = { "name", "target", "maintype" }
    node: enum <String; TreeNode; List>, with the following corollaries:
        node::String → <String str = '%s'>;
        node::TreeNode → &'a Sized<RefCell<Weak<TreeNode<'a>>>>.
            'a is the lifetime specifier for the TreeNode it elides.
            Sized<T> is a type with known size.
            RefCell<T> is a mutable memory location with dynamically checked borrow rules [1].
            Weak<T> is a pointer that holds a non-owning reference to the managed allocation [2].
            TreeNode is an EdgeDB language markup base object subtype.
Schema
    declarations: Vec<ModuleDeclaration>
ModuleDeclaration
    name: ObjectRef<&str>
    declarations: Vec<Declaration>
Declaration
    CreateAlias:
    CreateObjectType:
    CreateFunction:
BinOp
    left: Expr
    op: String
    right: String
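As a rough illustration of how the TreeNode and TreeNodeChild entries might translate into compilable Rust, the sketch below makes several assumptions that the specification does not state: Option stands in for Optional, Vec for CheckedList, children are owned through Rc<RefCell<...>>, and Weak is used for a parent back-reference. Sized is a marker trait in Rust rather than a wrapper type, so it is dropped here. The attach helper and the parent field are hypothetical.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Shared, mutable handle to a node (an assumption of this sketch).
type NodeRef = Rc<RefCell<TreeNode>>;

// The three payload kinds listed for `node` in the specification.
#[allow(dead_code)]
#[derive(Debug)]
enum Node {
    Str(String),
    Tree(NodeRef),
    List(Vec<Node>),
}

#[allow(dead_code)]
#[derive(Debug)]
struct TreeNodeChild {
    id: Option<i32>,
    label: String, // expected to be "name", "target", or "maintype"
    node: Node,
}

#[allow(dead_code)]
#[derive(Debug)]
struct TreeNode {
    id: i32,
    name: String,
    // Non-owning back-reference, so parent and child do not form an Rc cycle.
    parent: Weak<RefCell<TreeNode>>,
    children: Vec<TreeNodeChild>,
}

impl TreeNode {
    fn new(id: i32, name: &str) -> NodeRef {
        Rc::new(RefCell::new(TreeNode {
            id,
            name: name.to_owned(),
            parent: Weak::new(),
            children: Vec::new(),
        }))
    }
}

// Stitch `child` under `parent` with the given label.
fn attach(parent: &NodeRef, label: &str, child: NodeRef) {
    child.borrow_mut().parent = Rc::downgrade(parent);
    let id = child.borrow().id;
    parent.borrow_mut().children.push(TreeNodeChild {
        id: Some(id),
        label: label.to_owned(),
        node: Node::Tree(child),
    });
}

fn main() {
    let schema = TreeNode::new(1, "Schema");
    let module = TreeNode::new(2, "ModuleDeclaration");
    attach(&schema, "name", module.clone());
    let parent = module.borrow().parent.upgrade().expect("parent is alive");
    println!("{} -> {}", parent.borrow().name, module.borrow().name);
}
```

Keeping the strong references pointing downward and the weak ones pointing upward means dropping the root frees the whole tree, which is the usual reason for pairing Rc with Weak.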
Implementation Considerations
It can be difficult to write a parser for any complex serialized AST where all types must have known sizes at compile time. While a "TT (token tree) Muncher" seems like a viable option, the practicality of a TT muncher at this scale is brutal. The sheer volume of terms and tokens to match (or discard) makes this difficult to maintain. A more suitable approach would be to write the deserializer with some formal grammar modularity. My preference leans toward PEG and Pest.
Regardless of the approach taken for the intermediate deserialization step, Edgemorph will either need to create the innermost leaf nodes and run to_owned() once inside their owner's ::new(...) method, or Edgemorph could adapt the builder methods in datastructures.rs to stitch distinct nodes together within their owners. Lastly, Edgemorph could operate upon the AST and match against a giant enum-like structure to allocate each of the codegen Rust types.
References