[Discussion] alternative AST structure

lixitrixi commented 10 months ago

I mentioned this idea in a previous meeting but didn't fully expand on it. I'm not fully sure it's the way to go either, but I've been noticing it's rather tricky to expand our functionality with changes to the AST (especially as I'm working on reducing to constants).

I propose a different AST structure similar to the following:

enum Constant {
  Int(i32),
  Bool(bool),
}

enum Expression {
  Constant(Constant),
  // ...
  UnaryOp(UnaryOp, Expression),
  BinaryOp(BinaryOp, Expression, Expression),
  VecOp(VecOp, Vec<Expression>),
}

trait UnaryOp {}
trait BinaryOp {}
trait VecOp {}

This is only a rough sketch, but it could vastly cut down on the number of match branches we need.

We could also encode the operation directly into the type:

trait UnaryOp {
  fn apply(c: &Constant) -> Constant;
}

struct Not {}
impl UnaryOp for Not {
  fn apply(c: &Constant) -> Option<Constant> {
    match c {
      Bool(b) -> Some(!b),
      _ -> None,
    }
  }
}

...which could let us do funky things and abstractify typechecking nicely.

lixitrixi commented 10 months ago

@ozgurakgun @niklasdewally I would love to hear your feedback!

lixitrixi commented 10 months ago

We could of course also make the Operations an enum. That could look something like this:

enum UnaryOp {
  Not,
}

impl UnaryOp {
  fn apply(s: &Self, c: &Constant) -> Option<Constant> {
    match s {
      Not -> {
        // as above
      }
    }
  }
}

ozgurakgun commented 10 months ago

I suggest we keep the current data type and implement full eval for now.

Briefly: All in all, this won't reduce the number of match clauses, but will just move them around. It might still be worth doing, but we should discuss next time.

ozgurakgun commented 10 months ago

Also: what you suggest sounds like a halfway house between one big expression type that has many constructors (what we have now) and an expression trait together with several data types per kind of expression (this was also suggested during our last meeting).

niklasdewally commented 10 months ago

I agree that this is an issue, but i'm not sure splitting the AST up like this is the best way. In particular, how do we add expressions that take a reference and a constant, or a reference and two seperate lists of constraints and so on - I think there will be a lot of exceptions to these categories as they stand.

From a quick glance of the Minion source, there exists stuff like:

read_list,read_tuples
read_list, read_constant
read_constant_list, read_list, read_var
read_constraint_list

niklasdewally commented 10 months ago

@lixitrixi if the issue is "we want to enter all sub expressions"/ knowing where the nested expression types are/arn't, that would be a good use case for macros.

Using syn's [1] ItemEnum, you could generate a function that matches all cases of the ast, and does stuff with the ones that have fields of type Expression. Or those that have fields of type expression that are of some variant T.

I don't think we can neatly split things into categories, but we can use macros / their reflection abilities to do this for us!

1: syn provides a Rust AST for proc macro programming.

niklasdewally commented 10 months ago

An idea: we should have a function that just performs a "map" on all sub-expressions using a user-defined function. I think that could be useful in a lot of places.

I.e. we generate a big match of this form:

fn map_expr(e: Expression, f: fn(Expression) -> Expression) -> Expression {
    match(e) {
        Eq(a,b)  => Eq(f(a),f(b)) // Eq(Box<Expression>,Box<Expression>)
        Neq(a,b) => Neq(f(a),f(b)) // Neq(Box<Expression>,Box<Expression>)
        Or(exprs) => Or(exprs.iter().map(|e| => f(e)).collect()) // Or(Vec<Expression>)
        Constant(i) => Constant(i) // constants are not exprs, so do not map.
        // ... ...
    }
}

(the Haskellers here are undoubtedly familiar with the fact that almost everything can be expressed in terms of map and reduce)

niklasdewally commented 10 months ago

An example map function to evaluate references to a value:

fn ref_to_const(e:Expression,a:Name,b:Constant) -> Expression {
   let expr = match e {
     Ref(a) => Constant(b),
     _ => e,
  }
}
return map_expr(expr,|e| ref_to_const(e,foo,ConstantInt(2)));

niklasdewally commented 10 months ago

and an expression trait together with several data types per kind of expression (this was also suggested during our last meeting).

In general, I am in this camp - I think we could push a bit more onto the Rust type system, instead of doing our own with lots of matches. However, I do see that this could limit our typing later on (we would be somewhat coupled to Rust's typing)

lixitrixi commented 10 months ago

An idea: we should have a function that just performs a "map" on all sub-expressions using a user-defined function. I think that could be useful in a lot of places.

I really like this idea, it would fit nicely as a method of Expression too. I know you're not suggesting this as an alternative but the suggested rewrite would tie nicely into this as well:

fn map_expr(e: Expression, f: fn(Expression) -> Expression) -> Expression {
    match(e) {
        UnaryOp(op, a)  => UnaryOp(op, f(a))
        BinaryOp(op, a, b) => BinaryOp(op, f(a), f(b))
        VecOp(op, exprs) => VecOp(op, exprs.iter().map(|e| => f(e)).collect())
        Constant(i) => Constant(i)
        // ...
    }
}

And this would not have to be expanded basically at all.

ozgurakgun commented 10 months ago

An idea: we should have a function that just performs a "map" on all sub-expressions using a user-defined function. I think that could be useful in a lot of places.

this sort of thing reminds me of generic traversal libraries of haskell. uniplate is a good example: https://hackage.haskell.org/package/uniplate

See all methods defined in uniplate here if you want some inspiration: https://hackage.haskell.org/package/uniplate-1.6.13/docs/doc-index.html

There must be similar libraries in rust, but I never looked into them.

niklasdewally commented 10 months ago

An idea: we should have a function that just performs a "map" on all sub-expressions using a user-defined function. I think that could be useful in a lot of places.

this sort of thing reminds me of generic traversal libraries of haskell. uniplate is a good example: https://hackage.haskell.org/package/uniplate

See all methods defined in uniplate here if you want some inspiration: https://hackage.haskell.org/package/uniplate-1.6.13/docs/doc-index.html

There must be similar libraries in rust, but I never looked into them.

I cant find any, but we could write one!

We just need an additional easy layer of indirection. The code gen macro would take an enum name in as an argument, and generate a map function for it.

ozgurakgun commented 10 months ago

See https://fitzgeraldnick.com/2017/08/03/scrapmetal.html from a quick google.

niklasdewally commented 10 months ago

See https://fitzgeraldnick.com/2017/08/03/scrapmetal.html from a quick google.

You're better at googling than me it seems :) Scrap your boilerplate seems great!

ozgurakgun commented 10 months ago

oh I just knew the terms I needed to search - the paper is good as well, well worth a read!

niklasdewally commented 10 months ago

Another case of Haskell and SPJ being ahead of the curve.

ozgurakgun commented 10 months ago

by 2 decades in this case!

niklasdewally commented 10 months ago

@lixitrixi @ozgurakgun

Me and @gskorokhod were talking about something unrelated and ended up needing a general abstraction over traversal elsewhere.

The problem is that of empty Vec in constraints.

We have encountered a scenario that needs rules that have And([Or([]]) and reduces it to And([]). (Maybe through an intermediate None type).

niklasdewally commented 10 months ago

@Kieranoski702 I suppose you would've done some of this traversal logic for the rewriting system? We could generalise that to be useful in rules too?

Georgii's problem seems to be similar to the existing sub_expressions function (written by @Kieranoski702), maybe this could also be generalised further a' la Scrapping Boilerplate? This seems to be a reduce?

ozgurakgun commented 10 months ago

I've been looking at the scrapmetal crate a bit, looks viable.

lixitrixi commented 10 months ago

I think I will go ahead and play around with this in another PR once I've done constant rewriting, to see if it actually makes things nicer.

ozgurakgun commented 10 months ago

I've been looking at the scrapmetal crate a bit, looks viable.

So, option (1) scrapmetal

(2) We should also consider: https://crates.io/crates/lens-rs (also see https://www.reddit.com/r/rust/comments/mbisnt/lensrs_guide/)

(3) Was there another lenses/optics library @niklasdewally had found?

(4) implementing our own uniplate style methods

niklasdewally commented 10 months ago

I've been looking at the scrapmetal crate a bit, looks viable.

So, option (1) scrapmetal

(2) We should also consider: https://crates.io/crates/lens-rs (also see https://www.reddit.com/r/rust/comments/mbisnt/lensrs_guide/)

(3) Was there another lenses/optics library @niklasdewally had found?

(4) implementing our own uniplate style methods

Nope, it was lens-rs.

What is the difference between lenses and scrapmetal in this case?

I like scrap your boilerplate and scrapmetal; I think it would be useful!

ozgurakgun commented 10 months ago

the lens-rs library implements traversals too, so might have comparable functionality

niklasdewally commented 10 months ago

I've been looking at the scrapmetal crate a bit, looks viable.

So, option (1) scrapmetal

(2) We should also consider: https://crates.io/crates/lens-rs (also see https://www.reddit.com/r/rust/comments/mbisnt/lensrs_guide/)

(3) Was there another lenses/optics library @niklasdewally had found?

(4) implementing our own uniplate style methods

Scrapmetal will tie us to nightly Rust and an unstable feature (specialisation). I personally don't think this is an issue, but its definitely a consideration.

niklasdewally commented 10 months ago

Dtolnay seems to have found a way to do some specialization on stable, if we end up needing it to reimplement some of SYP: https://github.com/dtolnay/case-studies/tree/master/autoref-specialization

niklasdewally commented 10 months ago

There also is "Scrap your Zipper".

I think the difference between syb vs syz and lenses is at once traversal vs incremental walking? I think we need some of both.

niklasdewally commented 10 months ago

Made a proper issue for traversal (#177) as it keeps coming up!

lixitrixi commented 10 months ago

I have definitely thought about this an unhealthy amount in the past few days... I will take a step back as this is definitely not critical. Also I agree that a zipper/uniplate is the way to go for solving tree traversal! Looking forward to what @niklasdewally cooks up :rocket:

I will however leave some thoughts which I had over the last few days on the alternative structure below:

Main advantage

The main function is to separate the concept of an operation from the actual operands. This separation allows us to work with the operations and apply them to separate types if we want, like Constant. See below:

impl BinaryOp {
    fn eval(&self, a: &Constant, b: &Constant) -> Option<Constant> {
        match (self, a, b) {
            (Eq, Constant::Int(a), Constant::Int(b)) => Some(Constant::Bool(a == b)),
            (Eq, Constant::Bool(a), Constant::Bool(b)) => Some(Constant::Bool(a == b)),
        }
    }
}

The above is not possible to that level of clarity with our current AST structure. Since the constructor contains expressions they must first be unwrapped into constants, an Option operation. Since constructors of different arities are mixed in with the AST, we can't match them easily if we do unwrap the expressions first. I.e. we won't be able to match the variant constructors themselves as they have non-consistent types.

Internalizing operations

This refactor moves evaluation of operands into the <Arity>Op types, which gives us a more powerful interface when working with expressions from the outside. Granted, this could be achieved in my current PR #175 by making the eval_constant function a method of Expression, but doing it using the refactor would make this significantly easier. It also makes typechecking of constant evaluation easier. (I'm really embracing Haskell matching here)

Below is a fuller example of what I mean, using (a minimal version of) the constant evaluation function I have been working on. Both snippets assume the AST is already set up correctly.

Current AST (at least how I implemented it) (I know the Eq rule here doesn't allow for equality between booleans, but I have a solution and will add it to #175 soon)

pub fn eval_constant(expr: &Expr) -> Option<Const> {
    match expr {
        Expr::Constant(c) => Some(c.clone()),
        Expr::Reference(_) => None,
        Expr::Eq(a, b) => bin_op::<i32, bool>(|a, b| a == b, a, b).map(Const::Bool),
        _ => None,
    }
}

fn bin_op<T, A>(f: fn(T, T) -> A, a: &Expr, b: &Expr) -> Option<A>
where
    T: TryFrom<Const>,
{
    let a = unwrap_expr::<T>(a)?;
    let b = unwrap_expr::<T>(b)?;
    Some(f(a, b))
}

fn unwrap_expr<T: TryFrom<Const>>(expr: &Expr) -> Option<T> {
    TryInto::<T>::try_into(eval_constant(expr)?).ok()
}

With rewrite

impl BinaryOp {
    fn eval_const(&self, a: &Constant, b: &Constant) -> Option<Constant> {
        match (self, a, b) {
            (Eq, Constant::Int(a), Constant::Int(b)) => Some(Constant::Bool(a == b)),
            (Eq, Constant::Bool(a), Constant::Bool(b)) => Some(Constant::Bool(a == b)),
            _ => None, // We can change this to a Result type to bubble up type errors...
            // ...which is more difficult to do with the current AST
        }
    }
}

impl Expression {
    fn eval(&self) -> Option<Constant> {
        match self {
            Expression::Constant(c) => Some(c.clone()),
            Expression::Reference(_) => None,
            Expression::BinaryOp(op, a, b) => Some(op.eval_const(&a.eval()?, &b.eval()?)),
        }
    }
}

@ozgurakgun I remember hearing you had tried something similar in Conjure and it ended up being a pain. Do you think this will pose a problem further down the line, or will additional tools like Uniplate avoid that?

gskorokhod commented 10 months ago

@lixitrixi I actually really like the proposed new AST structure! But also, I see how it could become a problem as we add more different classes of expressions. I think it may still be manageable with proper planning though? I.e at the moment all expression types that we have boil down to unary (Not), binary (Eq and minion flat constraints), ternary (Ineq) and vector (Sum, And, etc). So there isn't a whole lot of special cases to handle.

We definitely should discuss this in more detail at the meeting though! And ultimately Oz would know better as he has done this before.

(Also - the new AST structure isn't mutually exclusive with implementing uniplate, right?)

lixitrixi commented 10 months ago

@gskorokhod

I.e at the moment all expression types that we have boil down to unary (Not), binary (Eq and minion flat constraints), ternary (Ineq) and vector (Sum, And, etc). So there isn't a whole lot of special cases to handle.

They should currently reduce to the following: Unary, Binary, Ternary, Vector, and Flat. BinaryOp cannot represent Minion flat constraints as they are of type Vec<Expression>, Expression, not Expression, Expression.

But also, I see how it could become a problem as we add more different classes of expressions. I think it may still be manageable with proper planning though?

I see your point! I tend to think organizing them by signature is more maintainable in the long run than our current structure. When adding a new expression type, chances are we just have to add a single variant to an existing enum for that type, and add rules on how it combines constants, without having to also make traversal rules for it.

If we do have to add a whole new type then all we need to worry about is the specific types of arguments (i.e. Vec, Expression) and how we loop over them in a structural sense. Once uniplate is added we won't have to worry about this at all, just add a new variant and let the macro deal with how it is structured.

Essentially this will separate traversal from application, and letting uniplate take care of the former we have to do barely anything to add new types. With our current structure we will always have to worry about AST structure, as it is inherently linked to the actual operators. E.g. to do anything with Eq, you must also deal with its two operands.

(Also - the new AST structure isn't mutually exclusive with implementing uniplate, right?)

Definitely not! I think they will complement each other rather well. The refactor will reduce the number of cases uniplate has to write for tree traversal, and uniplate will provide the generic traversal functionality that the refactor lacks out of the box.

conjure-cp / conjure-oxide

[Discussion] alternative AST structure #173

Main advantage

Internalizing operations