Rework `CoreExpression::conditions`

aritmos commented 7 months ago

The current implementation of CoreExpression::conditions is based on nested iterators, this is not a great choice for two reasons

There is a lot of overhead in the nested iterators
It only allows for the propagation of the conditionals

The current work in 08ab4df has brought to light a simpler and better scenario where conditionals propagate back a richer type. This is done by propagating the checker in a DFS manner instead of flattening an expression's structure. In return the return states of the checker can be propagated back. This allows for the capture of

Errors, such as type mismatches
Error stack traces: adding useful information as to why a type mismatch occured in a complex scenario
Additional information: e.g. a list of all the columns used in the expression, useful for the flat-subquery feature that is to be added in the future.

aritmos commented 7 months ago

Proposal:

keep the expr-based checking to allow for easier DFS semantics
checker holds expression-validation state internally
checker validates single conditions at a time
context holds coercion information

Code Gist

Common core types

#[derive(Clone, Copy, PartialEq, Eq)]
pub enum ExprType {...}
pub enum Dialect {...}
pub enum Condition {...}

Checker

pub struct ExprInfo {
    // e.g.
    // - used column ids to know when to require a subquery
    cols: Vec<String>,
}
pub enum ExprErr {
    UnSealed(usize),
    Sealed(usize),
}

pub trait Checker {
    type Cond;
    type Ctx;
    fn validate(&mut self, cond: Self::Cond, ctx: Self::Ctx);
    fn flush(&mut self) -> Result<ExprInfo, ExprErr>;
}

if we error, then the ExprInfo shouldn't be needed and vice versa, hence an explicit error over a custom struct containing both fields
we keep conditions and context as associated types for idiomatic flexibility, however these are fixed by the type system (as can be seen in the following gists)
the error case has different states, allowing for additional related information to be added, or for the error to simply be passed along

Expressions

The checking functionality is moved into a Checkable trait

pub trait Checkable {
    type Cond;
    type Ctx: Clone;

    // returns references to children objects that need to be checked
    // (children can be of different types and might require other contexts)
    #[allow(clippy::type_complexity)] // no way around it
    fn children(
        &self,
        ctx: Self::Ctx,
    ) -> Vec<(
        &dyn Checkable<Cond = Self::Cond, Ctx = Self::Ctx>,
        Self::Ctx,
    )>;

    // returns the conditions needed to verify itself assuming children have already been checked
    fn conditions(&self, ctx: Self::Ctx) -> Vec<Self::Cond>;

    // checker is not an associated type as we want multiple checkers for the same expr
    fn check(&self, checker: &mut dyn Checker<Cond = Self::Cond, Ctx = Self::Ctx>, ctx: Self::Ctx) {
        // check children
        for (child, ctx) in self.children(ctx.clone()) {
            // if coerce is inside ctx, then each ctx needs to be possibly mutated per child
            child.check(checker, ctx);
        }

        // check self
        for cond in self.conditions(ctx.clone()) {
            checker.validate(cond, ctx.clone());
        }
    }
}

// fix checkable associated types throughout all expressions
pub trait Expression: Checkable<Cond = ..., Ctx = ...> {
    fn display(&self, dialect: Dialect) -> String;
}

Example Impl

// Context type
use std::{cell::RefCell, rc::Rc};
#[derive(Clone)]
pub struct Ctx {
    // shared information (immutable during `check`)
    // - e.g. current table selection
    tables: Rc<RefCell<() /* <- replace with some type */>>,
    // individual information (overridable during `check`)
    // - e.g. expression coercion
    pub coerce: ExprType,
}
impl Ctx {
    // Returns a clone of self with the given coercion
    pub fn set_coerce(&self, coerce: ExprType) -> Self {
        let mut ctx = self.clone();
        ctx.coerce = coerce;
        ctx
    }
}

// Faciliate CoreExpression coercion verification
macro_rules! debug_assert_coerce {
    ($self:expr, $ctx:expr) => {
        debug_assert!({ $ctx.coerce == ExprType::Any || $ctx.coerce == $self.eval_type() })
    };
}

pub struct GtExpr {
    lhs: Box<dyn Numeric>,
    rhs: Box<dyn Numeric>,
}
impl Checkable for GtExpr {
    type Cond = Condition;

    type Ctx = Ctx;

    fn children(
        &self,
        ctx: Self::Ctx,
    ) -> Vec<(
        &dyn Checkable<Cond = Self::Cond, Ctx = Self::Ctx>,
        Self::Ctx,
    )> {
        vec![
            (self.lhs.as_ref(), ctx.set_coerce(ExprType::Num)),
            (self.rhs.as_ref(), ctx.set_coerce(ExprType::Num)),
        ]
    }

    fn conditions(&self, ctx: Self::Ctx) -> Vec<Self::Cond> {
        debug_assert_coerce!(self, ctx);
        // no self conditions
        Vec::new()
    }
}

aritmos commented 7 months ago

Resolved by 5e9a7af

aritmos / crabql