BloopAI / bloop

bloop is a fast code search engine written in Rust.
https://bloop.ai
Apache License 2.0
9.44k stars 573 forks source link

Query planer doesn't work for multiple alternations in a row #1267

Open hsqStephenZhang opened 6 months ago

hsqStephenZhang commented 6 months ago

High level details

I added a unittest of plan("[ab][cd][ef]"), which is a query parse testcase in google's code search project(I know it's pretty old), which is expected to expand into "ace, acf, ... bdf", but the test output failed to inline the query strings

Current situation

Proposal

I've implemented some changes, and have passed the test( plan("(a|b)(c|d)(e|f)(g|h)")) added by my self.

in server/bleep/src/query/planner.rs, The Fragment:::add need to handle the following match arm:

             // TODO: do we need to handle cases where the children are not all literals?
            (Fragment::Literal(lit), Fragment::Dense(Op::Or, mut rhs)) => {
                if rhs.iter().all(|x| matches!(x, Fragment::Literal(_))) {
                    for x in &mut rhs {
                        if let Fragment::Literal(s) = x {
                            *s = lit.clone() + s;
                        }
                    }
                    Fragment::Dense(Op::Or, rhs)
                } else {
                    Fragment::Dense(
                        Op::And,
                        vec![Fragment::Literal(lit), Fragment::Dense(Op::Or, rhs)],
                    )
                }
            }

            // TODO: avoid explosion by adding some constraints
            // TODO: do we need to handle cases where the children are not all literals?
            (Fragment::Dense(Op::Or, lhs), Fragment::Dense(Op::Or, rhs)) => {
                if lhs.iter().all(|x| matches!(x, Fragment::Literal(_)))
                    && lhs.len() <= 100
                    && rhs.iter().all(|x| matches!(x, Fragment::Literal(_)))
                    && rhs.len() <= 100
                {
                    let mut cross = Vec::with_capacity(lhs.len() * rhs.len());
                    for x in &lhs {
                        for y in &rhs {
                            let x = x.as_literal().unwrap();
                            let y = y.as_literal().unwrap();
                            cross.push(Fragment::Literal(x.to_owned() + y));
                        }
                    }
                    Fragment::Dense(Op::Or, cross)
                } else {
                    Fragment::Dense(
                        Op::And,
                        vec![Fragment::Dense(Op::Or, lhs), Fragment::Dense(Op::Or, rhs)],
                    )
                }
            }

and the function optimize::run should call flattern_or after the calling of inline

Next steps Are there any next steps after we implement these changes?

hsqStephenZhang commented 6 months ago

Given the content of 'acgf', the regex query "[ab][cd][ef]" will recall this doc, and it's unexpected, since this regex query has a determined set of matched trigrams.