I added a unittest of plan("[ab][cd][ef]"), which is a query parse testcase in google's code search project(I know it's pretty old), which is expected to expand into "ace, acf, ... bdf", but the test output failed to inline the query strings
Current situation
how it's insufficient: After reading the code, I found that current optimizer won't handle the scenario that has over 3 alternations in a row. I'm not sure if it's in anticipation
motivations: I'm learning code search techs and comparing google's code search engine's with bloop, the boolean query builder seems to have some diffs, so I'm figuring it out.
why we need to change this: To make the query optimization more sufficient and cover more cases.
Proposal
I've implemented some changes, and have passed the test( plan("(a|b)(c|d)(e|f)(g|h)")) added by my self.
in server/bleep/src/query/planner.rs, The Fragment:::add need to handle the following match arm:
// TODO: do we need to handle cases where the children are not all literals?
(Fragment::Literal(lit), Fragment::Dense(Op::Or, mut rhs)) => {
if rhs.iter().all(|x| matches!(x, Fragment::Literal(_))) {
for x in &mut rhs {
if let Fragment::Literal(s) = x {
*s = lit.clone() + s;
}
}
Fragment::Dense(Op::Or, rhs)
} else {
Fragment::Dense(
Op::And,
vec![Fragment::Literal(lit), Fragment::Dense(Op::Or, rhs)],
)
}
}
// TODO: avoid explosion by adding some constraints
// TODO: do we need to handle cases where the children are not all literals?
(Fragment::Dense(Op::Or, lhs), Fragment::Dense(Op::Or, rhs)) => {
if lhs.iter().all(|x| matches!(x, Fragment::Literal(_)))
&& lhs.len() <= 100
&& rhs.iter().all(|x| matches!(x, Fragment::Literal(_)))
&& rhs.len() <= 100
{
let mut cross = Vec::with_capacity(lhs.len() * rhs.len());
for x in &lhs {
for y in &rhs {
let x = x.as_literal().unwrap();
let y = y.as_literal().unwrap();
cross.push(Fragment::Literal(x.to_owned() + y));
}
}
Fragment::Dense(Op::Or, cross)
} else {
Fragment::Dense(
Op::And,
vec![Fragment::Dense(Op::Or, lhs), Fragment::Dense(Op::Or, rhs)],
)
}
}
and the function optimize::run should call flattern_or after the calling of inline
Next steps
Are there any next steps after we implement these changes?
Given the content of 'acgf', the regex query "[ab][cd][ef]" will recall this doc, and it's unexpected, since this regex query has a determined set of matched trigrams.
High level details
I added a unittest of
plan("[ab][cd][ef]")
, which is a query parse testcase in google's code search project(I know it's pretty old), which is expected to expand into "ace, acf, ... bdf", but the test output failed to inline the query stringsCurrent situation
Proposal
I've implemented some changes, and have passed the test(
plan("(a|b)(c|d)(e|f)(g|h)")
) added by my self.in
server/bleep/src/query/planner.rs
, TheFragment:::add
need to handle the following match arm:and the function
optimize::run
should callflattern_or
after the calling ofinline
Next steps Are there any next steps after we implement these changes?