Closed tlento closed 4 days ago
in progress
@tomkit.lento is this one still in progress?
Predicate pushdown update - tl;dr: 1-week spike on post-plan-building optimizer approach, followed by a decision on whether to take another
select sum(bookings) as instant_bookings
from (
  select bookings, bookings__is_instant
  from (
    select 1 as bookings, is_instant as bookings__is_instant
    from bookings
  ) a
  where bookings__is_instant
) b
where bookings__is_instant
There's an easy hack we can put in place to disable it for the most obvious scenarios (mf query --metrics instant_bookings), which is to skip pushdown for queries sourced out of a single semantic model, but that won't cover slightly more complex but still fairly obvious cases (mf query --metrics instant_bookings,listings).
A complete solution, which involves moving the predicates instead of replicating them, can be done via a DataflowPlanOptimizer. Doing this requires the following:
After talking to @Jordan we decided to do the following:
We've decided to spike on this because:
If you have questions or concerns, fire away!
Follow-ups still pending:
visits metric by user__country.
Given a list of predicate filters, push each one down to its source semantic model if it is safe to do so.
At this time, we will support pushdown for predicates which are strictly limited to metric_time OR have all linkable specs (dimensions/entities/etc. in the expression) originating from the same semantic model.
For example (raw names used for readability):
booking__is_instant OR listing__is_lux
booking__is_instant AND (listing__is_lux OR listing__is_active)
["booking__is_instant", "listing__is_lux OR listing__is_active"]
booking__is_instant OR booking AND metric_time >= '2021-01-01' AND metric_time <= '2021-01-31'
(metric_time >= '2021-01-01' AND metric_time <= '2021-01-31') OR metric_time BETWEEN '2022-01-01' AND '2022-01-31'
The key is that the inputs referenced in a predicate must all map onto elements in the same underlying semantic model. This may need to be restricted to dimensions only at first, as entities can be more complicated to evaluate without ambiguity, but ideally things will align in a fairly straightforward manner.
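To make the eligibility rule concrete, here is a minimal Python sketch of the check. It is not MetricFlow's actual API: `Predicate`, `is_pushdown_candidate`, and the `element_to_model` mapping are hypothetical names used only for illustration. It also assumes that a predicate mixing metric_time with other elements is not pushed down, which is a conservative reading of "strictly limited to metric_time".

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

METRIC_TIME = "metric_time"


@dataclass(frozen=True)
class Predicate:
    """Hypothetical shape: a filter expression plus the element names it references."""

    sql: str
    element_names: FrozenSet[str]


def is_pushdown_candidate(predicate: Predicate, element_to_model: Dict[str, str]) -> bool:
    """Return True if every referenced element resolves to one semantic model,
    or if the predicate references metric_time and nothing else."""
    names = predicate.element_names
    if not names:
        return False
    if names == {METRIC_TIME}:
        return True
    if METRIC_TIME in names:
        # Assumption: mixed metric_time + other elements is treated as unsafe.
        return False
    models = {element_to_model.get(name) for name in names}
    # Unknown elements (None) or more than one source model block pushdown.
    return None not in models and len(models) == 1


# Hypothetical mapping from element name to its source semantic model.
ELEMENT_TO_MODEL = {
    "booking__is_instant": "bookings_source",
    "listing__is_lux": "listings_source",
    "listing__is_active": "listings_source",
}

mixed = Predicate(
    "booking__is_instant OR listing__is_lux",
    frozenset({"booking__is_instant", "listing__is_lux"}),
)
single_model = Predicate(
    "listing__is_lux OR listing__is_active",
    frozenset({"listing__is_lux", "listing__is_active"}),
)
time_only = Predicate(
    "metric_time BETWEEN '2022-01-01' AND '2022-01-31'",
    frozenset({METRIC_TIME}),
)

print(is_pushdown_candidate(mixed, ELEMENT_TO_MODEL))
print(is_pushdown_candidate(single_model, ELEMENT_TO_MODEL))
print(is_pushdown_candidate(time_only, ELEMENT_TO_MODEL))
```

Under these assumptions, a filter list like ["booking__is_instant", "listing__is_lux OR listing__is_active"] is evaluated one predicate at a time, so each entry can be pushed to its own source even though the list as a whole spans two semantic models.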
Some notes on implementation:
listing IS NOT NULL
and we'll happily give them NULL listing outputs in their group by - but not all of them. For this reason, we should re-apply the pushed-down filters post-join in the initial implementation, and look to optimize out any pure redundancies in a follow-up.
From SyncLinear.com | SL-1628
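The NULL-handling concern above can be reproduced with a small, self-contained sketch using Python's built-in sqlite3 module. The tables, rows, and the assumption that bookings are LEFT OUTER JOINed to listings are all hypothetical and chosen only to illustrate the point: a listing IS NOT NULL filter pushed down to the listings source does not remove the NULLs the outer join itself introduces, so the filter has to be re-applied after the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bookings (booking_id INTEGER, listing_id INTEGER);
    CREATE TABLE listings (listing_id INTEGER);
    -- Booking 3 points at a listing that has no row in listings.
    INSERT INTO bookings VALUES (1, 10), (2, 10), (3, 99);
    INSERT INTO listings VALUES (10), (NULL);
    """
)

# Filter applied only at the source (pushed down), not after the join.
PUSHED_DOWN_ONLY = """
    SELECT l.listing, COUNT(*) AS bookings
    FROM bookings b
    LEFT OUTER JOIN (
        SELECT listing_id AS listing FROM listings WHERE listing_id IS NOT NULL
    ) l ON b.listing_id = l.listing
    GROUP BY l.listing
    ORDER BY l.listing
"""

# Same pushed-down filter, re-applied after the join as well.
REAPPLIED_POST_JOIN = """
    SELECT l.listing, COUNT(*) AS bookings
    FROM bookings b
    LEFT OUTER JOIN (
        SELECT listing_id AS listing FROM listings WHERE listing_id IS NOT NULL
    ) l ON b.listing_id = l.listing
    WHERE l.listing IS NOT NULL
    GROUP BY l.listing
    ORDER BY l.listing
"""

pushed_down_only = conn.execute(PUSHED_DOWN_ONLY).fetchall()
reapplied = conn.execute(REAPPLIED_POST_JOIN).fetchall()

print(pushed_down_only)  # still contains a NULL listing group from the unmatched booking
print(reapplied)         # NULL listing group is gone
```

In this sketch the post-join WHERE clause is not redundant, which is why dropping re-applied filters is better treated as a later optimization, limited to cases where the duplicate can be shown to have no effect.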