Open Jstein77 opened 6 days ago
Adding more context to the thread. I created a saved query in jaffle-sl-template/timespine-debug.
The saved query config is below. You can run the following query to see this behavior dbt sl query --saved-query order_cost_time_spine_test
- name: order_cost_time_spine_test
query_params:
metrics:
- revenue
group_by:
- TimeDimension('metric_time', 'day')
where:
# - "{{TimeDimension('metric_time', 'day')}} >=
# CASE
# WHEN EXTRACT(day FROM CURRENT_DATE AT TIME ZONE 'America/New_York') > 14 THEN
# DATEADD(month, 1, DATE_TRUNC(month, CURRENT_DATE AT TIME ZONE 'America/New_York'))
# WHEN EXTRACT(day FROM CURRENT_DATE AT TIME ZONE 'America/New_York') <= 14 THEN
# DATE_TRUNC(month, CURRENT_DATE AT TIME ZONE 'America/New_York')
# ELSE NULL
# END"
# - "{{TimeDimension('order_item__ordered_at_test', 'month')}}>= '2020-01-01'"
- "{{TimeDimension('metric_time', 'month')}} = '2020-01-01'"
Let's see what we find when Diego is back and then make a call, but we might be able to at least get this to a point where they can work around it within a week or two.
Oh wait, this doesn't require predicate pushdown (as long as the metric time grains match between filter and group by). We just have to fully support the list interface, which is separate but related to predicate pushdown. If predicate pushdown worked I'd push that up next but I'm a little reluctant to make the change first and then try to debug.
That one is actually the problem - their filter is singular even though they've split it up.
With a more complete predicate pushdown implementation we could push it down, then do the filter + join, then re-apply the other filters (although it's difficult). But predicate pushdown apparently doesn't work.
It also doesn't correctly filter the time spine when I have a filter on another time dimension other than metric time at a monthly grain, and a filter on metric time at a monthly grain. For example this saved query yields the following SQL
I would expect the metric time filter to still be applied to the outer subquery, and the other time filter to be in the inner subquery unless it's included in the group by
Yea agreed that that one doesn't make sense if it's not in the grouping set.
If we focus on this as the filter:
{{TimeDimension('points_ledger_transaction_id__rent_cycle_month', 'month')}} >= whatever
I don't know what they expect out of a filter like this + a time spine fill nulls join operation, which is where the extra rows (all of which have NULL for that time dimension value) are coming from. Essentially, applying that filter after a time spine join might undo the fill nulls they've requested. If they group by the column, we know they expect us to discard rows that "miss" the join to the filtered set.
It also doesn't correctly filter the time spine when I have a filter on another time dimension other than metric time at a monthly grain, and a filter on metric time at a monthly grain. For example this saved query yields the following SQL
What is correct here?
The issue Ben is running into is basically unresolvable - he's adding a query-time filter on a dimension that isn't in the grouping set - and it's not clear what the user's intent is in the first place with the fill nulls with and time spine join setting.
I'm actually not sure why the second query didn't filter the time spine correctly 🤔 since it looks like they are querying everything at a monthly grain.
If you look at Ben's query he's actually using the default grain for metric time, its the other dimension points_ledger_transaction_id__rent_cycle_month
that's causing the issue here. He won't be able to query that at a daily grain since the minimum grain is month. If we fix filtering when metric time and another time dimension at a diffrent grain are included in the query it should resolve that issue for him.
- name: rent_points_earned_daily_cumulative_in_rent_cycle
description: "Amount of points earned on Rent by day, cumulative"
label: Cumulative Rent Points Earned by Day Within Current Rent Cycle
query_params:
metrics:
- cumulative_earned_point_amount
group_by:
- TimeDimension('metric_time')
where:
- "{{TimeDimension('points_ledger_transaction_id__rent_cycle_month', 'month')}} >=
case
when extract(day from current_date('America/New_York')) > 14 then date_add(date_trunc(current_date('America/New_York'), month), interval 1 month)
when extract(day from current_date('America/New_York')) <= 14 then date_trunc(current_date('America/New_York'), month)
else null
end"
- "{{TimeDimension('metric_time')}}<= current_date('America/New_York')"
- "{{TimeDimension('metric_time')}}<=
case
when extract(day from current_date('America/New_York')) > 14 then date_add(date_trunc(current_date('America/New_York'), month), interval 15 day)
when extract(day from current_date('America/New_York')) <= 14 then date_add(date_add(date_trunc(current_date('America/New_York'), month), interval -1 month), interval 15 day)
else null
end"
I feel like we should fix this, it feels a bit broken to not be able to filter the output of a cumulative metric or join to timespine metric by month or another grain that's not included in your query.
It also doesn't correctly filter the time spine when I have a filter on another time dimension other than metric time at a monthly grain, and a filter on metric time at a monthly grain. For example this saved query yields the following SQL
query_params:
metrics:
- revenue
group_by:
- TimeDimension('metric_time', 'day')
where:
# - "{{TimeDimension('metric_time', 'day')}} >=
# CASE
# WHEN EXTRACT(day FROM CURRENT_DATE AT TIME ZONE 'America/New_York') > 14 THEN
# DATEADD(month, 1, DATE_TRUNC(month, CURRENT_DATE AT TIME ZONE 'America/New_York'))
# WHEN EXTRACT(day FROM CURRENT_DATE AT TIME ZONE 'America/New_York') <= 14 THEN
# DATE_TRUNC(month, CURRENT_DATE AT TIME ZONE 'America/New_York')
# ELSE NULL
# END"
- "{{TimeDimension('order_item__ordered_at_test', 'month')}}>= '2020-01-01'"
- "{{TimeDimension('metric_time', 'day')}}>= '2020-01-01'"
Compiled SQL
SELECT
metric_time__day
, COALESCE(revenue, 0) AS revenue
FROM (
SELECT
subq_7.date_day AS metric_time__day
, subq_5.revenue AS revenue
FROM ANALYTICS.dbt_jstein.metricflow_time_spine subq_7
LEFT OUTER JOIN (
SELECT
metric_time__day
, SUM(revenue) AS revenue
FROM (
SELECT
DATE_TRUNC('month', cast(ordered_at as DATETIME)) AS order_item__ordered_at_test__month
, DATE_TRUNC('day', cast(ordered_at as DATETIME)) AS metric_time__day
, product_price AS revenue
FROM ANALYTICS.dbt_jstein.order_items order_item_src_10000
) subq_2
WHERE ( order_item__ordered_at_test__month>= '2020-01-01' ) AND ( metric_time__day>= '2020-01-01' )
GROUP BY
metric_time__day
) subq_5
ON
subq_7.date_day = subq_5.metric_time__day
) subq_8
LIMIT 100
We can do anything, it's a question of whether or not it's worth the effort.
Yeah actually in looking at this further I don't think we can just update the conditional. We'd have to sub-query the filter around the coarser grain and special case the dataflow plan layout around these time granularities. That's kind of nasty. It can be done, but it's super annoying.
The other thing they can do is not use metric_time__month in a filter when grouping by metric_time__day. It's a little more verbose in the filter expression but that seems fine.
Or it would be fine if they didn't have to do {{ TimeDimension('metric_time') }}
every time they want to refer to metric_time.
Could we also pull through the monthly grain in the outer query so we're able to add the filter?
SELECT
metric_time__day
, COALESCE(revenue, 0) AS revenue
FROM (
SELECT
subq_7.date_day AS metric_time__day
, subq_5.revenue AS revenue
FROM ANALYTICS.dbt_jstein.metricflow_time_spine subq_7
LEFT OUTER JOIN (
SELECT
metric_time__day
, metric_time__month
, SUM(revenue) AS revenue
FROM (
SELECT
DATE_TRUNC('day', cast(ordered_at as DATETIME)) AS metric_time__day
, DATE_TRUNC('month', cast(ordered_at as DATETIME)) AS metric_time__month
, product_price AS revenue
FROM ANALYTICS.dbt_jstein.order_items order_item_src_10000
) subq_2
GROUP BY
metric_time__day,
metric_time__month
) subq_5
ON
subq_7.date_day = subq_5.metric_time__day
WHERE metric_time__month>= '2020-01-01'
) subq_8
LIMIT 100
It makes sense when you look at the compiled SQL, because we don't select metric_time__month in the out query, so the only place to apply the grain is the inner query.
SELECT
metric_time__day
, COALESCE(revenue, 0) AS revenue
FROM (
SELECT
subq_7.date_day AS metric_time__day
, subq_5.revenue AS revenue
FROM ANALYTICS.dbt_jstein.metricflow_time_spine subq_7
LEFT OUTER JOIN (
SELECT
metric_time__day
, SUM(revenue) AS revenue
FROM (
SELECT
DATE_TRUNC('day', cast(ordered_at as DATETIME)) AS metric_time__day
, DATE_TRUNC('month', cast(ordered_at as DATETIME)) AS metric_time__month
, product_price AS revenue
FROM ANALYTICS.dbt_jstein.order_items order_item_src_10000
) subq_2
WHERE metric_time__month>= '2020-01-01'
GROUP BY
metric_time__day
) subq_5
ON
subq_7.date_day = subq_5.metric_time__day
) subq_8
LIMIT 100
LIMIT 100
Just edited it 😏. It the first example he sent they are not, they are group by metric time for a cumulative metric, and filtering by a monthly metric.
Uh, those are the same queries. But the issue makes sense to me now. The work-around for now is to also group by the metric_time at the same grain as in the query filter. We can refine the logic for whether or not to re-apply the filter.
Wait. Are they grouping by the same grain?
This saved query will not apply the filter to the time spine table
- name: order_cost_time_spine_test
query_params:
metrics:
- revenue
group_by:
- TimeDimension('metric_time', 'day')
where:
- "{{TimeDimension('metric_time', 'month')}}>= '2020-01-01'"
This saved query will
- name: order_cost_time_spine_test
query_params:
metrics:
- revenue
group_by:
- TimeDimension('metric_time', 'day')
where:
- "{{TimeDimension('metric_time', 'day')}}>= '2020-01-01'"
2 bugs here:
🟡 Time filters are not being applied to the time spine table when join to timespine = true. cc @courtney.holcomb @tomkit.lento https://dbt-labs.slack.com/archives/C06UEFM66P5/p17196079781798191.
[June 28th, 2024 1:52 PM] ben.kramer: Friday Afternoon Question - Not Urgent! If i want to produce a saved query grouped by
metric_time
, is there a way to limit the results to a subset of days whenjoin_to_time_spine is true
?I added filters to my where clause but those filters apply within the subquery (see thread)
From SyncLinear.com | SL-2585