Open Jstein77 opened 1 day ago
As a follow up:
We do not have anything that tracks what type of metric is being used. We could see if it gets logged so that we can make that calculation on DD.
Do we have any visibility into how often conversion metrics are being queried, and what % hit this error? I checked in our Snowflake tables but didn't see anything. I'm trying to prioritize this bug.
Commented on the PR. I don't know if we can quick-fix this, but if you've got ideas I'd be happy to be wrong about that.
That error isn't happening in the pushdown optimizer, it's happening on plan conversion. A query of this form apparently doesn't convert to a SQL query plan:
parsed_query = query_parser.parse_and_validate_query(
metric_names=("visit_buy_conversion_rate_7days",),
group_by_names=("metric_time", "user__home_state_latest"),
where_constraint=PydanticWhereFilter(where_sql_template="{{ Dimension('visit__referrer_id') }} = '123456'"),
)
Hmm, I made that change and it seems to break the predicate pushdown tests, but other tests seem to be fine
FAILED tests_metricflow/query_rendering/test_predicate_pushdown_rendering.py::test_conversion_metric_query_filters - RuntimeError: Expected exactly one matching instance for LinklessEntitySpec(element_name='user', entity_links=()) in instance set, but found: []. All entity instances: ()
@tomkit.lento mind lending a hand? 👀
This only addresses the bug from Courtney's post and also issue 1210, but 1199 is a separate issue
This is because base_measure_recipe.required_local_linkable_specs contains the specs needed to do the joins to get the converted events, but it contains the specs from the where filter too, since it was built via __get_required_and_extraneous_linkable_specs. In the final aggregate measures node, we should just use the queried_specs, as we don't want any of the specs from the filter there, which is what's causing that bug.
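A minimal sketch of the point above, assuming a simplified model: LinkableSpec and specs_for_final_aggregation are hypothetical stand-ins written for illustration, not MetricFlow classes or functions. It only shows why the set of specs needed for joins and filtering differs from the set that should reach the final aggregation.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class LinkableSpec:
    """Hypothetical stand-in for a linkable spec (not the real MetricFlow class)."""

    element_name: str


def specs_for_final_aggregation(
    required_specs: Sequence[LinkableSpec],
    queried_specs: Sequence[LinkableSpec],
) -> Tuple[LinkableSpec, ...]:
    """Keep only the specs that were actually queried, dropping filter-only specs."""
    queried = set(queried_specs)
    return tuple(spec for spec in required_specs if spec in queried)


# Specs needed upstream: group-by, join key, and a spec that is only used in the where filter.
required = (
    LinkableSpec("metric_time"),
    LinkableSpec("user"),
    LinkableSpec("visit__referrer_id"),  # filter-only spec
)
queried = (LinkableSpec("metric_time"),)

final_specs = specs_for_final_aggregation(required, queried)
print(final_specs)
```

In this toy version the filter-only spec and the join key never reach the final aggregation, which is the behavior the comment above argues for.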
So trying to remember the context of all this again, but it seems like you can repro this by filtering with a dim that exists in the base measure's semantic model, but that dimension isn't in the group by? I'm looking at the error, and it seems to come from where we cross join the agg'd base measure set and the base measure set filtered to only converted rows. The base measure set filtered to converted rows seems to have the linkable element (i.e., the dimension), so it's erroring out during the cross join rendering, since it should not have the spec from the filter.
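To illustrate the kind of consistency check that seems to be tripping here, below is a rough sketch. The function check_join_linkable_instances and the use of plain column-name sets are assumptions made for illustration; the real check operates on instance sets and is not reproduced here.

```python
from typing import FrozenSet, Sequence


def check_join_linkable_instances(
    from_linkables: FrozenSet[str],
    join_linkables: Sequence[FrozenSet[str]],
) -> None:
    """Raise if any join dataset's linkable columns differ from the from dataset's.

    Hypothetical helper: when the joined values are coalesced, every dataset
    is expected to expose the same linkable columns.
    """
    for index, linkables in enumerate(join_linkables):
        if linkables != from_linkables:
            extra = sorted(linkables - from_linkables)
            missing = sorted(from_linkables - linkables)
            raise RuntimeError(
                f"Join dataset {index} does not match the from dataset: "
                f"extra={extra}, missing={missing}"
            )


# Mirrors the shape of the reported failure: the from dataset has no linkable
# instances, while the join dataset still carries the filter's column.
from_dataset: FrozenSet[str] = frozenset()
join_datasets = [frozenset({"metric_time__day"})]

try:
    check_join_linkable_instances(from_dataset, join_datasets)
    error_message = None
except RuntimeError as error:
    error_message = str(error)

print(error_message)
```

If the filter-only column is dropped from the join dataset before this point, both sets are empty and the check passes.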
I did some testing, it seems like this fixes that bug? https://github.com/dbt-labs/metricflow/pull/1381
More details here:
@jordan.stein no, we would not.
Taking a look 👀
I thought if we filter on the base measure set, then join in the conversion set we should only be matching users in the base measure set so wouldn't we be safe from false conversion matches?
I can also repro if I run dbt sl query --metrics mql_to_seller_conversion_rate_base --where "{{Dimension('mql__origin')}} = 'direct_traffic'". In this case mql__origin is from the same semantic model as the base measure.
If you only apply the filter to the base measure you may get inappropriate conversion matches.
I feel like that was intentional but maybe I'm misremembering? @willymwai.deng would know
I think we should apply the filter to the base measure, similar to how we do it for categorical dimensions, i.e. the filter only gets applied here to limit the base measure set.
(
SELECT
metric_time__day
, SUM(mqls) AS mqls
FROM (
SELECT
DATE_TRUNC('day', first_contact_date) AS metric_time__day
, CASE WHEN mql_id IS NOT NULL THEN 1 ELSE 0 END AS mqls
FROM ANALYTICS.dbt_jstein.olist_marketing_qualified_leads olist_mqls_src_10000
) subq_2
WHERE metric_time__day > '2010-01-01'
GROUP BY
metric_time__day
) subq_4
Yep that's what I would expect
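For reference, the logic of the SQL above can be mimicked in plain Python. This is only a sketch: the rows are made-up sample data, and aggregate_mqls is a hypothetical helper, not part of MetricFlow. It shows the filter limiting the base measure set before the SUM is taken.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

# Made-up rows standing in for the MQL source table: (mql_id, first_contact_date).
rows = [
    ("a", date(2009, 12, 31)),
    ("b", date(2010, 1, 2)),
    (None, date(2010, 1, 2)),
    ("c", date(2010, 1, 3)),
]


def aggregate_mqls(
    source_rows: Iterable[Tuple[Optional[str], date]],
    min_day: date,
) -> Dict[date, int]:
    """Filter on metric_time__day, then sum the mqls measure per day."""
    totals: Dict[date, int] = defaultdict(int)
    for mql_id, first_contact_date in source_rows:
        metric_time__day = first_contact_date  # already at day granularity
        if metric_time__day > min_day:  # WHERE metric_time__day > '2010-01-01'
            # CASE WHEN mql_id IS NOT NULL THEN 1 ELSE 0 END
            totals[metric_time__day] += 1 if mql_id is not None else 0
    return dict(totals)


result = aggregate_mqls(rows, date(2010, 1, 1))
print(result)
```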
I get the error when filtering for metric time
Encountered an error error querying against the semantic layer: All join data sets should have the same set of linkable instances as the from dataset since all values are coalesced.
From dataset instance set: InstanceSet(measure_instances=(MeasureInstance(defined_from=(SemanticModelElementReference(semantic_model_name='olist_mqls', element_name='mqls'),), associated_columns=(ColumnAssociation(column_name='mqls', single_column_correlation_key=SingleColumnCorrelationKey(PYDANTIC_BUG_WORKAROUND=True)),), spec=MeasureSpec(element_name='mqls', non_additive_dimension_spec=None, fill_nulls_with=None), aggregation_state=AggregationState.COMPLETE),), dimension_instances=(), time_dimension_instances=(), entity_instances=(), group_by_metric_instances=(), metric_instances=(), metadata_instances=())
Join dataset instance sets: [InstanceSet(measure_instances=(MeasureInstance(defined_from=(SemanticModelElementReference(semantic_model_name='olist_closed_deals', element_name='sellers'),), associated_columns=(ColumnAssociation(column_name='sellers', single_column_correlation_key=SingleColumnCorrelationKey(PYDANTIC_BUG_WORKAROUND=True)),), spec=MeasureSpec(element_name='sellers', non_additive_dimension_spec=None, fill_nulls_with=None), aggregation_state=AggregationState.COMPLETE),), dimension_instances=(), time_dimension_instances=(TimeDimensionInstance(defined_from=(SemanticModelElementReference(semantic_model_name='olist_mqls', element_name='ds'),), associated_columns=(ColumnAssociation(column_name='metric_time__day', single_column_correlation_key=SingleColumnCorrelationKey(PYDANTIC_BUG_WORKAROUND=True)),), spec=TimeDimensionSpec(element_name='metric_time', entity_links=(), time_granularity=TimeGranularity.DAY, date_part=None, aggregation_state=None)),), entity_instances=(), group_by_metric_instances=(), metric_instances=(), metadata_instances=())]
Polling for queried result set
(cloud-env) jordanstein@Jordan-Stein jaffle-sl-template % dbt sl query --metrics mql_to_seller_conversion_rate_base --where "{{TimeDimension('metric_time')}} > '2010-01-01'"
Actually, wait, I’m thinking of something else. It is a bit odd that it’s only time filters.
But the linked GH issue has a repro case in it, and that uses a metric_time filter as well.
It should only happen with metric_time, for Reasons
@courtney.holcomb said:
From SyncLinear.com | SL-2777