getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

UnqualifiedQueryError: validation failed for entity generic_metrics_counters: Entity generic_metrics_counters: query col... #72094

Closed sentry-io[bot] closed 2 weeks ago

sentry-io[bot] commented 4 months ago

Prerequisite for this issue: https://github.com/getsentry/snuba/issues/6084

Sentry Issue: SENTRY-38JR

UnqualifiedQueryError: validation failed for entity generic_metrics_counters: Entity generic_metrics_counters: query columns (region, browser) do not exist
  File "sentry/sentry_metrics/querying/data/execution.py", line 692, in _bulk_run_query
    return bulk_run_query(requests)
  File "sentry/snuba/metrics_layer/query.py", line 101, in bulk_run_query
    snuba_results = bulk_snuba_queries(
  File "sentry/utils/snuba.py", line 890, in bulk_snuba_queries
    return _apply_cache_and_build_results(params, referrer=referrer, use_cache=use_cache)
  File "sentry/utils/snuba.py", line 953, in _apply_cache_and_build_results
    query_results = _bulk_snuba_query([item[1] for item in to_query], headers)
  File "sentry/utils/snuba.py", line 1040, in _bulk_snuba_query
    raise UnqualifiedQueryError(error["message"])

Improve frontend validation, or check the tags on the backend before querying.

vgrozdanic commented 3 months ago

This is caused by a race condition. The span connected to a metric is ingested first, while the metric itself can take up to 3 minutes to be ingested. When a user goes to the metrics section, they can see the spans connected to the metric and click on a span to inspect more details. The problem occurs when new, never-before-seen tags have been attached to the metric. Every tag key is converted to a number through the indexer table, but this happens only after the metric has been ingested. So the API will report that the tag exists (it is saved while processing the span), but when the query tries to resolve that key using the indexer, it raises an error because the key doesn't exist in the table yet.

This happens very rarely, because it requires the following scenario:

  1. The user sends new, never-before-seen tags with a metric.
  2. In under 3 minutes (the usual time for a metric to be ingested), the user visits the metrics page and clicks on one of the last sampled spans.
  3. While loading the data about the span, the FE queries for more details about the metric using the not-yet-processed tag <-- this triggers the error, because the tag doesn't exist in the indexer yet.
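
A minimal sketch of the "check the tags before querying" idea, assuming the backend can ask the indexer whether a tag key already has an id. `InMemoryIndexer` and `split_resolvable` are hypothetical names made up for illustration, not part of Sentry or Snuba; the real indexer is a database-backed service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class InMemoryIndexer:
    """Hypothetical stand-in for the indexer table: tag key -> integer id."""

    _ids: Dict[str, int] = field(default_factory=dict)

    def record(self, key: str) -> int:
        # Simulates what happens once the metric carrying this key is ingested.
        return self._ids.setdefault(key, len(self._ids) + 1)

    def resolve(self, key: str) -> Optional[int]:
        # Returns None for keys that have not been indexed yet.
        return self._ids.get(key)


def split_resolvable(
    indexer: InMemoryIndexer, tag_keys: List[str]
) -> Tuple[List[str], List[str]]:
    """Split tag keys into those the indexer knows and those still pending."""
    resolved: List[str] = []
    pending: List[str] = []
    for key in tag_keys:
        if indexer.resolve(key) is not None:
            resolved.append(key)
        else:
            pending.append(key)
    return resolved, pending


# Example: "environment" was ingested earlier; "region" and "browser" arrived
# with a metric that has not been processed yet.
indexer = InMemoryIndexer()
indexer.record("environment")
resolved, pending = split_resolvable(indexer, ["environment", "region", "browser"])
```

With a check like this, the query could be built only from `resolved`, and `pending` could be reported back to the FE as "not available yet" instead of surfacing as an `UnqualifiedQueryError`.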
vgrozdanic commented 3 months ago

The UI doesn't break when the error happens. I suggest we wait for the SnS team to add more descriptive error messages, and then we can handle this better on the product side.

vgrozdanic commented 2 months ago

Also handle the case where a dashboard/widget groups by a tag that is later removed or stops being sent. E.g. in the beginning there is a tag foo:bar, and some time after the dashboard is created, that tag is no longer sent. This breaks the widget, because the query fails with a 500 error (it can't group by a tag that doesn't exist).
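
One way the widget case could be handled is to translate the validation failure into a client-facing response instead of letting it surface as a 500. The sketch below is only an illustration under stated assumptions: `UnqualifiedQueryError` is redefined locally as a stand-in for the exception in the traceback, and `run_widget_query`, `fake_run_query`, and the response shape are hypothetical.

```python
from typing import Callable, Dict, List


class UnqualifiedQueryError(Exception):
    """Local stand-in for the exception in the traceback (not imported from Sentry)."""


def run_widget_query(
    run_query: Callable[[List[str]], List[dict]], group_by: List[str]
) -> Dict[str, object]:
    """Run a grouped query and map a missing-column failure to a 400-style result."""
    try:
        return {"status": 200, "groups": run_query(group_by)}
    except UnqualifiedQueryError as exc:
        # The tag used for grouping is unknown: report it instead of crashing.
        return {"status": 400, "detail": str(exc), "groups": []}


def fake_run_query(group_by: List[str]) -> List[dict]:
    """Hypothetical query function that only knows the 'environment' tag."""
    known = {"environment"}
    missing = [key for key in group_by if key not in known]
    if missing:
        raise UnqualifiedQueryError(
            "query columns (%s) do not exist" % ", ".join(missing)
        )
    return [{"environment": "prod", "count": 3}]


ok = run_widget_query(fake_run_query, ["environment"])
broken = run_widget_query(fake_run_query, ["foo"])
```

Here `broken` carries a 400 status and the error detail, so the widget can show a "tag no longer exists" message rather than failing outright.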