cortexproject / cortex

A horizontally scalable, highly available, multi-tenant, long term Prometheus.
https://cortexmetrics.io/
Apache License 2.0

Cortex querier seems not to expose querier_blocks_last_successful_*_timestamp_seconds metrics #4735

Closed: shybbko closed this issue 2 years ago

shybbko commented 2 years ago

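Describe the bug

In https://github.com/cortexproject/cortex/pull/2573 two metrics were introduced:

  - cortex_querier_blocks_last_successful_scan_timestamp_seconds
  - cortex_querier_blocks_last_successful_sync_timestamp_seconds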

My understanding is that at least one of them should always be exposed for scraping. The problem: I see neither of them exposed.

Looking at the code, it also seems that one of the metrics (cortex_querier_blocks_last_successful_sync_timestamp_seconds) is referenced only in the changelog and appears nowhere in the actual code.

I don't consider the issue critical, but I stumbled upon it and was puzzled, hence this report.

To Reproduce

  1. Deploy Cortex
  2. kubectl port-forward cortex-querier-558778f955-44f6k 1234:8080
  3. curl -s 127.0.0.1:1234/metrics | grep "blocks_last"
  4. The previous command outputs nothing
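
The check in step 3 can also be scripted. Below is a minimal sketch, assuming the /metrics output has already been fetched as text. The helper name `find_metric_names` is hypothetical, and the sample payload reuses a metric that appears later in this thread.

```python
import re

# Matches a Prometheus metric name at the start of a sample line.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def find_metric_names(exposition_text, pattern):
    """Return the set of metric names on sample lines that match `pattern`.

    Hypothetical helper for illustration; it skips blank lines and
    `# HELP` / `# TYPE` comment lines, so only exposed samples count.
    """
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = METRIC_NAME.match(line)
        if match and re.search(pattern, match.group(0)):
            names.add(match.group(0))
    return names


# Sample payload (assumption): a querier that exposes no blocks_last_* series.
SAMPLE = """\
# HELP cortex_querier_blocks_consistency_checks_total Total number of consistency checks run on queried blocks.
# TYPE cortex_querier_blocks_consistency_checks_total counter
cortex_querier_blocks_consistency_checks_total 0
"""

if __name__ == "__main__":
    # An empty set here corresponds to the empty grep output in step 4.
    print(find_metric_names(SAMPLE, r"blocks_last"))
```

Unlike a plain grep, this ignores HELP and TYPE comment lines, so a metric counts as present only when an actual sample is exposed.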

Expected behavior

At least one of the two metrics is available on the /metrics endpoint.

Environment:

  - GKE 1.21
  - Cortex 1.11.0, deployed via Cortex Helm Chart 1.4.0

Storage Engine

Blocks

Additional Context

In my case store-gateway is enabled:

store_gateway:
  enabled: true

friedrich-at-adobe commented 2 years ago

That metric appears if you disable the bucket index in the querier:

-blocks-storage.bucket-store.bucket-index.enabled=false

# HELP cortex_querier_blocks_consistency_checks_total Total number of consistency checks run on queried blocks.
# TYPE cortex_querier_blocks_consistency_checks_total counter
cortex_querier_blocks_consistency_checks_total 0
# HELP cortex_querier_blocks_last_successful_scan_timestamp_seconds Unix timestamp of the last successful blocks scan.
# TYPE cortex_querier_blocks_last_successful_scan_timestamp_seconds gauge
cortex_querier_blocks_last_successful_scan_timestamp_seconds 1.6521937595277874e+09

But you should probably never do that: disabling the bucket index is not recommended.
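
The gauge value above is a Unix timestamp in seconds, printed in scientific notation. As a small sketch (the variable names are illustrative, not part of Cortex), it can be converted to a readable UTC time like this:

```python
from datetime import datetime, timezone

# Value copied from the metrics output above.
raw_value = "1.6521937595277874e+09"

# Interpret the gauge as seconds since the Unix epoch, in UTC.
scanned_at = datetime.fromtimestamp(float(raw_value), tz=timezone.utc)

if __name__ == "__main__":
    print(scanned_at.isoformat())
```

The result falls on 2022-05-10 at 14:42:39 UTC, which is the time of the last successful blocks scan reported by that querier.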

shybbko commented 2 years ago

I do have the bucket index enabled, so in this scenario the following alert (https://monitoring.mixins.dev/cortex/#alerts) does not make much sense:

alert: CortexQuerierHasNotScanTheBucket
annotations:
  message: |
    Cortex Querier {{ $labels.instance }} in {{ $labels.cluster }}/{{ $labels.namespace }} has not successfully scanned the bucket since {{ $value | humanizeDuration }}.
expr: |
  (time() - cortex_querier_blocks_last_successful_scan_timestamp_seconds > 60 * 30)
  and
  cortex_querier_blocks_last_successful_scan_timestamp_seconds > 0
for: 5m
labels:
  severity: critical
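
To see why this alert stays silent when the bucket index is enabled, here is a rough Python model of the expression above, evaluated at a single instant. The function name is hypothetical, and the sketch ignores the `for: 5m` clause; it only mirrors the two comparisons and the fact that a missing series yields no result in PromQL.

```python
from typing import Optional

# Threshold from the alert expression: 60 * 30 seconds.
SCAN_STALENESS_SECONDS = 60 * 30


def scan_alert_condition(now: float, last_scan: Optional[float]) -> bool:
    """Model of the alert expression at one evaluation instant.

    `last_scan` is the gauge value, or None when the series is not
    exposed at all (as with the bucket index enabled). Hypothetical
    helper for illustration only.
    """
    if last_scan is None:
        # No series: the expression returns an empty result, so nothing fires.
        return False
    return (now - last_scan > SCAN_STALENESS_SECONDS) and last_scan > 0


if __name__ == "__main__":
    now = 1_652_200_000.0
    print(scan_alert_condition(now, None))        # metric absent
    print(scan_alert_condition(now, now - 60))    # scanned a minute ago
    print(scan_alert_condition(now, now - 3600))  # scanned an hour ago
```

With the metric absent the condition can never become true, so the rule is inert rather than wrong in a setup that relies on the bucket index.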

Is there any equivalent that would be recommended?

friedrich-at-adobe commented 2 years ago

If the bucket index is enabled, there is another alert that covers it:

https://github.com/cortexproject/cortex-jsonnet/blob/978fe497e328c78a41fc710410cd479dd38abb5d/cortex-mixin/alerts/blocks.libsonnet#L238-L252

shybbko commented 2 years ago

This I already have. So I'm all good it seems. Thank you for the explanation!