cortexproject / cortex

A horizontally scalable, highly available, multi-tenant, long term Prometheus.
https://cortexmetrics.io/
Apache License 2.0
5.45k stars 791 forks source link

Ruler with S3 store emits warnings about bucket-index.json.gz #4278

Closed andersosthus closed 3 years ago

andersosthus commented 3 years ago

Describe the bug Setup:

Every minute I see the following in the Ruler log: level=warn ts=2021-06-14T09:37:41.691305994Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"

Looking in the S3 bucket, I see that under the prefix rules/, there is a bucket-index.json.gz file that I suspect the ruler tries to load.

I haven't seen any recommendations about separating blocks and rules into separate S3 buckets, so that's why I've kept it all in one.

Not sure if this is "as intended", or a side-effect of the blocks-index feature not filtering out the rules/ prefix, or the fact that the Ruler tries to load the bucket-index.json.gz file even though it is not in an Org prefix.

I've attached a screenshot of the S3 rules/ prefix below.

Not sure if this is only present in 1.9.0, or also in earlier versions. We were running 1.6.0, so I upgraded through to 1.9.0 during this weekend, and didn't notice it until now.

Environment:

Storage Engine

Additional Context Ruler config:

ruler:
  evaluation_interval: 1m
  enable_api: true
  ring:
    kvstore:
      prefix: "rulers/"
      store: consul
      consul:
        host: consul.kube-cortex.svc.cluster.local:8500
  ruler_client:
    grpc_compression: snappy
ruler_storage:
  backend: s3
  s3:
    endpoint: s3.dualstack.eu-north-1.amazonaws.com
    bucket_name: BUCKETNAME

Debug logs from Ruler during startup:

level=info ts=2021-06-14T10:21:15.707020837Z caller=main.go:188 msg="Starting Cortex" version="(version=1.9.0, branch=HEAD, revision=ed4f339)"
level=info ts=2021-06-14T10:21:15.709640295Z caller=server.go:239 http=[::]:3100 grpc=[::]:9095 msg="server listening on addresses"
level=debug ts=2021-06-14T10:21:15.716157043Z caller=api.go:128 msg="api: registering route" methods=GET path=/config auth=false
level=debug ts=2021-06-14T10:21:15.717297752Z caller=api.go:128 msg="api: registering route" methods=GET path=/ auth=false
level=debug ts=2021-06-14T10:21:15.717326909Z caller=api.go:128 msg="api: registering route" methods=GET path=/debug/fgprof auth=false
level=debug ts=2021-06-14T10:21:15.717407063Z caller=api.go:128 msg="api: registering route" methods=GET path=/runtime_config auth=false
level=debug ts=2021-06-14T10:21:15.717451406Z caller=api.go:128 msg="api: registering route" methods=GET path=/memberlist auth=false
level=debug ts=2021-06-14T10:21:15.717660059Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ingester/ring auth=false
level=debug ts=2021-06-14T10:21:15.717691458Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ring auth=false
level=info ts=2021-06-14T10:21:15.730143131Z caller=memcached.go:47 msg="created memcached cache"
level=info ts=2021-06-14T10:21:15.735093339Z caller=memcached.go:47 msg="created memcached cache"
level=info ts=2021-06-14T10:21:15.738953582Z caller=mapper.go:46 msg="cleaning up mapped rules directory" path=/rules
level=error ts=2021-06-14T10:21:15.739063897Z caller=mapper.go:50 msg="unable to read rules directory" path=/rules err="open /rules: no such file or directory"
level=debug ts=2021-06-14T10:21:15.739326702Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ruler/ring auth=false
level=debug ts=2021-06-14T10:21:15.739781331Z caller=api.go:128 msg="api: registering route" methods=POST path=/ruler/delete_tenant_config auth=true
level=debug ts=2021-06-14T10:21:15.739828456Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ruler_ring auth=false
level=debug ts=2021-06-14T10:21:15.739854355Z caller=api.go:128 msg="api: registering route" methods=GET path=/ruler/rule_groups auth=false
level=debug ts=2021-06-14T10:21:15.739902406Z caller=api.go:128 msg="api: registering route" methods=GET path=/prometheus/api/v1/rules auth=true
level=debug ts=2021-06-14T10:21:15.739943632Z caller=api.go:128 msg="api: registering route" methods=GET path=/prometheus/api/v1/alerts auth=true
level=debug ts=2021-06-14T10:21:15.739974758Z caller=api.go:128 msg="api: registering route" methods=GET path=/api/v1/rules auth=true
level=debug ts=2021-06-14T10:21:15.740005369Z caller=api.go:128 msg="api: registering route" methods=GET path=/api/v1/rules/{namespace} auth=true
level=debug ts=2021-06-14T10:21:15.740066505Z caller=api.go:128 msg="api: registering route" methods=GET path=/api/v1/rules/{namespace}/{groupName} auth=true
level=debug ts=2021-06-14T10:21:15.740138232Z caller=api.go:128 msg="api: registering route" methods=POST path=/api/v1/rules/{namespace} auth=true
level=debug ts=2021-06-14T10:21:15.740181978Z caller=api.go:128 msg="api: registering route" methods=DELETE path=/api/v1/rules/{namespace}/{groupName} auth=true
level=debug ts=2021-06-14T10:21:15.740237548Z caller=api.go:128 msg="api: registering route" methods=DELETE path=/api/v1/rules/{namespace} auth=true
level=debug ts=2021-06-14T10:21:15.74028159Z caller=api.go:128 msg="api: registering route" methods=GET path=/api/prom/api/v1/rules auth=true
level=debug ts=2021-06-14T10:21:15.740319784Z caller=api.go:128 msg="api: registering route" methods=GET path=/api/prom/api/v1/alerts auth=true
level=debug ts=2021-06-14T10:21:15.74035689Z caller=api.go:128 msg="api: registering route" methods=GET path=/api/prom/rules auth=true
level=debug ts=2021-06-14T10:21:15.740393122Z caller=api.go:128 msg="api: registering route" methods=GET path=/api/prom/rules/{namespace} auth=true
level=debug ts=2021-06-14T10:21:15.740433614Z caller=api.go:128 msg="api: registering route" methods=GET path=/api/prom/rules/{namespace}/{groupName} auth=true
level=debug ts=2021-06-14T10:21:15.740513704Z caller=api.go:128 msg="api: registering route" methods=POST path=/api/prom/rules/{namespace} auth=true
level=debug ts=2021-06-14T10:21:15.740581802Z caller=api.go:128 msg="api: registering route" methods=DELETE path=/api/prom/rules/{namespace}/{groupName} auth=true
level=debug ts=2021-06-14T10:21:15.740640126Z caller=api.go:128 msg="api: registering route" methods=DELETE path=/api/prom/rules/{namespace} auth=true
level=debug ts=2021-06-14T10:21:15.740728976Z caller=api.go:128 msg="api: registering route" methods=GET path=/services auth=false
level=debug ts=2021-06-14T10:21:15.740862767Z caller=module_service.go:49 msg="module waiting for initialization" module=store-queryable waiting_for=memberlist-kv
level=debug ts=2021-06-14T10:21:15.740905826Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=memberlist-kv
level=debug ts=2021-06-14T10:21:15.740909247Z caller=module_service.go:49 msg="module waiting for initialization" module=runtime-config waiting_for=server
level=debug ts=2021-06-14T10:21:15.740924964Z caller=module_service.go:49 msg="module waiting for initialization" module=memberlist-kv waiting_for=server
level=info ts=2021-06-14T10:21:15.740965859Z caller=module_service.go:59 msg=initialising module=server
level=debug ts=2021-06-14T10:21:15.740909147Z caller=module_service.go:49 msg="module waiting for initialization" module=ring waiting_for=memberlist-kv
level=info ts=2021-06-14T10:21:15.741064469Z caller=module_service.go:59 msg=initialising module=runtime-config
level=info ts=2021-06-14T10:21:15.741047501Z caller=module_service.go:59 msg=initialising module=memberlist-kv
level=debug ts=2021-06-14T10:21:15.741118767Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=distributor-service
level=debug ts=2021-06-14T10:21:15.741242338Z caller=module_service.go:49 msg="module waiting for initialization" module=ring waiting_for=runtime-config
level=debug ts=2021-06-14T10:21:15.741257257Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=ring
level=debug ts=2021-06-14T10:21:15.741268966Z caller=module_service.go:49 msg="module waiting for initialization" module=store-queryable waiting_for=runtime-config
level=debug ts=2021-06-14T10:21:15.741284467Z caller=module_service.go:49 msg="module waiting for initialization" module=store-queryable waiting_for=server
level=info ts=2021-06-14T10:21:15.741294454Z caller=module_service.go:59 msg=initialising module=store-queryable
level=debug ts=2021-06-14T10:21:15.741283889Z caller=module_service.go:49 msg="module waiting for initialization" module=ring waiting_for=server
level=info ts=2021-06-14T10:21:15.741328634Z caller=module_service.go:59 msg=initialising module=ring
level=debug ts=2021-06-14T10:21:15.765646658Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=runtime-config
level=debug ts=2021-06-14T10:21:15.765708255Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=server
level=info ts=2021-06-14T10:21:15.765717178Z caller=module_service.go:59 msg=initialising module=distributor-service
level=debug ts=2021-06-14T10:21:15.765847357Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=memberlist-kv
level=debug ts=2021-06-14T10:21:15.765866439Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=ring
level=debug ts=2021-06-14T10:21:15.765872384Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=runtime-config
level=debug ts=2021-06-14T10:21:15.765879196Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=server
level=debug ts=2021-06-14T10:21:15.76588495Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=store-queryable
level=info ts=2021-06-14T10:21:15.765895287Z caller=module_service.go:59 msg=initialising module=ruler
level=info ts=2021-06-14T10:21:15.766023847Z caller=ruler.go:438 msg="ruler up and running"
level=debug ts=2021-06-14T10:21:15.766044721Z caller=ruler.go:476 msg="syncing rules" reason=initial
level=info ts=2021-06-14T10:21:15.766067097Z caller=cortex.go:414 msg="Cortex started"
level=warn ts=2021-06-14T10:21:16.394104917Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=info ts=2021-06-14T10:21:16.480389922Z caller=mapper.go:154 msg="updating rule file" file=/rules/fake/cortex-recieved-samples
level=info ts=2021-06-14T10:21:16.481573393Z caller=mapper.go:154 msg="updating rule file" file=/rules/fake/cortex-storage
level=info ts=2021-06-14T10:21:16.481813704Z caller=mapper.go:154 msg="updating rule file" file=/rules/fake/devops
level=info ts=2021-06-14T10:21:16.482026877Z caller=mapper.go:154 msg="updating rule file" file=/rules/fake/ingress
level=info ts=2021-06-14T10:21:16.48222914Z caller=mapper.go:154 msg="updating rule file" file=/rules/fake/prodder
level=info ts=2021-06-14T10:21:16.482647035Z caller=mapper.go:154 msg="updating rule file" file=/rules/fake/cortex-api
level=info ts=2021-06-14T10:21:16.483034374Z caller=mapper.go:154 msg="updating rule file" file=/rules/fake/cortex-cache
level=info ts=2021-06-14T10:21:16.483559811Z caller=mapper.go:154 msg="updating rule file" file=/rules/fake/cortex-queries
level=debug ts=2021-06-14T10:21:16.483684809Z caller=manager.go:136 msg="updating rules" user=fake
level=debug ts=2021-06-14T10:21:16.483714667Z caller=manager.go:139 msg="creating rule manager for user" user=fake
ts=2021-06-14T10:21:16.788436208Z caller=spanlogger.go:79 org_id=fake traceID=687d902e188d0b15 method=querier.Select level=debug start="2021-06-14 10:20:16.784 +0000 UTC" end="2021-06-14 10:21:16.784 +0000 UTC" step=0 matchers="unsupported value type"
level=info ts=2021-06-14T10:21:16.793144963Z caller=loader.go:133 msg="loaded bucket index" user=fake duration=3.008808ms

Logs during normal Ruler operation:

level=warn ts=2021-06-14T07:46:41.698931706Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T07:47:41.68316101Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T07:48:41.68443911Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T07:49:41.683036682Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T07:50:41.676662665Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T07:51:41.688776391Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T07:52:41.688353709Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T07:53:41.674962999Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T07:54:41.695299287Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T07:55:41.680111953Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T07:56:41.680350619Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T07:57:41.666877557Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T07:58:41.679107131Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T07:59:41.688248874Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T08:00:42.014617075Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T08:01:41.676791142Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T08:02:41.67915743Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T08:03:41.691319273Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T08:04:41.677637771Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T08:05:41.682876918Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T08:06:41.680299093Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"
level=warn ts=2021-06-14T08:07:41.686580593Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=bucket-index.json.gz err="invalid rule group object key"

S3 rules/ prefix: image

pracucci commented 3 years ago

👋 @andersosthus. Please do not use the same bucket for blocks storage and ruler/alertmanager config. We recently updated the blocks storage architecture doc to mention it.

andersosthus commented 3 years ago

Thanks, didn't notice that when upgrading :)

pracucci commented 3 years ago

Thanks, didn't notice that when upgrading :)

Our fault for not making it prominent enough. Sorry for that!

mvkrishna86 commented 3 years ago

Hi,

May I know what needs to be present in the fake directory that you have created. When I upload the Alerting rules in the S3, the ruler is keep on complaining 'invalid rule group object key'. Can you please let me know what is the format of the alert group file that we need to use?

level=warn ts=2021-08-04T17:16:03.112097009Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=operations/ err="invalid rule group object key"
level=warn ts=2021-08-04T17:16:03.112201241Z caller=bucket_client.go:110 msg="invalid rule group object key found while listing rule groups" key=operations/rules-operations.yaml err="invalid rule group object key"