cortexproject / cortex

A horizontally scalable, highly available, multi-tenant, long term Prometheus.
https://cortexmetrics.io/
Apache License 2.0

Cortex can read rules but doesn't activate them #3401

Closed. jakubgs closed this issue 2 years ago.

jakubgs commented 3 years ago

Description

I'm running 1.4.0 using the binary from GitHub, and I have the ruler configured to send alerts to my own Alertmanager cluster.

For a moment I saw the alerts in my Alertmanager Web UI, but shortly after they disappeared.

Config

My ruler section of the config looks like this:

ruler:
  external_url: 'https://alerts.example.org/'
  alertmanager_url: 'http://localhost:9093/'
  enable_alertmanager_v2: true
  rule_path: '/var/tmp/cortex/rules'
  enable_api: true
  storage:
    type: local
    local:
      directory: '/etc/cortex/rules'

My rules are located in /etc/cortex/rules/fake since I use auth_enabled: false.

Debugging

I can see the rules are located in the right place because I can look them up using the /api/v1/rules call:

 > curl -s 'http://localhost:9092/api/v1/rules' | head
instance.yml:
    - name: instance
      rules:
        - alert: InstanceDown
          expr: up == 0
          for: 5m
          annotations:
            current_value: '{{ $value }}'
            description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
            summary: Instance {{ $labels.instance }} down

But when I try the /prometheus/api/v1/rules path, I get nothing:

 > curl -s 'http://localhost:9092/prometheus/api/v1/rules' -H 'X-Scope-OrgID: fake' | jq .
{
  "status": "success",
  "data": {
    "groups": []
  },
  "errorType": "",
  "error": ""
}

Just minutes ago I saw the rules displayed here, as well as the alerts generated by them, but now there's nothing there:

 > curl -s 'http://localhost:9092/prometheus/api/v1/alerts' -H 'X-Scope-OrgID: fake' | jq .
{
  "status": "success",
  "data": {
    "alerts": []
  },
  "errorType": "",
  "error": ""
}

I'm confused as to what caused them to disappear. Restarting Cortex nodes doesn't fix the issue.

Questions

jakubgs commented 3 years ago

I can see the rules files exist in both directories:

 > ls -l /etc/cortex/rules/fake 
total 12
-rw-r----- 1 cortex adm 1207 Oct 27 13:52 instance.yml
-rw-r----- 1 cortex adm 1347 Oct 27 13:52 network.yml
-rw-r----- 1 cortex adm  479 Oct 27 13:56 statusd.yml

 > ls -l /var/tmp/cortex/rules/fake 
total 12
-rwxr-xr-x 1 cortex daemon 1098 Oct 27 14:47 instance.yml
-rwxr-xr-x 1 cortex daemon 1241 Oct 27 14:47 network.yml
-rwxr-xr-x 1 cortex daemon  443 Oct 27 14:47 statusd.yml
jakubgs commented 3 years ago

And now they suddenly returned, even though the only thing I changed was ingestion limits:

 > curl -s 'http://localhost:9092/prometheus/api/v1/alerts' -H 'X-Scope-OrgID: fake' | jq . | head
{
  "status": "success",
  "data": {
    "alerts": [
      {
        "labels": {
          "alertname": "InstanceDown",
          "datacenter": "aws-eu-central-1a",
          "fleet": "consul.hq",
          "group": ",monitor,metrics-source,",

Kinda weird.

jakubgs commented 3 years ago

But I'm still not seeing the alerts appear in my Alertmanager, even though they showed up there for a minute or two earlier.

jakubgs commented 3 years ago

And now gone again, even though I did nothing, not even a restart:

 > curl -s 'http://localhost:9092/prometheus/api/v1/rules' -H 'X-Scope-OrgID: fake' | jq .
{
  "status": "success",
  "data": {
    "groups": []
  },
  "errorType": "",
  "error": ""
}

Not sure how I'm supposed to debug this.

gotjosh commented 3 years ago

Mainly for performance and fair-tenancy reasons, the ruler will not immediately start evaluating your rules and alerts. Instead, based on the poll interval it'll fetch the list of rules per tenant from storage and update/start the corresponding data structures.

https://github.com/cortexproject/cortex/blob/a6292c1179c659c994b62a0894c40584d010bdfa/pkg/ruler/ruler.go#L415-L432

The key concepts here are the Configuration API (api/v1/rules), which reflects what is currently in storage, and the Prometheus-compatible endpoint (api/prom/v1/rules), which reflects what is actually being evaluated.
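For reference, the poll interval that drives this sync is configurable. A rough sketch of the relevant ruler settings follows; the field names and defaults here are from memory, so treat them as assumptions and double-check against the config reference:

ruler:
  # How often the ruler re-reads rule storage and updates its per-tenant
  # rule managers (CLI flag: -ruler.poll-interval).
  poll_interval: 1m
  # How often loaded rule groups are evaluated
  # (CLI flag: -ruler.evaluation-interval).
  evaluation_interval: 1m

So after uploading or changing rules you may need to wait up to one poll interval before they show up on the Prometheus-compatible endpoint.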

My understanding is that ruler.rule_path is the place where Cortex checks for rule files. Correct?

# file path to store temporary rule files for the prometheus rule managers
# CLI flag: -ruler.rule-path
[rule_path: <string> | default = "/rules"]

In the ruler, we run the vanilla Prometheus rules manager in a multitenant fashion. It is meant to work with files so we need to write the files from storage to disk to provision the manager - this is now configurable in upstream and will be going away soon.

My understanding is that ruler.storage.local.directory configures a temporary location for rule files. Correct?

local:
# Directory to scan for rules
# CLI flag: -ruler.storage.local.directory
[directory: <string> | default = ""]

If I understand your question correctly, I believe that's correct! The local storage implementation is read-only (it is only meant for testing purposes IIRC, but might serve a case where you don't care about multi-tenancy?) - you should place your rules there.

Why can the rules be loaded from ruler.rule_path but are not available via /prometheus/api/v1/rules?

This is a good catch, at a quick glance I think this is a bug as rules should not be allowed to load from there - the directory is deleted after we finish loading them to the manager. But as I said above, this is going away soon. #3134

gotjosh commented 3 years ago

If you're using Prometheus to push your metrics and don't care about multitenancy - I would consider doing the rule evaluation locally in Prometheus.

Doing that is actually a more robust setup, as you don't need to wait for your samples to be pushed to Cortex for the rule evaluation - it should shave at least a good minute off your alerts and rule evaluation (we have an evaluation delay configured by default of 1m, but it can be changed).
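For anyone going that route, a rough sketch of what that prometheus.yml could look like (the hostnames, paths and ports below are placeholders, not taken from this thread): rules are evaluated locally, notifications go straight to your Alertmanager, and samples are still remote-written to Cortex for long-term storage.

rule_files:
  - /etc/prometheus/rules/*.yml        # evaluated locally by Prometheus

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']  # your existing Alertmanager

remote_write:
  - url: http://cortex.example.org/api/v1/push  # samples still flow to Cortex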

gotjosh commented 3 years ago

@pracucci I think we should categorise this issue as a docs improvement, as everything I've said above could be summarised into a "getting started with rule evaluation in Cortex" guide.

jakubgs commented 3 years ago

The local storage implementation is read-only (it is only meant for testing purposes IIRC, but might serve a case where you don't care about multi-tenancy?) - you should place your rules there.

If the place to put rules files is configured via ruler.storage.local.directory, then what is ruler.rule_path for? It seems backwards to me. It seems like the former should serve as the "storage" while the latter should be scanned for new rules.

This is a good catch, at a quick glance I think this is a bug as rules should not be allowed to load from there - the directory is deleted after we finish loading them to the manager. But as I said above, this is going away soon. #3134

So the fact that my rules were actually loaded from ruler.rule_path was a bug or unintended behavior for Cortex? If so, what is ruler.rule_path even for?

If you're using Prometheus to push your metrics and don't care about multitenancy - I would consider doing the rule evaluation locally in Prometheus. Doing that is actually a more robust setup as you don't need to wait for your samples to be pushed to Cortex for the rule evaluation.

Interesting point. I guess that would make the setup simpler by removing more stuff from Cortex, which is notoriously difficult to configure and get working. And it would get my alerts triggered faster because they would be closer to the source. Thanks for explaining.

I really think information like this should be visible in the docs. I bet most people don't care about multi-tenancy.

Thanks for explaining, this was very helpful, appreciate it.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

jakubgs commented 3 years ago

The docs need work in this area. So no.

pracucci commented 3 years ago

The docs need work in this area. So no.

@jakubgs Could you open a PR to improve the doc accordingly, please?

jargelo commented 3 years ago

Just to add to this, I've spent the last two days trying to wrap my head around how to actually use the Ruler, Alertmanager and configsdb, and how they interact with each other. My Prometheus is firing alerts but I don't see any way for Cortex itself to pick these up. I also don't understand how you are supposed to use the Ruler. The API is experimental, so does that mean you end up writing YAML files and putting them in a ConfigMap / S3 bucket so the Ruler can pick them up?

If the latter is true, then it would be nice if it could work with the Prometheus Kubernetes operator, so that you can use your CRDs to define a PrometheusRule and the Ruler can pick them up. I guess this is not supported right now?

Lots of questions, and more of a ramble than anything else, but as mentioned I'm trying to wrap my head around how it actually works. The documentation on this topic does need improvement :D

jargelo commented 3 years ago

For others who are also stumbling on this problem: I gave up on the Cortex Alertmanager and reverted to the Prometheus operator Alertmanager, and that worked like I thought it would. To me the Cortex Alertmanager does not compute, which is a shame, because I would have preferred to use the Cortex one, as it logically makes more sense to do alerting at the Cortex level.

jakubgs commented 3 years ago

I have also given up on using Cortex for alerting and just do it through Prometheus. Unfortunate.

pracucci commented 3 years ago

My Prometheus is firing alerts but I don't see any way for Cortex itself to pick these up.

Cortex doesn't pick up alerts fired by Prometheus. The Cortex ruler evaluates your rules (recording rules and alerts) and sends notifications to the configured Alertmanager endpoint, the Cortex Alertmanager. The Cortex Alertmanager is a multi-tenant wrapper on top of the Prometheus Alertmanager (given Cortex is multi-tenant).

I also don't understand how you are supposed to use the Ruler. The API is experimental, so does that mean you end up writing YAML files and putting them in a ConfigMap / S3 bucket so the Ruler can pick them up?

You should use the Ruler API. I think we're ready to mark it stable, given we don't expect to make any more breaking changes to it, and at Grafana Labs we've been running it in production for a while.

Once enabled, manage your rules via the API.
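Roughly, something like this (a sketch assuming enable_api: true, auth disabled so the tenant is fake, and the ruler listening on localhost:9092 as in the curl commands earlier in this thread; the request body is a standard Prometheus rule group and the namespace name "example" is just an illustration):

# Create or replace a rule group in the "example" namespace.
curl -X POST 'http://localhost:9092/api/v1/rules/example' \
  -H 'X-Scope-OrgID: fake' \
  -H 'Content-Type: application/yaml' \
  --data-binary '
name: instance
rules:
  - alert: InstanceDown
    expr: up == 0
    for: 5m
'

# List the rule groups currently in storage.
curl -s 'http://localhost:9092/api/v1/rules' -H 'X-Scope-OrgID: fake'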

cloudcafetech commented 3 years ago

I have installed from the k8s folder using the manifests.

Trying to set up rules using a ConfigMap, but not able to activate them :(

apiVersion: v1
kind: ConfigMap
metadata:
  name: cortex-ruler-configmap
  namespace: monitoring
data:
  rules.yml: |-
    groups:
      - name: "centralmonitoring"
        rules:
          - alert: "PrometheusDown"
            annotations:
              message: Prometheus replica in cluster {{$labels.cluster}} has disappeared.
            expr: sum(up{cluster!="", instance=~"prometheus.*", job="kubernetes-service-endpoints"}) by (cluster) < 3
            for: 15s
            labels:
              severity: critical
          - alert: "TooManyPods"
            annotations:
              message: Too many pods in cluster {{$labels.cluster}} on node {{$labels.instance}}
            expr: sum by(cluster,instance) (kubelet_running_pods{cluster!="",instance!=""}) >15
            for: 15s
            labels:
              severity: warning
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ruler
spec:
  replicas: 1
  selector:
    matchLabels:
      name: ruler
  template:
    metadata:
      labels:
        name: ruler
    spec:
      containers:
      - name: ruler
        image: quay.io/cortexproject/cortex:v1.9.0
        imagePullPolicy: IfNotPresent
        args:
        - -target=ruler
        - -log.level=debug
        - -server.http-listen-port=80
        - -ruler.configs.url=http://configs.monitoring.svc.cluster.local:80
        - -ruler.alertmanager-url=http://alertmanager.monitoring.svc.cluster.local/alertmanager/
        - -ruler-storage.backend=local
        - -ruler-storage.local.directory=/etc/cortex/rules
        - -consul.hostname=consul.monitoring.svc.cluster.local:8500
        - -s3.url=s3://admin:admin2675@172.31.37.67:9000/monitoring
        - -s3.force-path-style=true
        - -dynamodb.url=dynamodb://user:pass@dynamodb.monitoring.svc.cluster.local:8000
        - -schema-config-file=/etc/cortex/schema.yaml
        - -store.chunks-cache.memcached.hostname=memcached.monitoring.svc.cluster.local
        - -store.chunks-cache.memcached.timeout=100ms
        - -store.chunks-cache.memcached.service=memcached
        - -distributor.replication-factor=1
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /etc/cortex
          name: config
        - mountPath: /etc/cortex/rules
          name: alert
        - mountPath: /rules
          name: rules
      volumes:
        - configMap:
            name: schema-config
          name: config
        - configMap:
            name: cortex-ruler-configmap
          name: alert
        - emptyDir: {}
          name: rules

Second question: how do I view alerts in the Alertmanager UI? It's not similar to the Prometheus Alertmanager.

pracucci commented 3 years ago

Trying to set up rules using a ConfigMap, but not able to activate them

How is the ruler configured? Do you get any error from the ruler?

Second question: how do I view alerts in the Alertmanager UI? It's not similar to the Prometheus Alertmanager.

Cortex alertmanager exposes the Prometheus Alertmanager UI. See: https://cortexmetrics.io/docs/api/#alertmanager-ui

cloudcafetech commented 3 years ago

@pracucci

The ruler and rules (in a ConfigMap) are configured using the YAML below.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cortex-ruler-configmap
  namespace: monitoring
data:
  rules.yml: |-
    groups:
      - name: "centralmonitoring"
        rules:
          - alert: "PrometheusDown"
            annotations:
              message: Prometheus replica in cluster {{$labels.cluster}} has disappeared.
            expr: sum(up{cluster!="", instance=~"prometheus.*", job="kubernetes-service-endpoints"}) by (cluster) < 3
            for: 15s
            labels:
              severity: critical
          - alert: "TooManyPods"
            annotations:
              message: Too many pods in cluster {{$labels.cluster}} on node {{$labels.instance}}
            expr: sum by(cluster,instance) (kubelet_running_pods{cluster!="",instance!=""}) >15
            for: 15s
            labels:
              severity: warning
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ruler
spec:
  replicas: 1
  selector:
    matchLabels:
      name: ruler
  template:
    metadata:
      labels:
        name: ruler
    spec:
      containers:
      - name: ruler
        image: quay.io/cortexproject/cortex:v1.9.0
        imagePullPolicy: IfNotPresent
        args:
        - -target=ruler
        - -log.level=debug
        - -server.http-listen-port=80
        - -ruler.configs.url=http://configs.monitoring.svc.cluster.local:80
        - -ruler.alertmanager-url=http://alertmanager.monitoring.svc.cluster.local:9093
        - -ruler-storage.backend=local
        - -ruler-storage.local.directory=/etc/cortex/rules
        - -ruler.rule-path=/rules
        - -consul.hostname=consul.monitoring.svc.cluster.local:8500
        - -s3.url=s3://admin:admin2675@172.31.42.160:9000/monitoring
        - -s3.force-path-style=true
        - -dynamodb.url=dynamodb://user:pass@dynamodb.monitoring.svc.cluster.local:8000
        - -schema-config-file=/etc/cortex/schema.yaml
        - -store.chunks-cache.memcached.addresses=memcached.monitoring.svc.cluster.local:11211
        - -store.chunks-cache.memcached.timeout=100ms
        - -store.chunks-cache.memcached.service=memcached
        - -distributor.replication-factor=1
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /etc/cortex
          name: config
        - mountPath: /etc/cortex/rules
          name: alert
        - mountPath: /rules
          name: rules
      volumes:
        - configMap:
            name: schema-config
          name: config
        - configMap:
            name: cortex-ruler-configmap
          name: alert
        - emptyDir: {}
          name: rules

There are no errors ...

[root@ip-172-31-42-160 k8s-cortex]# oc logs -f ruler-59c6f8ff9f-h8ntx
level=info ts=2021-06-09T06:59:17.405782321Z caller=main.go:188 msg="Starting Cortex" version="(version=1.9.0, branch=HEAD, revision=ed4f339)"
level=info ts=2021-06-09T06:59:17.406432694Z caller=server.go:239 http=[::]:80 grpc=[::]:9095 msg="server listening on addresses"
level=debug ts=2021-06-09T06:59:17.407438378Z caller=api.go:128 msg="api: registering route" methods=GET path=/config auth=false
level=debug ts=2021-06-09T06:59:17.40770487Z caller=api.go:128 msg="api: registering route" methods=GET path=/ auth=false
level=debug ts=2021-06-09T06:59:17.407830185Z caller=api.go:128 msg="api: registering route" methods=GET path=/debug/fgprof auth=false
level=warn ts=2021-06-09T06:59:17.408590285Z caller=experimental.go:19 msg="experimental feature in use" feature="DNS-based memcached service discovery"
level=debug ts=2021-06-09T06:59:17.422997936Z caller=api.go:128 msg="api: registering route" methods=GET path=/memberlist auth=false
level=debug ts=2021-06-09T06:59:17.423532235Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ingester/ring auth=false
level=debug ts=2021-06-09T06:59:17.42370209Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ring auth=false
level=info ts=2021-06-09T06:59:17.426289619Z caller=mapper.go:46 msg="cleaning up mapped rules directory" path=/rules
level=debug ts=2021-06-09T06:59:17.4264654Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ruler/ring auth=false
level=debug ts=2021-06-09T06:59:17.426504814Z caller=api.go:128 msg="api: registering route" methods=POST path=/ruler/delete_tenant_config auth=true
level=debug ts=2021-06-09T06:59:17.426537016Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ruler_ring auth=false
level=debug ts=2021-06-09T06:59:17.42655769Z caller=api.go:128 msg="api: registering route" methods=GET path=/ruler/rule_groups auth=false
level=debug ts=2021-06-09T06:59:17.426637122Z caller=api.go:128 msg="api: registering route" methods=GET path=/services auth=false
level=debug ts=2021-06-09T06:59:17.426723438Z caller=module_service.go:49 msg="module waiting for initialization" module=memberlist-kv waiting_for=server
level=debug ts=2021-06-09T06:59:17.427038426Z caller=module_service.go:49 msg="module waiting for initialization" module=ring waiting_for=memberlist-kv
level=debug ts=2021-06-09T06:59:17.427070422Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=memberlist-kv
level=debug ts=2021-06-09T06:59:17.427093165Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=distributor-service
level=info ts=2021-06-09T06:59:17.42713247Z caller=module_service.go:59 msg=initialising module=server
level=debug ts=2021-06-09T06:59:17.427257522Z caller=module_service.go:49 msg="module waiting for initialization" module=store waiting_for=server
level=info ts=2021-06-09T06:59:17.427456583Z caller=module_service.go:59 msg=initialising module=store
level=info ts=2021-06-09T06:59:17.427496322Z caller=module_service.go:59 msg=initialising module=memberlist-kv
level=debug ts=2021-06-09T06:59:17.427528904Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=ring
level=debug ts=2021-06-09T06:59:17.427540691Z caller=module_service.go:49 msg="module waiting for initialization" module=ring waiting_for=server
level=info ts=2021-06-09T06:59:17.428076425Z caller=module_service.go:59 msg=initialising module=ring
level=debug ts=2021-06-09T06:59:17.440946101Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=server
level=info ts=2021-06-09T06:59:17.440992229Z caller=module_service.go:59 msg=initialising module=distributor-service
level=debug ts=2021-06-09T06:59:17.441333774Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=memberlist-kv
level=debug ts=2021-06-09T06:59:17.441362157Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=ring
level=debug ts=2021-06-09T06:59:17.44137333Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=server
level=debug ts=2021-06-09T06:59:17.441399934Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=store
level=info ts=2021-06-09T06:59:17.441413994Z caller=module_service.go:59 msg=initialising module=ruler
level=info ts=2021-06-09T06:59:17.441469288Z caller=ruler.go:438 msg="ruler up and running"
level=debug ts=2021-06-09T06:59:17.441484706Z caller=ruler.go:476 msg="syncing rules" reason=initial
level=info ts=2021-06-09T06:59:17.441676022Z caller=cortex.go:414 msg="Cortex started"
level=debug ts=2021-06-09T07:00:17.442149029Z caller=ruler.go:476 msg="syncing rules" reason=periodic
level=debug ts=2021-06-09T07:01:17.441894902Z caller=ruler.go:476 msg="syncing rules" reason=periodic
level=debug ts=2021-06-09T07:02:17.442073341Z caller=ruler.go:476 msg="syncing rules" reason=periodic
level=debug ts=2021-06-09T07:03:17.441717173Z caller=ruler.go:476 msg="syncing rules" reason=periodic
level=debug ts=2021-06-09T07:04:17.441543041Z caller=ruler.go:476 msg="syncing rules" reason=periodic
level=debug ts=2021-06-09T07:05:17.441665966Z caller=ruler.go:476 msg="syncing rules" reason=periodic
level=debug ts=2021-06-09T07:06:17.442459556Z caller=ruler.go:476 msg="syncing rules" reason=periodic

Running PODs

[root@ip-172-31-42-160 k8s-cortex]# oc get po
NAME                                  READY   STATUS    RESTARTS   AGE
alertmanager-0                        1/1     Running   0          171m
configs-56869844c9-25657              1/1     Running   0          176m
configs-db-7577d48bd6-g55jl           1/1     Running   0          176m
consul-7645b66b9d-smw5b               1/1     Running   0          176m
distributor-7c757f5cb-qbzmk           1/1     Running   0          176m
dynamodb-5fb595f677-2xpcx             1/1     Running   0          176m
grafana-0                             1/1     Running   0          116m
ingester-d4fc9d4fc-pr9wx              1/1     Running   0          176m
kube-state-metrics-5fbbf95c46-w7jm5   2/2     Running   0          176m
memcached-59b9c6746-dzzfc             1/1     Running   0          176m
nginx-6445958699-g4ncf                1/1     Running   0          176m
node-exporter-5ttlb                   1/1     Running   0          176m
prometheus-0                          1/1     Running   0          176m
querier-ccc884bf9-ncg54               1/1     Running   0          176m
query-frontend-56fbd69cd-744t2        1/1     Running   0          176m
ruler-59c6f8ff9f-h8ntx                1/1     Running   0          8m29s
table-manager-64d489c45f-6qftm        1/1     Running   0          176m

Basically we have a central Prometheus Alertmanager setup and want to reuse the same Alertmanager. We don't want to use a separate Cortex Alertmanager.

For testing purposes I used a similar setup (rules in a ConfigMap) in Thanos (using the Thanos ruler), and it works nicely.

But I'm not able to get the Cortex ruler working. :(

pracucci commented 3 years ago

With your current config, the rules file is stored at: /etc/cortex/rules/rules.yml

The expected (correct) filepath is: /etc/cortex/rules/<tenant id>/<filename>.yml

If you're running with auth enabled, then <tenant id> is your tenant id, otherwise it's hardcoded to fake. So if you're running with auth disabled the file should be stored at the following path to make it work: /etc/cortex/rules/fake/rules.yml
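In the Deployment above, that would mean mounting the ConfigMap one level deeper, something like this (a sketch of just the changed mount, assuming auth is disabled so the tenant directory is fake, and leaving -ruler-storage.local.directory pointing at /etc/cortex/rules):

        volumeMounts:
        - mountPath: /etc/cortex/rules/fake   # tenant subdirectory under the local storage root
          name: alert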

cloudcafetech commented 3 years ago

@pracucci

In my S3 (MinIO) monitoring bucket I am getting 0, not sure if it is the tenant ID or not.


And I made the modification in the ruler YAML, but no luck.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cortex-ruler-configmap
  namespace: monitoring
data:
  rules.yml: |-
    groups:
      - name: "centralmonitoring"
        rules:
          - alert: "PrometheusDown"
            annotations:
              message: Prometheus replica in cluster {{$labels.cluster}} has disappeared.
            expr: sum(up{cluster!="", pod=~"prometheus.*"}) by (cluster) < 3
            for: 15s
            labels:
              severity: critical
              category: metrics
          - alert: "TooManyPods"
            annotations:
              message: Too many pods in cluster {{$labels.cluster}} on node {{$labels.instance}}
            expr: sum by(cluster,instance) (kubelet_running_pods{cluster!="",instance!=""}) > 5
            for: 15s
            labels:
              severity: warning
              category: metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ruler
spec:
  replicas: 1
  selector:
    matchLabels:
      name: ruler
  template:
    metadata:
      labels:
        name: ruler
    spec:
      containers:
      - name: ruler
        image: quay.io/cortexproject/cortex:v1.9.0
        imagePullPolicy: IfNotPresent
        args:
        - -target=ruler
        - -log.level=debug
        - -server.http-listen-port=80
        - -ruler.configs.url=http://configs.monitoring.svc.cluster.local:80
        - -ruler.alertmanager-url=http://alertmanager.monitoring.svc.cluster.local:9093
        - -ruler-storage.backend=local
        - -ruler-storage.local.directory=/etc/cortex/rules/0
        - -ruler.rule-path=/rules
        - -consul.hostname=consul.monitoring.svc.cluster.local:8500
        - -s3.url=s3://admin:admin2675@172.31.40.72:9000/monitoring
        - -s3.force-path-style=true
        - -dynamodb.url=dynamodb://user:pass@dynamodb.monitoring.svc.cluster.local:8000
        - -schema-config-file=/etc/cortex/schema.yaml
        - -store.chunks-cache.memcached.addresses=memcached.monitoring.svc.cluster.local:11211
        - -store.chunks-cache.memcached.timeout=100ms
        - -store.chunks-cache.memcached.service=memcached
        - -distributor.replication-factor=1
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /etc/cortex
          name: config
        - mountPath: /etc/cortex/rules/0
          name: alert
        - mountPath: /rules
          name: rules
      volumes:
        - configMap:
            name: schema-config
          name: config
        - configMap:
            name: cortex-ruler-configmap
          name: alert
        - emptyDir: {}
          name: rules
[root@ip-172-31-40-72 monitoring]# oc exec -it ruler-7fb94dd7d7-8t6qc sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # ls -ltr /rules/
total 0
/ # ls -ltr /etc/cortex/rules/0/
total 0
lrwxrwxrwx    1 root     root            16 Jun 19 00:42 rules.yml -> ..data/rules.yml
/ # exit
[root@ip-172-31-40-72 monitoring]# oc logs -f ruler-7fb94dd7d7-8t6qc
level=info ts=2021-06-19T00:42:53.016332668Z caller=main.go:188 msg="Starting Cortex" version="(version=1.9.0, branch=HEAD, revision=ed4f339)"
level=info ts=2021-06-19T00:42:53.017559897Z caller=server.go:239 http=[::]:80 grpc=[::]:9095 msg="server listening on addresses"
level=debug ts=2021-06-19T00:42:53.019175345Z caller=api.go:128 msg="api: registering route" methods=GET path=/config auth=false
level=debug ts=2021-06-19T00:42:53.021682252Z caller=api.go:128 msg="api: registering route" methods=GET path=/ auth=false
level=debug ts=2021-06-19T00:42:53.021814945Z caller=api.go:128 msg="api: registering route" methods=GET path=/debug/fgprof auth=false
level=debug ts=2021-06-19T00:42:53.021959288Z caller=api.go:128 msg="api: registering route" methods=GET path=/memberlist auth=false
level=debug ts=2021-06-19T00:42:53.023008694Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ingester/ring auth=false
level=debug ts=2021-06-19T00:42:53.023055416Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ring auth=false
level=warn ts=2021-06-19T00:42:53.023883349Z caller=experimental.go:19 msg="experimental feature in use" feature="DNS-based memcached service discovery"
level=info ts=2021-06-19T00:42:53.030612161Z caller=mapper.go:46 msg="cleaning up mapped rules directory" path=/rules
level=debug ts=2021-06-19T00:42:53.030753864Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ruler/ring auth=false
level=debug ts=2021-06-19T00:42:53.030790066Z caller=api.go:128 msg="api: registering route" methods=POST path=/ruler/delete_tenant_config auth=true
level=debug ts=2021-06-19T00:42:53.030835253Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ruler_ring auth=false
level=debug ts=2021-06-19T00:42:53.030860276Z caller=api.go:128 msg="api: registering route" methods=GET path=/ruler/rule_groups auth=false
level=debug ts=2021-06-19T00:42:53.030909584Z caller=api.go:128 msg="api: registering route" methods=GET path=/services auth=false
level=info ts=2021-06-19T00:42:53.031585317Z caller=module_service.go:59 msg=initialising module=server
level=debug ts=2021-06-19T00:42:53.031690112Z caller=module_service.go:49 msg="module waiting for initialization" module=ring waiting_for=memberlist-kv
level=debug ts=2021-06-19T00:42:53.031718025Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=memberlist-kv
level=debug ts=2021-06-19T00:42:53.031758011Z caller=module_service.go:49 msg="module waiting for initialization" module=store waiting_for=server
level=debug ts=2021-06-19T00:42:53.031776355Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=distributor-service
level=info ts=2021-06-19T00:42:53.031791076Z caller=module_service.go:59 msg=initialising module=store
level=debug ts=2021-06-19T00:42:53.031587773Z caller=module_service.go:49 msg="module waiting for initialization" module=memberlist-kv waiting_for=server
level=info ts=2021-06-19T00:42:53.031937836Z caller=module_service.go:59 msg=initialising module=memberlist-kv
level=debug ts=2021-06-19T00:42:53.032078255Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=ring
level=debug ts=2021-06-19T00:42:53.032146949Z caller=module_service.go:49 msg="module waiting for initialization" module=ring waiting_for=server
level=info ts=2021-06-19T00:42:53.032254539Z caller=module_service.go:59 msg=initialising module=ring
level=debug ts=2021-06-19T00:42:53.047187773Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=server
level=info ts=2021-06-19T00:42:53.047280661Z caller=module_service.go:59 msg=initialising module=distributor-service
level=debug ts=2021-06-19T00:42:53.047490102Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=memberlist-kv
level=debug ts=2021-06-19T00:42:53.047527989Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=ring
level=debug ts=2021-06-19T00:42:53.047539583Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=server
level=debug ts=2021-06-19T00:42:53.047727464Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=store
level=info ts=2021-06-19T00:42:53.047738286Z caller=module_service.go:59 msg=initialising module=ruler
level=info ts=2021-06-19T00:42:53.047768984Z caller=ruler.go:438 msg="ruler up and running"
level=debug ts=2021-06-19T00:42:53.047783448Z caller=ruler.go:476 msg="syncing rules" reason=initial
level=info ts=2021-06-19T00:42:53.047888139Z caller=cortex.go:414 msg="Cortex started"
level=debug ts=2021-06-19T00:43:53.048652595Z caller=ruler.go:476 msg="syncing rules" reason=periodic
level=debug ts=2021-06-19T00:44:53.047992537Z caller=ruler.go:476 msg="syncing rules" reason=periodic
level=debug ts=2021-06-19T00:45:53.048070935Z caller=ruler.go:476 msg="syncing rules" reason=periodic

NOTE: I tried with both 0 and fake but the result is the same :(

pracucci commented 3 years ago

- -ruler-storage.local.directory=/etc/cortex/rules/0

This should be:

- -ruler-storage.local.directory=/etc/cortex/rules

Basically, -ruler-storage.local.directory should be set to the root directory where all tenants' rules are stored. This root directory is expected to have the tenant ID as a sub-directory (e.g. /etc/cortex/rules/0).
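In other words, the local backend expects a layout roughly like this (a sketch; the tenant IDs and filenames are just examples):

/etc/cortex/rules/            <- -ruler-storage.local.directory
├── 0/                        <- one sub-directory per tenant ID ("fake" when auth is disabled)
│   └── rules.yml
└── team-b/
    └── rules.yml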

cloudcafetech commented 3 years ago

@pracucci

Passing the argument is not working at all; I tried that too, and you can see the same configuration you mention here in my earlier posts.

Anyway, the same thing works if I put it in config.yaml, but not when passing it as an argument; not sure why.

Thanks for support.

pracucci commented 3 years ago

That's weird. One reason why you're seeing that weird behaviour is that you're mixing the CLI flags -ruler.storage.* and -ruler-storage.* (I know, it's very easy to get wrong). -ruler.storage.* are legacy and shouldn't be used anymore, while -ruler-storage.* are the new ones and you should use them.

cloudcafetech commented 3 years ago

while -ruler-storage.* are the new ones and you should use them.

If that is the right one, then I already used it everywhere. Look at my old comments.

aarepuu commented 3 years ago

I thought I'd chip in, as I was trying to make it work and had a similar issue to you @cloudcafetech. The first thing I noticed is that if you have any trace of configdb configuration there and you are trying to use the -ruler-storage.* CLI flags, it will not work (it gets discarded?). You should remove the -ruler.configs.url flag from your ruler config if using -ruler-storage.backend=local (or any other type).

Also, if you have auth_enabled=true you need to put the rules for each tenant into a subfolder, but specify -ruler-storage.local.directory as the parent folder. So in your example, keeping -ruler-storage.local.directory=/etc/cortex/rules and placing the rules under /etc/cortex/rules/0/ would work for tenant_id=0.

If you are running with auth_enabled=false then, as @pracucci said, you should store the rules in the /etc/cortex/rules/fake/ subfolder, still leaving the parent folder as your -ruler-storage.local.directory.

There are probably some docs improvements that could be made to make this more explicit. Hope that helps.

Rahuly360 commented 3 years ago

Has anyone found a solution for rules with status inactive? My rules show as inactive when checking via the API /api/prom/api/v1/rules. Please let me know if anybody knows the solution.

Rules in Ruler: {"name":"ruler_check_rules","file":"default","rules":[{"state":"inactive","name":"check_new_up","query":"up == 1","duration":0,"labels":{"cortex":"ruler"},"annotations":{"ruler":"cortex"},"alerts":[],"health":"ok","lastError":"","type":"alerting","lastEvaluation":"2021-07-20T07:57:53.002649507Z","evaluationTime":0.004718986}],"interval":60,"lastEvaluation":"2021-07-20T07:57:53.002626318Z","evaluationTime":0.004746094}

Rahuly360 commented 3 years ago

@jakubgs @gotjosh Can you please share the steps for how you compiled the ruler code and ran the ruler? I am configuring the rules in the Cortex ruler, but the rules are in an inactive state as mentioned in the above comment. Please help here.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.