Closed jasonhill-ds closed 1 month ago
Docker compose for the pgwatch2 container:
networks:
  monitoring:
volumes:
  prometheus-data: {}
  grafana-data: {}
  pw2_config: {}
  pw2_postgres: {}
  pw2_grafana: {}
services:
  pgwatch2:
    image: cybertec/pgwatch2-postgres:1.11.0
    container_name: pgwatch2
    ports:
      - '3001:3000'
      - '127.0.0.1:8081:8081'
      - '127.0.0.1:5432:5432'
      - '127.0.0.1:8080:8080'
    restart: unless-stopped
    volumes:
      - pw2_postgres:/var/lib/postgresql
      - pw2_grafana:/var/lib/grafana
      - pw2_config:/pgwatch2/persistent-config
    env_file:
      - pgwatcher2/pgwatcher2.env
    networks:
      - monitoring
Hello.
Can you please provide steps to reproduce the problem? I'm lost in your explanation, sorry.
Sorry it has taken so long for me to come back to this. To reproduce:
SELECT *
FROM (
  SELECT
    $__timeGroup(time, 2m),
    dbname,
    max(case when data->>'is_up' = '1' then 1 else 0 end) as is_up
  FROM instance_up
  WHERE $__timeFilter(time)
  GROUP BY 1, 2
) x
Reduction (B)
Operation: Classic condition
Conditions: when last of A is below 1
Alert conditions:
Expression that will be alerted on: B
Evaluate every: 2m for 4m
📅 This issue has been automatically marked as stale because of a lack of recent activity. It will be closed if no further activity occurs. ♻️ If you think there is new information allowing us to address the issue, please reopen it and provide us with updated details. 🤝 Thank you for your contributions.
I had pgwatch2 monitoring a DB which went down. I was running the instance_up check and alerting when is_up < 1.
During this time, I restarted the pgwatch2 container. This appears to have caused pgwatch2 to lose track of the DB's version and therefore stop running the instance_up check (meaning no more rows were written to the table).
2024/05/22 23:08:57 WARN main: Could not find PG version info for DB ******, skipping shutdown check of metric worker process for instance_up
Once the time window I was running the check over (now-15m to now) had moved past the time of the last row in the table, the above query returned no rows for that database. This results in the alert being resolved. Setting the alert state to alerting on "no data returned" or "all null" doesn't work around the issue: rows are still returned for the other "UP" databases, and there is simply no row for the down DB in the result set.
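To make the failure mode concrete, here is a small simulation of it outside Grafana. It is only a sketch under stated assumptions: the table is a simplified, hypothetical stand-in for instance_up (an integer minute counter instead of timestamps, a plain is_up column instead of the JSON data field), and the function names are made up for illustration. It is not pgwatch2 or Grafana code. The last function shows one possible way to keep a series for the silent DB by treating "known but missing" as down. That is an assumption about what a workaround could look like, not a confirmed fix.

```python
import sqlite3

# Hypothetical miniature of instance_up: (time in minutes, dbname, is_up).
# 'db_down' stops reporting after t=10, mimicking the skipped metric worker.
ROWS = [(t, "db_ok", 1) for t in range(0, 40, 2)] + \
       [(t, "db_down", 0) for t in range(0, 12, 2)]


def make_db():
    """Create an in-memory table and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instance_up (time INTEGER, dbname TEXT, is_up INTEGER)"
    )
    conn.executemany("INSERT INTO instance_up VALUES (?, ?, ?)", ROWS)
    return conn


def last_in_window(conn, now, window=15):
    """Last is_up value per dbname within [now - window, now]."""
    rows = conn.execute(
        "SELECT dbname, is_up FROM instance_up "
        "WHERE time BETWEEN ? AND ? ORDER BY time",
        (now - window, now),
    ).fetchall()
    # Rows are ordered by time, so later rows overwrite earlier ones.
    return {dbname: is_up for dbname, is_up in rows}


def firing(last):
    """Series whose last value is below 1, as in the alert condition."""
    return sorted(db for db, value in last.items() if value < 1)


def last_with_known_dbs(conn, now, window=15):
    """Assumed workaround: any DB ever seen but silent in the window counts as 0."""
    known = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT dbname FROM instance_up WHERE time <= ?", (now,)
        )
    ]
    last = last_in_window(conn, now, window)
    return {db: last.get(db, 0) for db in known}


if __name__ == "__main__":
    conn = make_db()
    print("t=12:", firing(last_in_window(conn, 12)))
    print("t=30:", firing(last_in_window(conn, 30)))
    print("t=30, known DBs:", firing(last_with_known_dbs(conn, 30)))
```

While rows for the down DB are still inside the window, it shows up as firing. Once the window has moved past its last row, the series is absent rather than empty, so neither the "below 1" condition nor a "no data" state applies to it, because the query as a whole still returns data for the other DBs.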
Is the DB version cached in pgwatch2 and then lost if the process is restarted?