Can you execute the container_memory_usage_bytes expression and send the output?
When I execute

```
container_memory_usage_bytes{container_label_com_docker_swarm_service_name="monitoring_elasticsearch"} / container_spec_memory_limit_bytes{container_label_com_docker_swarm_service_name="monitoring_elasticsearch"} > 0.1
```

or

```
container_memory_usage_bytes{container_label_com_docker_swarm_service_name="monitoring_elasticsearch"}
```

or

```
container_memory_usage_bytes
```

I get no data. Something about my deployment must be off.
Here are some additional specs I can find:

```
prometheus --version
prometheus, version 2.1.0 (branch: HEAD, revision: 85f23d82a045d103ea7f3c89a91fba4a93e6367a)
  build user:       root@6e784304d3ff
  build date:       20180119-12:01:23
  go version:       go1.9.2
```
```
cat /etc/prometheus/prometheus.yml
global:
  scrape_interval: 10s
rule_files:
  - alert.rules
```

The content of alert.rules looks all in order as well.
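Worth noting: that prometheus.yml has no scrape_configs section, and container_memory_usage_bytes comes from cAdvisor. With Docker Flow Monitor the scrape jobs are normally generated from DFSL notifications rather than written by hand, but for reference, a hand-written job would look roughly like this (a sketch only; the cadvisor service name and port are illustrative):

```
# Sketch: DFM normally injects scrape jobs itself when DFSL
# notifies it about exporter services.
scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      # cAdvisor publishes container metrics such as
      # container_memory_usage_bytes, by default on port 8080
      - targets: ['cadvisor:8080']
```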
The problem is in DFSL. It has only the proxy as the address in DF_NOTIFY_CREATE_SERVICE_URL and DF_NOTIFY_REMOVE_SERVICE_URL. You need to add (comma-separated) the address of Prometheus (DFM) as well. Otherwise, it will never receive a notification about exporters.
OH! I do see what you are saying and have modified my environment variables accordingly.
```
swarm-listener:
  ...
  environment:
    - 'DF_NOTIFY_CREATE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/reconfigure,http://monitor:8080/v1/docker-flow-proxy/reconfigure'
    - 'DF_NOTIFY_REMOVE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/remove,http://monitor:8080/v1/docker-flow-proxy/remove'
```
What I had failed to see was that proxy was a reference to the service, not to the overlay network. Because of that confusion, I thought placing DFP, DFSL, and DFM on the proxy network was sufficient and that proxy in the URL would let them all talk. I do know that's not how overlay networks work, but clearly I needed a second set of eyeballs to help.
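To illustrate the distinction (a sketch; names are only illustrative): on a shared overlay network, Swarm DNS resolves service names, so http://proxy:8080 reaches a service named proxy regardless of what the network itself is called.

```
services:
  proxy:
    image: vfarcic/docker-flow-proxy
    networks:
      - proxy   # the network merely happens to share the name "proxy"
  swarm-listener:
    image: vfarcic/docker-flow-swarm-listener
    networks:
      - proxy   # same overlay network, so "http://proxy:8080" resolves
                # via the *service* name, not the network name
networks:
  proxy:
    external: true
```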
That being said, it has been several minutes now and my aforementioned issues don't seem to have changed. I even went so far as to remove DFP from the equation, but none of the queries (e.g. container_memory_usage_bytes) produce any result in the Prometheus dashboard. Even an error would be more insightful to me.
The problem is that you changed the name of the service to monitor but left the rest of the address intact (http://monitor:8080/v1/docker-flow-proxy/reconfigure). The reconfigure address should be http://monitor:8080/v1/docker-flow-monitor/reconfigure. You can find an example at http://monitor.dockerflow.com/tutorial/ .
Oh wow, I feel rather stupid. Well, I appreciate your patience with assisting me as this certainly resolves my issue. Much thanks again!
I found some time to revisit this part of my project, hopeful that resolving my URL mistake would be the key, but I still find myself with unresponsive alerts and queries that return no data.
In the example below I keep swarm-listener on both a proxy network and a monitor network, with DFM sharing the monitor network. I also tried putting them both exclusively on proxy. In both cases nothing changes.
```
swarm-listener:
  image: vfarcic/docker-flow-swarm-listener
  networks:
    - proxy
    - monitor
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  environment:
    - 'DF_NOTIFY_CREATE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/reconfigure,http://monitor:8080/v1/docker-flow-monitor/reconfigure'
    - 'DF_NOTIFY_REMOVE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/remove,http://monitor:8080/v1/docker-flow-monitor/remove'
  deploy:
    placement:
      constraints: [node.role == manager]

monitor:
  image: vfarcic/docker-flow-monitor
  environment:
    - LISTENER_ADDRESS=swarm-listener
    - GLOBAL_SCRAPE_INTERVAL=10s
  networks:
    - monitor
  ports:
    - 9090:9090
```
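For completeness, the stack file's networks section isn't shown above; a sketch of what this layout assumes, with both overlays created externally:

```
networks:
  proxy:
    external: true
  monitor:
    external: true
```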
I really wish I could provide more substantial info but I have exhausted all possible logs.
Is there an example that uses both DFM and DFP that you know works, which I could experiment with locally?
Please send me the current config of your stacks and I'll try to replicate the problem inside one of my clusters.
P.S. Sorry for not responding earlier. I had too much work on my plate.
Closing due to inactivity.
My main problem is that no matter how restrictive I set my mem limit, I cannot get the alert to show as active on the /alerts page in Prometheus. In the example below you will see I have set my service's mem_limit alert to trigger at 10%, where at rest the service in question uses at least 60% of its available memory limit, with no timespan. Yet no matter how long I wait, the alert says (0 active).
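For context, the alert itself is attached through service deploy labels, roughly along the lines of the DFM tutorial (a sketch; the service name, threshold, and memory limit are illustrative):

```
elasticsearch:
  image: elasticsearch
  deploy:
    resources:
      limits:
        memory: 800M   # gives container_spec_memory_limit_bytes a value
    labels:
      - com.df.notify=true
      - com.df.alertName=memlimit
      # @service_mem_limit:0.1 expands to roughly
      # container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.1
      - com.df.alertIf=@service_mem_limit:0.1
```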
This is how the alert translates into Prometheus:

When I plug the expr into the Prometheus expression browser I get no data. Not even container_memory_usage_bytes{container_label_com_docker_swarm_service_name="monitoring_elasticsearch"} seems to produce a result. Here are the relevant docker-compose instructions:
It may be worth noting that I have not incorporated the alert-manager, as I didn't want to set it up yet and figured I could test my alert settings before moving on to that step. Am I wrong in assuming I can continue with docker-flow-monitor without alert-manager?

It's also worth noting that I am using proxy as the shared network between docker-flow-monitor and docker-flow-swarm-listener, because I am also using docker-flow-proxy in this stack.

It may also be worth noting that I must manually restart the docker-flow-monitor service for new alerts to register in the Prometheus web console after spinning up services other than docker-flow-monitor. I am not sure if that is intended behavior; perhaps it is a sign of something else wrong. Nothing in the monitor logs seems to indicate anything is amiss either.
I am fully at a loss on how to debug this further. Perhaps I have made some mistake along the way or misunderstand what I should be expecting.