function61 / promswarmconnect

Bridges Docker Swarm services to Prometheus without any changes to Prometheus
https://function61.com/
Apache License 2.0

Tasks that are pending scheduling break promswarmconnect #13

Closed treksler closed 4 years ago

treksler commented 4 years ago

If any tasks are Pending because no node meets their scheduling constraints, promswarmconnect breaks.

For example:

<taskid>   my_elasticsearch_exporter.1                    my.dockerhub.com/library/elasticsearch_exporter:latest                                          Running             Pending 15 minutes ago    "no suitable node (scheduling constraints not satisfied on 9 nodes)"

This results in a blank screen on the Prometheus service discovery page.

The issue seems to stem from returning nil here: https://github.com/function61/promswarmconnect/blob/7301fa079b617f211b22c2138e4a9a75d2fa0c7b/cmd/promswarmconnect/dockerdiscovery.go#L84

joonas-fi commented 4 years ago

Thanks for the details!

This might be related: https://github.com/function61/promswarmconnect/issues/6#issuecomment-484372457

Are you sure it's that error? Because a couple of lines above we have:

https://github.com/function61/promswarmconnect/blob/7301fa079b617f211b22c2138e4a9a75d2fa0c7b/cmd/promswarmconnect/dockerdiscovery.go#L78

That check should skip over tasks that are not assigned to a node. I think if your task doesn't pass scheduling constraints, no node should be assigned to it?
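As a rough illustration of the kind of check being described (not the actual code at the linked line), here is a minimal sketch that drops tasks with no node assigned. The `Task` type, its fields, and the `assignedTasks` function are hypothetical names made up for this example; the real project may model tasks differently.

```go
package main

// Task is a hypothetical, trimmed-down stand-in for a Swarm task record.
// It is an assumption for illustration, not the type promswarmconnect uses.
type Task struct {
	ID          string
	ServiceName string
	NodeID      string // assumed to be empty while the task is still unscheduled
}

// assignedTasks returns only the tasks that have been placed on a node.
// Tasks without a node have no address to scrape, so they are skipped
// instead of being treated as an error.
func assignedTasks(tasks []Task) []Task {
	assigned := make([]Task, 0, len(tasks))
	for _, t := range tasks {
		if t.NodeID == "" {
			continue // not scheduled yet (e.g. "no suitable node")
		}
		assigned = append(assigned, t)
	}
	return assigned
}
```

With a filter like this, a Pending task such as `my_elasticsearch_exporter.1` from the report would simply be left out of the discovery result, provided its node field really is empty.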

Are you running the latest image, 20191009_0943_7301fa07? Did you go over the troubleshooting steps? What does this return to you? Does it give an error, or does it return something unexpected?

> this results in a blank screen in prometheus service discovery page

There shouldn't be a blank screen. If promswarmconnect errors out, Prometheus should display a clear error message. Can you take a screenshot from Prometheus service discovery page?

treksler commented 4 years ago

I was running 20190126_1620_7b450c47. Will update and retest, thanks.

joonas-fi commented 4 years ago

Closing; we can reopen if the issue persists.