WIPACrepo / pyglidein

Some python scripts to launch HTCondor glideins
MIT License

figure out why monitoring is flapping #88

Closed dsschult closed 7 years ago

dsschult commented 7 years ago

The site monitoring for Slack is flapping a lot (sites go up and down about once a day). Since no user intervention is taking place, things must be working OK.

Maybe the client isn't reporting monitoring information when nothing is queued?

jvansanten commented 7 years ago

The client appears to report monitoring info unconditionally. Maybe the affected sites are running on a weird cron schedule?

The two times that the bot warning has triggered for the Zeuthen site have been real failures where the batch system went down temporarily, and submissions began to fail, taking down the client.
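
To illustrate the cron-schedule hypothesis, here is a minimal sketch of how a staleness-based monitor could flap even though the client is healthy. It is not pyglidein's actual monitoring code: `site_status`, `count_flaps`, the 6-hour reporting interval, and the 4-hour timeout are all hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta


def site_status(report_times, check_times, timeout):
    """For each check time, return "up" if some report arrived within
    `timeout` before the check, otherwise "down". (Hypothetical monitor.)"""
    statuses = []
    for check in check_times:
        fresh = any(
            t <= check and check - t <= timeout for t in report_times
        )
        statuses.append("up" if fresh else "down")
    return statuses


def count_flaps(statuses):
    """Count transitions between consecutive differing statuses."""
    return sum(1 for a, b in zip(statuses, statuses[1:]) if a != b)


# Arbitrary start time; only the intervals matter.
start = datetime(2000, 1, 1)

# Assumption: the client reports every 6 hours (e.g. from a sparse cron job).
reports = [start + timedelta(hours=6 * i) for i in range(5)]

# Assumption: the monitor checks hourly over one day.
checks = [start + timedelta(hours=h) for h in range(25)]

# Timeout shorter than the reporting interval: the site looks down
# shortly before each new report arrives.
short_timeout = site_status(reports, checks, timedelta(hours=4))

# Timeout at least as long as the reporting interval: no gaps.
long_timeout = site_status(reports, checks, timedelta(hours=6))

print(count_flaps(short_timeout), count_flaps(long_timeout))
```

If something like this is happening, the flapping would say more about the mismatch between the reporting interval and the monitor's timeout than about the health of the site.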

dsschult commented 7 years ago

So the flapping disappeared. Not sure whether that means monitoring is broken, but this issue goes away.