WIPACrepo / pyglidein

Some python scripts to launch HTCondor glideins
MIT License

Heartbeat and monitoring for clients #14

Closed gonzalomerino closed 8 years ago

gonzalomerino commented 8 years ago

It would be nice to have something in pyglidein that tells us the status of the various clients. I am thinking of something that can detect whether a client is dead or misbehaving. Also, collecting a few client-related metrics per client (in a log file on the server, for instance) might be useful for plotting a couple of graphs that show us how things are doing.
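As a rough illustration of the "detect if a client is dead" idea, here is a minimal server-side sketch. The `HeartbeatMonitor` class, its method names, and the 600-second timeout are all hypothetical and not part of pyglidein; the sketch only assumes that each client can be identified by some id.

```python
import time


class HeartbeatMonitor:
    """Track when each client was last heard from (hypothetical helper)."""

    def __init__(self, timeout=600.0):
        # Seconds of silence after which a client counts as dead (assumed value).
        self.timeout = timeout
        self.last_seen = {}

    def record(self, client_id, now=None):
        """Note that client_id just checked in."""
        self.last_seen[client_id] = time.time() if now is None else now

    def dead_clients(self, now=None):
        """Return the sorted ids of clients silent for longer than the timeout."""
        now = time.time() if now is None else now
        return sorted(cid for cid, seen in self.last_seen.items()
                      if now - seen > self.timeout)
```

The server would call `record()` whenever a client contacts it, and a periodic check of `dead_clients()` could feed a log file or a plot.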

dsschult commented 8 years ago

This feature can be implemented in two ways: either the client sends additional parameters when asking for the state, or it sends the monitoring data in separate requests. I'm currently favoring separate requests, since that seems easier to expand in the future.
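To make the "separate requests" option concrete, a sketch of what the two messages might look like follows. The method names `get_state` and `heartbeat`, the payload layout, and the metric keys are assumptions for illustration only, not pyglidein's actual protocol.

```python
import json


def state_request():
    """The existing kind of request: ask the server for the state only."""
    return {"method": "get_state", "params": {}}  # hypothetical method name


def heartbeat_request(client_id, metrics):
    """A separate request carrying monitoring data for one client."""
    return {"method": "heartbeat",  # hypothetical method name
            "params": {"client_id": client_id, "metrics": dict(metrics)}}


def encode(request):
    """Serialize a request to JSON with a stable key order."""
    return json.dumps(request, sort_keys=True)
```

Keeping the heartbeat separate means new metrics can be added to its `params` later without touching the state request.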

dsschult commented 8 years ago

One problem with implementing this is that currently clients do not have unique ids to distinguish themselves. An IP address works in some cases, but one clear case where it doesn't is when more than one client is running on the same submit machine (which is a real case that already exists).

There are two easy options:

  1. require a unique name in the config file
  2. create and save a UUID when the client first starts

These both have problems:

  1. different users on the same cluster cannot share a config file
  2. multiple instances cannot share a submit directory

Anyone have ideas?
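For reference, the second option (create and save a UUID on first start) could look roughly like the sketch below. The function name and the idea of storing the id in a plain file are assumptions, not existing pyglidein code.

```python
import os
import uuid


def load_or_create_client_id(path):
    """Return the UUID stored at path, creating and saving one if missing."""
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    client_id = str(uuid.uuid4())
    with open(path, "w") as f:
        f.write(client_id)
    return client_id
```

This also shows the drawback of that option: two instances started from the same directory would read the same file and end up with the same id.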

briedel commented 8 years ago

It depends on how people run the client. In my case, I would add an item to the client.py options to pass a UUID.

dsschult commented 8 years ago

Passing via OptionParser, you mean? That's a decent solution to the problem.
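A minimal sketch of that approach, using the standard library's `optparse`. The `--uuid` option name and the fallback to a random UUID when none is given are assumptions made for illustration; this is not taken from client.py.

```python
from optparse import OptionParser
import uuid


def parse_args(argv):
    """Parse client options, accepting an optional --uuid (hypothetical flag)."""
    parser = OptionParser()
    parser.add_option("--uuid", dest="uuid", default=None,
                      help="unique id for this client instance")
    options, _ = parser.parse_args(argv)
    if options.uuid is None:
        # Assumed fallback: generate a fresh id when none was passed.
        options.uuid = str(uuid.uuid4())
    return options
```

With this, two clients on the same submit machine can share a config file and a directory and still be told apart, as long as each is started with a different `--uuid` value.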

briedel commented 8 years ago

Something like this: https://github.com/briedel/pyglidein/commit/69f01280d7ded8d16a4c4764ae8a3d1f14fad73f