Kitware / Remus

Remus is a remote mesh/model service framework.

Ability to change workers timeout. #234

Closed robertmaynard closed 9 years ago

robertmaynard commented 9 years ago

Currently each worker's heartbeat interval is hardcoded to a floor of 250ms and a ceiling of 1 minute. This doesn't work well for applications that need to detect a crashed worker in less than a minute.

What we should do is extend remus::worker::Worker to be more like remus::server::Server, so that people can control the worker's heartbeat rates. Something like the following should suffice:

// Connect to the server's worker endpoint.
remus::worker::ServerConnection conn = remus::worker::make_ServerConnection(ports.worker().endpoint());

// Describe the kind of jobs this worker accepts.
remus::common::MeshIOType io_type = remus::common::make_MeshIOType(remus::meshtypes::Mesh2D(),remus::meshtypes::Mesh3D());
remus::proto::JobRequirements requirements = remus::proto::make_JobRequirements(io_type, "SimpleWorker", "");
remus::Worker w(requirements,conn);

// Proposed API: heartbeat floor of 50ms, ceiling of 150ms.
w.setHeartbeatRates( remus::worker::HeartbeatRates(50,150) );

remus::worker::Job wjob = w.getJob();
...

This would construct a worker that heartbeats no more often than every 50ms and no less often than every 150ms. If the server goes more than 150ms without a heartbeat from the worker, it will presume the worker is dead.
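
To make the floor/ceiling semantics concrete, here is a minimal standalone sketch. It is not the Remus implementation: the HeartbeatRates struct below is a hypothetical stand-in for the proposed remus::worker::HeartbeatRates, and the field names and the helpers clamp_interval and presumed_dead are assumptions made only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the proposed remus::worker::HeartbeatRates.
// The field names are assumptions, not the Remus API.
struct HeartbeatRates {
  std::int64_t floor_ms;    // never heartbeat more often than this
  std::int64_t ceiling_ms;  // always heartbeat at least this often
};

// Clamp a requested heartbeat interval into [floor_ms, ceiling_ms].
inline std::int64_t clamp_interval(const HeartbeatRates& rates,
                                   std::int64_t requested_ms) {
  assert(rates.floor_ms > 0 && rates.floor_ms <= rates.ceiling_ms);
  return std::min(std::max(requested_ms, rates.floor_ms), rates.ceiling_ms);
}

// Server-side view: presume the worker dead once more than ceiling_ms
// has elapsed since its last heartbeat.
inline bool presumed_dead(const HeartbeatRates& rates,
                          std::int64_t last_heartbeat_ms,
                          std::int64_t now_ms) {
  return (now_ms - last_heartbeat_ms) > rates.ceiling_ms;
}
```

With HeartbeatRates{50, 150}, a requested interval of 10ms is raised to 50ms and a requested interval of 1000ms is lowered to 150ms. A real server would likely allow some slack beyond the ceiling to absorb network latency; this sketch does not model that.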

robertmaynard commented 9 years ago

Fixed by PR #235