Kitware / Remus

Remus is a remote mesh/model service framework.

Efficient Querying Of Worker Status #266

Closed: robertmaynard closed this issue 8 years ago

robertmaynard commented 8 years ago

Under heavy query load, the Remus server asks the WorkerFactory to check its status every time it receives a message. Really, the factory should only check status after one of the following has happened:

  1. a sufficient amount of time has passed
  2. a worker sent a terminate call
  3. a job was sent back as completed

For both 2 and 3, we want to add a delay of a couple hundred milliseconds before checking, to give the worker time to clean up.
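To make the three triggers concrete, here is a minimal C++17 sketch of a check policy. `StatusCheckPolicy`, `workerEvent`, and `shouldCheck` are hypothetical names, not Remus API. Time is passed in explicitly so the logic is deterministic, and the poll interval and settle delay are caller-supplied placeholders rather than values Remus uses.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical helper (not part of Remus): decides when the server should ask
// the worker factory to refresh worker status.
class StatusCheckPolicy
{
public:
  using Clock = std::chrono::steady_clock;

  StatusCheckPolicy(Clock::duration pollInterval, Clock::duration settleDelay,
                    Clock::time_point start)
    : PollInterval(pollInterval)
    , SettleDelay(settleDelay)
    , LastCheck(start)
  {
    assert(pollInterval > Clock::duration::zero());
  }

  // Triggers 2 and 3: a worker terminated or a job came back completed.
  // Schedule a check after the settle delay. If a check is already pending,
  // keep the earlier deadline so a stream of events cannot postpone it forever.
  void workerEvent(Clock::time_point now)
  {
    if (!this->PendingCheckAt)
    {
      this->PendingCheckAt = now + this->SettleDelay;
    }
  }

  // Returns true if a status check should run now and resets the state if so.
  bool shouldCheck(Clock::time_point now)
  {
    // Trigger 1: enough time has passed since the last check.
    const bool intervalElapsed = (now - this->LastCheck) >= this->PollInterval;
    // Triggers 2 and 3: an event happened and the worker had time to clean up.
    const bool eventSettled = this->PendingCheckAt && now >= *this->PendingCheckAt;
    if (intervalElapsed || eventSettled)
    {
      this->LastCheck = now;
      this->PendingCheckAt.reset();
      return true;
    }
    return false;
  }

private:
  Clock::duration PollInterval;
  Clock::duration SettleDelay;
  Clock::time_point LastCheck;
  std::optional<Clock::time_point> PendingCheckAt;
};
```

The server would call `shouldCheck` before touching the factory and only then ask it to poll its workers.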

robertmaynard commented 8 years ago

We are currently attacking this problem on two fronts.

First, we are going to stop telling the factory to update its status every time the server receives a message. Instead, we will only update the factory's status when one of the conditions above occurs.

Second, we are going to change createWorker to validate the requirements before checking for space. This ordering matters because the space check requires updating the status of every existing worker, which is slow.

In testing, these two changes combined lowered the CPU utilization of the Remus server by almost 30% and improved its throughput!
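The requirements-first ordering in createWorker can be sketched like this. `WorkerFactorySketch` and `JobRequirements` are hypothetical stand-ins, not the real Remus classes: requirements are reduced to a single mesh-type string, and the slow per-worker status update is modelled by a counter so the saving is visible.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for real job requirement data.
struct JobRequirements
{
  std::string meshType;
};

// Hypothetical factory sketch, not Remus API.
class WorkerFactorySketch
{
public:
  WorkerFactorySketch(std::vector<std::string> supportedTypes, int maxWorkers)
    : SupportedTypes(std::move(supportedTypes))
    , MaxWorkers(maxWorkers)
  {
    assert(maxWorkers > 0);
  }

  // Cheap check first, expensive check second.
  bool createWorker(const JobRequirements& reqs)
  {
    if (!this->haveSupport(reqs))
    {
      return false; // rejected without polling any worker
    }
    if (this->pollRunningWorkers() >= this->MaxWorkers)
    {
      return false; // no free slot
    }
    ++this->RunningWorkers;
    return true;
  }

  int statusPolls() const { return this->StatusPolls; }

private:
  bool haveSupport(const JobRequirements& reqs) const
  {
    return std::find(this->SupportedTypes.begin(), this->SupportedTypes.end(),
                     reqs.meshType) != this->SupportedTypes.end();
  }

  // Stand-in for the slow path (updating the status of every existing
  // worker); here it only counts how often it runs.
  int pollRunningWorkers()
  {
    ++this->StatusPolls;
    return this->RunningWorkers;
  }

  std::vector<std::string> SupportedTypes;
  int MaxWorkers;
  int RunningWorkers = 0;
  int StatusPolls = 0;
};
```

With this ordering, a job nobody can handle never triggers the expensive poll; only jobs that pass validation pay for the space check.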

robertmaynard commented 8 years ago

PR #267 limits worker status checks to once every 250 ms when no jobs are queued. When jobs are queued, the FindWorkerForQueuedJob method forces a refresh of worker status each time it calls WorkerFactory::createWorker.
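The queue-aware throttle described for PR #267 can be modelled roughly as follows. This is not the PR's code: `ThrottledStatusRefresh` and `maybeRefresh` are hypothetical names, the `jobsQueued` argument stands in for the forced refresh that FindWorkerForQueuedJob triggers, and the 250 ms interval is passed in by the caller.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper modelled on the behaviour described for PR #267;
// not Remus API.
class ThrottledStatusRefresh
{
public:
  using Clock = std::chrono::steady_clock;

  explicit ThrottledStatusRefresh(Clock::duration minInterval)
    : MinInterval(minInterval)
  {
    assert(minInterval > Clock::duration::zero());
  }

  // Returns true if worker status was refreshed on this call.
  // jobsQueued == true forces a refresh regardless of the interval.
  bool maybeRefresh(Clock::time_point now, bool jobsQueued)
  {
    const bool intervalElapsed =
      !this->HasRefreshed || (now - this->LastRefresh) >= this->MinInterval;
    if (!jobsQueued && !intervalElapsed)
    {
      return false; // idle and refreshed recently: skip the expensive update
    }
    // A real server would ask the factory to poll its workers here.
    this->LastRefresh = now;
    this->HasRefreshed = true;
    ++this->Refreshes;
    return true;
  }

  int refreshCount() const { return this->Refreshes; }

private:
  Clock::duration MinInterval;
  Clock::time_point LastRefresh{};
  bool HasRefreshed = false;
  int Refreshes = 0;
};
```

Forcing the refresh only when jobs are queued keeps the idle server cheap while making sure a queued job is matched against up-to-date worker counts.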

We still need a dirty flag, set whenever a job finishes or a worker terminates, that tells us to refresh the factory.
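One way such a dirty flag could look, as a sketch only: `FactoryDirtyFlag` and `refreshIfDirty` are hypothetical names, and using `std::atomic<bool>` is an assumption that makes the flag safe to set from another thread; whether Remus actually needs that is not stated here. The exchange clears the flag in the same step it is read, so several events between two checks collapse into a single refresh.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical dirty flag for the worker factory; not Remus API.
class FactoryDirtyFlag
{
public:
  // Call when a job finishes or a worker terminates.
  void markDirty() { this->Dirty.store(true, std::memory_order_release); }

  // Returns true if the flag was set and clears it atomically, so any number
  // of markDirty() calls between two checks produce one refresh.
  bool consume() { return this->Dirty.exchange(false, std::memory_order_acq_rel); }

private:
  std::atomic<bool> Dirty{ false };
};

// Hypothetical server-loop step: refresh the factory only if something changed.
template <typename RefreshFn>
bool refreshIfDirty(FactoryDirtyFlag& flag, RefreshFn&& refresh)
{
  if (!flag.consume())
  {
    return false;
  }
  refresh();
  return true;
}
```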

robertmaynard commented 8 years ago

PR #268 finishes the technical work; the only thing left is adding a benchmark to verify performance in the future.

robertmaynard commented 8 years ago

PR #269 adds a test to verify performance going forward.