Open robertmaynard opened 9 years ago
@vibraphone for your review.
Some issues:
class Monitor
{
public:
  virtual ~Monitor() {}
  std::set<std::string> domains() const;
  void subscribe(const std::string& domain);
  void unsubscribe(const std::string& domain);
  // Called for every message received on a subscribed domain.
  virtual void event(const std::string& domain, cJSON* msg);
};
With a well-documented and tested set of message formats (in JSON), this would be super-easy to use. I agree that it would be nice to have a utility for monitoring a single job or error messages. That could simply be a subclass of Monitor that takes a std::function (or boost::function if not in C++11 mode) and calls it when a regular expression on the domain and/or message is matched?
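The callback-on-regex subclass suggested here could look roughly like the following. This is only a sketch: the name RegexMonitor is hypothetical, the Monitor base is reduced to its virtual hook, cJSON is forward-declared rather than included, and it assumes C++11 for std::function and std::regex.

```cpp
#include <functional>
#include <regex>
#include <string>
#include <utility>

struct cJSON; // opaque here; the real definition comes from cJSON.h

// Reduced stand-in for the Monitor draft above (only the virtual hook).
class Monitor
{
public:
  virtual ~Monitor() {}
  virtual void event(const std::string& domain, cJSON* msg) = 0;
};

// Hypothetical subclass: forwards an event to a user callback whenever
// the domain matches a regular expression.
class RegexMonitor : public Monitor
{
public:
  typedef std::function<void(const std::string&, cJSON*)> Callback;

  RegexMonitor(const std::string& pattern, Callback cb)
    : Pattern(pattern), OnMatch(std::move(cb))
  {
  }

  void event(const std::string& domain, cJSON* msg) override
  {
    // regex_search matches anywhere in the domain; anchor with ^ or $
    // in the pattern when a full or prefix match is wanted.
    if (OnMatch && std::regex_search(domain, Pattern))
    {
      OnMatch(domain, msg);
    }
  }

private:
  std::regex Pattern;
  Callback OnMatch;
};
```

Matching on the message body as well would need the JSON serialized to a string first, which is left out here.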
Now this is why I like typing it all out.
You are correct that the initial design is impossible, and I prefer your design for the Monitor class, but I think it should let the user specify the callback via boost::function in the subscribe call.
Now onto the issue of domains. I want to leverage as much of the ZMQ pub/sub model as possible, and it has the feature that it only sends messages to subscribers that have a matching prefix subscription. This means that regex matching would require us to subscribe to all messages and filter afterwards. I would rather have the Monitor class force explicit subscriptions while keeping the user API very simple, so how about something like:
class Monitor
{
public:
  std::set<std::string> domains() const;
  // func is expected to have the following type signature:
  //   void operator()(const std::string& domain, cJSON* msg)
  //
  // Pass the same domain string to unsubscribe() to remove the callback.
  void subscribe(const std::string& domain,
                 boost::function<void(const std::string&, cJSON*)> func);
  void unsubscribe(const std::string& domain);
};
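To make the explicit-subscription idea concrete, here is a minimal sketch of how such a Monitor might store callbacks and dispatch by prefix. It is an assumption-laden illustration, not the Remus implementation: CallbackMonitor and dispatch() are hypothetical names, std::function stands in for boost::function, and no ZMQ socket is involved.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>

struct cJSON; // opaque here; the real definition comes from cJSON.h

// Hypothetical sketch of the callback-based Monitor.
class CallbackMonitor
{
public:
  typedef std::function<void(const std::string&, cJSON*)> Callback;

  std::set<std::string> domains() const
  {
    std::set<std::string> result;
    for (const auto& entry : Handlers)
    {
      result.insert(entry.first);
    }
    return result;
  }

  void subscribe(const std::string& domain, Callback func)
  {
    // A real implementation would also set ZMQ_SUBSCRIBE with this
    // prefix on the SUB socket so filtering happens inside ZMQ.
    Handlers[domain] = func;
  }

  void unsubscribe(const std::string& domain) { Handlers.erase(domain); }

  // Would be called for each received message: invokes every callback
  // whose subscribed domain is a prefix of the message's domain.
  void dispatch(const std::string& domain, cJSON* msg) const
  {
    for (const auto& entry : Handlers)
    {
      const std::string& prefix = entry.first;
      if (entry.second && domain.compare(0, prefix.size(), prefix) == 0)
      {
        entry.second(domain, msg);
      }
    }
  }

private:
  std::map<std::string, Callback> Handlers;
};
```

Keying the map on the domain string means a second subscribe() to the same domain replaces the earlier callback; whether that is the desired behaviour is a design choice the thread leaves open.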
That looks good so far, but could we typedef remus::function to either std::function or boost::function depending on what's available? That future-proofs it.
void subscribe(
  const std::string& domain,
  remus::function<void(const std::string& domain, cJSON* msg, Client* source)>);
If you want to move boost::shared_ptr, boost::thread, etc. over to the remus namespace, go for it. Looks good to me.
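One way the remus::function alias could be set up is sketched below. The __cplusplus test is an assumption about how C++11 availability would be detected (a build-system define would work equally well), and EventCallback is a hypothetical name used only for illustration.

```cpp
#include <string>

// Pick the function wrapper based on the language level. Note that some
// compilers (MSVC without /Zc:__cplusplus) report an old __cplusplus
// value, so a configure-time define may be more reliable in practice.
#if __cplusplus >= 201103L
#include <functional>
namespace remus
{
using std::function;
}
#else
#include <boost/function.hpp>
namespace remus
{
using boost::function;
}
#endif

struct cJSON; // opaque here; the real definition comes from cJSON.h

// Hypothetical typedef showing the alias in use.
typedef remus::function<void(const std::string&, cJSON*)> EventCallback;
```

Callers then only ever spell remus::function, so switching the backing implementation later does not touch user code.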
We need a user-facing class called ServerEventLogger.
The ServerEventLogger will let the user log to a std::ostream all information that is broadcast on the event stream.
Usage of the ServerEventLogger will be roughly:
remus::server::Server server;
// start accepting connections for clients and workers
bool valid = server.startBrokering();
if (valid)
{
  // open the log first so the stream outlives the logger that writes to it
  std::ofstream outFile("server.log");
  ServerEventLogger logger = server.constructServerLogger();
  logger.start(outFile);
}
Events that would be nice for the server to log include:
Those all seem reasonable. Sending messages when a job has no available workers will require some additional work, mainly because we try numerous times per second to match queued jobs to current workers, and I don't want to emit a notification each time we do that.
@robertmaynard Maybe only emit a message when the number of unresolved jobs changes?
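The suggestion above amounts to simple change detection. A minimal sketch, assuming the server can cheaply count unresolved jobs on each matching pass (UnresolvedJobNotifier is a hypothetical name):

```cpp
#include <cstddef>

// Hypothetical helper: tells the caller whether the number of unresolved
// jobs differs from the last value that was reported.
class UnresolvedJobNotifier
{
public:
  UnresolvedJobNotifier() : LastCount(0) {}

  // Call once per matching pass. Returns true only when the count has
  // changed, which is when a notification would be published.
  bool shouldNotify(std::size_t unresolvedJobs)
  {
    if (unresolvedJobs == LastCount)
    {
      return false;
    }
    LastCount = unresolvedJobs;
    return true;
  }

private:
  std::size_t LastCount;
};
```

This keeps the many-times-per-second matching loop silent while the queue is stable. It would miss the case where one job resolves and another arrives between two passes, since the count stays the same.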
Issue
By leveraging the ZeroMQ pub/sub model, we can allow the server to broadcast a stream of status and monitoring information.
This solves two large, outstanding issues with Remus. The first is that the server is a black box with no way of informing clients or third parties about what is happening, whether any internal errors are occurring, etc. The second is that the client is limited to a busy wait to check for status events; if we let the client also use a pub/sub-style connection to the server, we can make status monitoring more efficient.
Technical Issues
The primary issue with the pub/sub model is the classic slow joiner problem: you can't determine when a subscriber starts to get messages. Even if the subscriber is started before the publisher, the subscriber will always miss the first few messages that the publisher sends, because the publisher will already have sent them by the time the subscriber's connection is established.
If the monitor were emitting just general status messages and the goal were to show the overall health of the server, the slow joiner issue would not be a problem. But since it can be used to monitor specific jobs, we need some way to minimize the severity, or even the occurrence, of the slow joiner problem. A couple of decent solutions are proposed in the Node Coordination ( http://zguide.zeromq.org/page:all#Node-Coordination ) section of the ZMQ guide. Personally I think the best way for Remus is:
Extending the Server
Here are the very high-level requirements for publication on the server:
remus::server::ServerPorts
Client monitoring of the Server
To monitor the activity of the server, a new remus::client class called Monitor (or ServerMonitor?) will be created. This class must be extensible so that the user can plug it into their own code easily. A quick draft of what the Monitor class would look like is:
So it then becomes fairly easy to construct a JobMonitor
Pub/Sub Message Layout
The message will need to be multipart, since ZMQ only supports prefix filtering. That means the first message part will be the key we filter on.
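The layout described above can be modelled without any sockets. The following sketch assumes a two-part message (filter key, then JSON body); the type and function names, and the sample key strings in the usage note, are hypothetical. wouldDeliver() mirrors ZMQ's rule that a subscription matches when it is a prefix of the first part.

```cpp
#include <string>
#include <vector>

// Part 0 is the filter key; part 1 is the JSON payload.
typedef std::vector<std::string> MultipartMessage;

MultipartMessage makeEvent(const std::string& key, const std::string& jsonBody)
{
  MultipartMessage msg;
  msg.push_back(key);
  msg.push_back(jsonBody);
  return msg;
}

// Mirrors ZMQ's subscriber-side rule: the message is delivered when the
// subscription is a byte-wise prefix of the first part. An empty
// subscription therefore matches everything.
bool wouldDeliver(const MultipartMessage& msg, const std::string& subscription)
{
  return !msg.empty() &&
         msg[0].compare(0, subscription.size(), subscription) == 0;
}
```

With a hypothetical key such as "job:1234", a subscription to "job:" would receive every job event, while "job:1234" narrows it to one job. One caveat of pure prefix matching: "job:12" would also match "job:1234", so fixed-width identifiers or a trailing delimiter may be worth considering.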
The easiest method will be to make the first message part itself a key-value pair, where the key component is one of the following:
And the value component is the following:
References: