felskrone / salt-eventsd

events-listener daemon for saltstack that writes data into a database

Batching of events not sent as batch to worker send() method #28

Closed · Grokzen closed this issue 9 years ago

Grokzen commented 9 years ago

I was playing around with the batching/queue feature already present in the code to see how it was performing, and I stumbled upon a small problem with how the batch is handed to the worker class.

I set 'event_limit': 100 in my config and simulated around 1000 events/sec, so it should be starting 10 new threads per second and passing 100 events to each thread.

By this logic I would expect that when the worker is triggered and send() is called, I would get all 100 events in a list or set and could do anything I want with them. This is not the case right now.

In worker.py, the list of events (those 100 events that were batched earlier) is currently iterated and each event is passed one by one to the worker thread's send() method.

This, however, is not what I expect when I set an event queue of 100. I would expect it to send all 100 events (or a filtered list of relevant events) to the send() method, so I could process them in there however I want: for example, batching them together and sending them off to an external server for long-term storage, or bulk-inserting them into mysql.
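Roughly what I have in mind on the worker side (just a sketch of the behaviour I'd expect, not the current API; the class name and helper are made up):

```python
# Sketch of a custom worker whose send() receives the whole batch at once.
# MyWorker and _bulk_insert are invented names for illustration.

class MyWorker(object):

    def send(self, events):
        # events: the full list collected up to event_limit (here: 100)
        relevant = [ev for ev in events if ev.get('tag', '').startswith('salt/')]
        # one bulk backend operation instead of len(events) separate ones
        self._bulk_insert(relevant)

    def _bulk_insert(self, events):
        # placeholder for a batched backend call (executemany, pipeline, POST, ...)
        pass
```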

felskrone commented 9 years ago

The idea behind that is to queue events up to event_limit, loop through the events, instantiate all necessary worker-threads for the collected events (and only those), and then push the events into their matching workers. That means that of the 100 events collected, a worker-thread might receive only a single one while others receive more (depending on the number of filters in the config).

That way, a worker-thread builds a connection to its backend (mysql, redis, whatever) only once and not for each event. Other workers are never instantiated, because no matching events were collected.

Pushing events into workers in bulk sounds like a good performance enhancement. A list of matching events and their associated worker-threads would have to be built in worker.py near

https://github.com/felskrone/salt-eventsd/blob/master/salteventsd/worker.py#L81

Keep in mind, though, that changes in that area break other users' workers/configs unless it is implemented behind a new setting.
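Roughly, a sketch of how that could look near the linked line, guarded by a new setting (all names here, bulk_send, _match_worker and _start_worker, are invented for illustration, not the actual worker.py code):

```python
# Sketch of bulk dispatch behind a new opt-in setting; bulk_send,
# _match_worker and _start_worker are invented names.

def run_workers(self, events):
    # group the collected events by the worker whose filter they match,
    # so each worker-thread is instantiated (and connects) only once
    batches = {}
    for event in events:
        worker_name = self._match_worker(event)
        if worker_name is not None:
            batches.setdefault(worker_name, []).append(event)

    for worker_name, matched in batches.items():
        worker = self._start_worker(worker_name)
        if self.opts.get('bulk_send', False):
            # new behaviour: hand the whole batch over in one call
            worker.send(matched)
        else:
            # current behaviour: one send() call per event,
            # keeps existing workers/configs working
            for event in matched:
                worker.send(event)
```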

Out of curiosity: what is your current use-case for that? I'm asking because I am only doing things in mysql for now, and all I'm doing are inserts and updates. And since each event a worker receives matches only a single row in my server-table, I can only do something like:

UPDATE server_table set key='value' where servername='server01.domain.com';

and not

UPDATE server_table set key='value' where some_key like '%value';

Grokzen commented 9 years ago

To keep backwards compatibility with existing installations/configs, I would suggest implementing this behind a new setting.

The use-case for this feature is simply optimization, no matter what data storage is used. If it was mysql, I would do batched updates to reduce the number of queries the server has to handle. If it was redis, I would make a pipeline out of it and send all events at once.
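For example, sketches of both (assuming send() already receives the batch; the table/column names and event layout are made up):

```python
import pymysql  # any DB-API driver with executemany() would do
import redis

def send_mysql(events, conn):
    # one executemany() round trip instead of one UPDATE per event
    rows = [(ev['data']['value'], ev['data']['servername']) for ev in events]
    with conn.cursor() as cur:
        cur.executemany(
            "UPDATE server_table SET `key`=%s WHERE servername=%s", rows)
    conn.commit()

def send_redis(events, client):
    # queue all commands in a pipeline and flush them in one round trip
    pipe = client.pipeline()
    for ev in events:
        pipe.rpush('salt-events', repr(ev))
    pipe.execute()
```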

And in my case I would like to group all data from the events into a single JSON blob and do a POST to a webserver in another part of our system. For example, if events were coming in at 100 events/second, it would have to do 100 POSTs to the server each second, and that would put some strain on that server. The goal is to reduce the number of POSTs so the server does not fall over as easily.
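Something along these lines (a sketch only; the endpoint and payload layout are made up):

```python
import requests

def send(self, events):
    # collapse the whole batch into one JSON blob and POST it once:
    # at 100 events/sec with event_limit 100 that is 1 POST/sec instead of 100
    resp = requests.post(
        'http://storage.example.com/events',  # made-up endpoint
        json={'events': events},
    )
    resp.raise_for_status()
```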

I can take a stab at this and see what I can do.

felskrone commented 9 years ago

Fixed by Grokzen himself :-)