jordansissel closed 9 years ago
This issue aims to fix the design problems behind bugs such as LOGSTASH-1253.
Hi Jordan,
Have you thought any more about this? My attention is currently on the protocol.
One small fix that urgently needs doing in the server part is to spawn an arbiter thread with a non-blocking queue. Events yielded to the caller pass through this thread: connection threads add to the queue until it reaches a set size, and once it does they back off. Specifically, this stops a full Logstash queue from blocking a connection thread. The arbiter blocks instead, and the connection threads back off in a sleep loop, so they can still handle errors and even send backoffs/pings back to clients.
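To make the arbiter idea concrete, here is a minimal Python sketch rather than the actual server code. The names `Arbiter`, `offer`, `submit_with_backoff` and the `sink` callable are all hypothetical; `sink` stands in for whatever call hands an event to Logstash and may block when its queue is full.

```python
import queue
import threading
import time


class Arbiter:
    """Hypothetical helper: the only thread that ever blocks on the pipeline."""

    def __init__(self, sink, max_pending=4):
        # Bounded queue shared by all connection threads.
        self.pending = queue.Queue(maxsize=max_pending)
        # sink is an assumed callable that may block (e.g. a full Logstash queue).
        self.sink = sink
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            event = self.pending.get()
            if event is None:  # shutdown sentinel
                return
            self.sink(event)  # blocking here stalls only the arbiter

    def offer(self, event):
        """Never blocks; False means the caller should back off."""
        try:
            self.pending.put_nowait(event)
            return True
        except queue.Full:
            return False

    def close(self):
        self.pending.put(None)
        self.thread.join()


def submit_with_backoff(arbiter, event, on_backoff, delay=0.01, max_tries=100):
    """Sleep loop for a connection thread; on_backoff could ping the client."""
    for _ in range(max_tries):
        if arbiter.offer(event):
            return True
        on_backoff()
        time.sleep(delay)
    return False
```

Because the connection thread regains control on every pass of the sleep loop, it can notice socket errors or tell the client to slow down instead of hanging inside a blocked queue push.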
The second part is to look at proper sequence acknowledgements, with the server part maintaining an internal "port" list that records the last processed sequence number. After a recovered connection, the client can then request the last processed log, as long as it reconnects within a reasonable time frame (the information is so small that it could be kept for hours...)
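A rough sketch of what that server-side list could look like, in Python. Everything here is an assumption for illustration: the `AckTable` name, keying by a `client_id` string, and the one-hour default retention are not taken from the existing code.

```python
import time


class AckTable:
    """Hypothetical table of the last processed sequence number per client."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl = ttl_seconds  # assumed retention window
        self.clock = clock      # injectable so the expiry logic can be tested
        self.entries = {}       # client_id -> (last_sequence, updated_at)

    def record(self, client_id, sequence):
        """Remember the highest sequence number processed for this client."""
        last, _ = self.entries.get(client_id, (0, 0.0))
        self.entries[client_id] = (max(last, sequence), self.clock())

    def resume_from(self, client_id):
        """Return the last processed sequence, or None if unknown or expired."""
        entry = self.entries.get(client_id)
        if entry is None:
            return None
        sequence, updated_at = entry
        if self.clock() - updated_at > self.ttl:
            del self.entries[client_id]
            return None
        return sequence
```

A reconnecting client would ask for `resume_from(its_id)` and resend everything after that sequence number; a `None` answer means it has to fall back to its own notion of what was acknowledged.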
If any of this has already been considered, or new protocols have even been looked at, I'm definitely all ears and might help out! I don't mind which protocol is decided on. I was going to just expand the current one, although backwards compatibility is difficult since the version is ignored and any unknown frame will presumably crash that connection thread?
Thanks, and great tool. It works especially nicely now with the various pulls out there :+1:
Jason
Just for the record: I've settled on fixing the issues I was having with the protocol, and left it functionally intact. Specifically, the Redis queue hitting congestion, with connections blocking and timing out, is fixed; partial acks are fully implemented; and there's a cheeky keepalive to stop random timeouts. https://github.com/driskell/logstash-forwarder/tree/protocol_improvements_and_server_client_improvements
I'll PR on request, as there are just too many at the moment. Any other features/fixes I'm now going to branch from my own master, as the code has diverged too much.
It's also possible a Kafka/Scribe-style (pull with offsets) protocol could work better here. Depends! Needs more discussion.
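For anyone unfamiliar with the pull-with-offsets idea, a toy Python sketch of the core mechanic follows. `OffsetLog` and its methods are hypothetical names, and which side would hold the log in this project is exactly the part that needs discussion.

```python
class OffsetLog:
    """Hypothetical append-only log that readers pull from by offset."""

    def __init__(self):
        self.records = []

    def append(self, record):
        """Store a record and return the offset it was written at."""
        self.records.append(record)
        return len(self.records) - 1

    def fetch(self, offset, max_records=100):
        """Return (batch, next_offset); the reader tracks its own position."""
        batch = self.records[offset:offset + max_records]
        return batch, offset + len(batch)
```

The reader only advances its stored offset after it has safely processed a batch, so a crash simply means fetching the same offset again. That replaces explicit acknowledgements with a position the reader owns.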
What about CurveMQ? It's still WIP but could be beneficial once it has matured.
I also think there should be a split between the transport and application protocols.
This allows the sensitive security stuff to be left to a third-party library with wide use and trust.
This project then concentrates on supplementing the transport at application level where required, and on imposing requirements on the transport.
It also means the transport can be easily switched while experimenting, until something is settled upon.
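As a sketch of that split, in Python: the application layer talks to a small transport interface and never touches sockets or encryption directly. `Transport`, `LoopbackTransport` and `EventChannel` are hypothetical names, and the JSON framing is just a placeholder for whatever application protocol is chosen.

```python
import json
from abc import ABC, abstractmethod


class Transport(ABC):
    """Assumed contract: ordered, reliable delivery of whole byte frames."""

    @abstractmethod
    def send(self, payload: bytes) -> None: ...

    @abstractmethod
    def receive(self) -> bytes: ...


class LoopbackTransport(Transport):
    """In-memory stand-in; a TLS or ZeroMQ-backed class could replace it."""

    def __init__(self):
        self.frames = []

    def send(self, payload):
        self.frames.append(payload)

    def receive(self):
        return self.frames.pop(0)


class EventChannel:
    """Application protocol: sequence numbers and events, nothing else."""

    def __init__(self, transport):
        self.transport = transport

    def send_event(self, sequence, event):
        body = json.dumps({"seq": sequence, "event": event}).encode("utf-8")
        self.transport.send(body)

    def receive_event(self):
        message = json.loads(self.transport.receive().decode("utf-8"))
        return message["seq"], message["event"]
```

Swapping transports while experimenting then means writing another `Transport` subclass, with `EventChannel` left untouched.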
lumberjack is doing well enough right now; will revisit in the future if we need protocol improvements.
(This was originally longer and more detailed, but then Safari crashed.)