On the sender side I am seeing this error while restarting the process:
{:timestamp=>"2014-05-08T09:50:14.961000+0200", :message=>"ZeroMQ error while in send_string", :error_code=>-1, :level=>:error}
{:timestamp=>"2014-05-08T09:50:14.961000+0200", :message=>"0mq output exception", :address=>["tcp://*:2100"], :exception=>#
Also, it does not restart properly; the process has to be killed. So it seems there really is an issue with using zeromq for input and output. I am using the pushpull topology.
Here is debug information from the receiver side, in case it helps:
ZMQ Error {:subscriber=>#<ZMQ::Socket:0x96e32ba @name="PULL", @option_lookup=[nil, 1, nil, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, nil, nil, nil, nil, nil, 0,
0], @int_cache=nil, @more_parts_array=[], @receiver_klass=ZMQ::Message, @socket=#
The issue appears to be related to ZMQ_RCVTIMEO on the receiving side and ZMQ_SNDTIMEO on the sending side. The default value is -1, which means wait indefinitely, so ZMQ just keeps waiting for a message and the TERM signal is never acted on; you have to kill the process. If I set ZMQ_RCVTIMEO to, say, 5 seconds, then when no message arrives within 5 seconds it returns an error:
ZeroMQ error while in recv_string {:error_code=>-1, :level=>:error}
ZMQ returns EAGAIN when no message arrives before the timeout expires. So I wonder what the solution is in this case?
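For reference, this is roughly what the receive path looks like once the timeout is set, assuming the ffi-rzmq bindings that logstash uses (the endpoint and the 5-second value are just illustrative):

require 'ffi-rzmq'

context = ZMQ::Context.new
socket  = context.socket(ZMQ::PULL)
socket.setsockopt(ZMQ::RCVTIMEO, 5000)     # default is -1: block in recv forever
socket.connect("tcp://server_adress:2100")

message = ''
rc = socket.recv_string(message)
unless ZMQ::Util.resultcode_ok?(rc)
  # on timeout recv_string returns -1 (logged as :error_code=>-1)
  # and errno is set to EAGAIN
  puts "timed out waiting for a message" if ZMQ::Util.errno == ZMQ::EAGAIN
end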
Any comments/suggestions are appreciated.
So I wonder what the best solution is in such a scenario. Not using ZeroMQ at all? There are not many options left if you want elasticity without a single point of failure, which ZeroMQ provides with the pushpull topology.
Using ZMQ REQ/REP sockets is not going to be terribly easy here. Those sockets do not allow for retries or resends because of their state machine; the only way to recover from a dropped message is to close the socket and start fresh. If REP socket responses can sometimes go unsent, this requires handling on the REP side as well. On top of that, both server and client must do this handling in lock-step, with multiple parallel sockets if parallelism is wanted.
I would say the ZeroMQ code needs to be re-engineered for reliability, if that is desired.
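For illustration, the usual close-and-reopen recovery for a REQ socket (the zguide's "Lazy Pirate" pattern) looks roughly like this with ffi-rzmq; the endpoint, payload, and timeout are placeholders:

require 'ffi-rzmq'

REQUEST_TIMEOUT_MS = 2500
SERVER_ENDPOINT    = "tcp://localhost:5555"

# (Re)create a REQ socket from scratch; a REQ socket whose request went
# unanswered is stuck in its state machine and cannot simply retry.
def fresh_req_socket(context)
  socket = context.socket(ZMQ::REQ)
  socket.setsockopt(ZMQ::LINGER, 0)                    # do not hang on close
  socket.setsockopt(ZMQ::RCVTIMEO, REQUEST_TIMEOUT_MS)
  socket.connect(SERVER_ENDPOINT)
  socket
end

context = ZMQ::Context.new
socket  = fresh_req_socket(context)
socket.send_string("ping")

reply = ''
unless ZMQ::Util.resultcode_ok?(socket.recv_string(reply))
  # No reply in time: throw the socket away, make a new one, resend.
  socket.close
  socket = fresh_req_socket(context)
  socket.send_string("ping")
end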
Thanks for reporting & troubleshooting this. The zeromq filter has been moved to logstash-contrib, so can you open the issue over there and reference this one?
I can do that, but I am using zeromq not as a filter but as input and output. Have the input/output plugins also been moved to logstash-contrib?
Just a quick note: REQ/REP sockets in ZeroMQ are not really meant to work in scenarios where messages can be dropped. The only way to recover from any error is to close the socket entirely and open a new one. The plugin should be re-engineered for reliability.
That might be so for REQ/REP sockets, but the issue I mention here is with the pushpull topology: messages are never sent back from clients to the producers/servers, only received from them and sent down the line to elasticsearch using the elasticsearch output.
Sorry, you are probably talking about the input/output zeromq plugins - the only one I took a peek at was the zeromq filter, which uses only a REQ socket. My mistake for thinking these two plugins were the same.
Oops, brain fart. You're talking about the zeromq input/output, ignore my last comment.
So, this is really hard to diagnose with just the logs you provided. Can you create a minimal config with steps to reproduce this problem?
On the server side, which receives input from logstash-forwarder via lumberjack and outputs it using zeromq in server mode:
input {
  lumberjack {
    port => 5000
    ssl_certificate => "..."
  }
}
output { zeromq { address => ["tcp://*:2100"] topology => "pushpull" mode => "server" } }
Now, on the client side, to receive events using zeromq:
input {
  zeromq {
    address => ["tcp://server_adress:2100"]
    topology => "pushpull"
    mode => "client"
  }
}
output { stdout { codec => "json" } }
Just send one or a few events from logstash-forwarder to the server, and once you see the event printed to stdout on the client side, try to stop the client logstash process. It will fail; you have to kill it. If you instead enable the elasticsearch output, run it in the background, and try to restart the process, you will see the error I mentioned above in the report. But if you use this config on the client side to receive input:
input { zeromq { topology => "pushpull" address => ["tcp://server_adress:2100"] mode => "client" sockopt => ["ZMQ::RCVTIMEO", 5000] } }
you will see the same message printed every 5 seconds whenever logstash does not receive a message within the 5-second interval, but now you will be able to stop the process with SIGTERM rather than having to kill it with SIGKILL.
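In other words, the timeout turns the blocking recv into a periodic wakeup, so a shutdown flag set from the TERM handler actually gets checked. A rough sketch of that loop with ffi-rzmq (names and values are illustrative):

require 'ffi-rzmq'

shutting_down = false
Signal.trap("TERM") { shutting_down = true }  # now delivered between recv calls

context = ZMQ::Context.new
socket  = context.socket(ZMQ::PULL)
socket.setsockopt(ZMQ::RCVTIMEO, 5000)        # wake up every 5 s instead of blocking forever
socket.connect("tcp://server_adress:2100")

until shutting_down
  message = ''
  rc = socket.recv_string(message)
  if ZMQ::Util.resultcode_ok?(rc)
    puts message
  elsif ZMQ::Util.errno != ZMQ::EAGAIN
    warn "ZeroMQ error while in recv_string: #{ZMQ::Util.error_string}"
  end
  # EAGAIN just means the 5 s elapsed with no message; loop and re-check the flag.
end

socket.close
context.terminate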
Ok, thanks, we'll have to run some tests with this & followup.
I just hit a similar problem when I upgraded to logstash 1.4.1 from 1.3.2: I had a continuous stream of "ZeroMQ error while in recv_string" errors being spewed out. The problem turned out to be the fact that the logstash tarball contains .gitignore files. See https://logstash.jira.com/browse/LOGSTASH-2252
@timbunce https://logstash.jira.com/browse/LOGSTASH-2252 is resolved, though I am not really sure how zeromq problems relate to .gitignore
When restarting logstash while it is receiving input from zeromq, logstash gives this error:
{:timestamp=>"2014-05-01T00:38:36.503000+0200", :message=>"ZeroMQ error while in recv_string", :error_code=>-1, :level=>:error}
The error repeats for a while until logstash has started up completely; it functions fine afterwards, though.