deepfryed / beanstalk-client

C/C++ beanstalk client
59 stars 44 forks source link

bs_recv_message() blocks when putting a job on queue #27

Closed wimrijnders closed 9 years ago

wimrijnders commented 9 years ago

Recently, I have the following situation:

I'm putting a job on a beanstalk queue using Beanstalk::Client::put(). Within this method, for some reason, the following call is made:

message = bs_recv_message(fd, BS_MESSAGE_NO_BODY);

Where BS_MESSAGE_NO_BODY has value 0. What sometimes happens, is that bs_recv_message() blocks indefinitely on the following line:

ret = recv(fd, message->status, status_max - 1, 0);

This is to be expected, since the last parameter is zero. These are the flags for the call and MSG_DONTWAIT is not specified in this call. I can also see that O_NONBLOCK is set in bs_connect_with_timeout() for the particular case when a timeout value is specified.

Obviously, hanging indefinitely here on send is not desired behaviour. For this reason, I have the following questions:

If I have a request here, it's that the call to recv() in bs_recv_message() is made non-blocking, e.g.:

ret = recv(fd, message->status, status_max - 1, MSG_DONTWAIT);

Is it possible to do this? If not, could you motivate why and please give an alternative solution to this issue?

Thanks in advance,

Wim.

swehner commented 9 years ago

Hey,

In this case the client is waiting to read the response fromt he server on your put command, right? Beanstalkd is expected to answer with something like INSERTED, etc... See https://github.com/kr/beanstalkd/blob/master/doc/protocol.txt for more info on the put command.

So the client needs to actually wait for an answer from beanstalkd before doing anything else. It can't continue sending more data, or execute any different commands before the server answers. Making the read non-blocking would mean you'd have to wait in a busy loop for the server's answer...

How are you doing the put command? Could there be a problem with the data you're sending? Maybe there's some issue with the calculated length of the data? Something like it's announcing more data than it's actually sending - this could explain that the server keeps waiting for more data and doesn't respond to your put command correctly.

Maybe a tcpdump with more info on the command that it's sending would be interesting here...

Cheers,

Stefan

wimrijnders commented 9 years ago

Hi swehner,

I appreciate your swift answering immensely. Upvote for you! You deserve a beer.

OK, I see what you mean. In bs_put(), a packet is sent to the socket, then the call to bs_recv_message() is done as described in my top comment.

OK, so indeed is has to be blocking. Now, I need to understand why it's not returning. I'll see if I can reproduce the behaviour accurately. Will keep you posted.

wimrijnders commented 9 years ago

Everything looks perfectly fine at the level of bs_put(). I won't bother you with the exact numbers, but all sizes check out.

I have a sneaky suspicion now that it might be due to race conditions with multiple threads. e.g. two jobs are put on the tube at the same time and their data interleaves in some way. Looking further....

swehner commented 9 years ago

Hi,

That could be a problem. Are you using the same client from different threads, AFAIK the clients are not thread safe...

wimrijnders commented 9 years ago

I can confirm without reservation that the clients are not thread-safe. I've encapsulated them in application-specific classes to add thread safety using mutexes (among other things). This need not be a problem, as long as it is documented somewhere.

But I obviously (to me) have a hole somewhere in the client handling. Will continue searching after a break.

wimrijnders commented 9 years ago

OK found it. It was indeed a multi-threading issue.

I thought I had it covered with making methods private, but to my dismay there was a friend class declaration which could access the internal calls. So the fix was a one-liner, namely adding a mutex lock at the correct place. And getting rid of the friend class declaration.

So this is a non-issue. Sorry to bother you and thanks for your attention. On to the next problem....