Network Safe Connection

ysmolski commented 10 years ago

I have implemented a class called NetworkSafeConnection [1].

It handles two problems of the regular Connection:

1. It can reconnect, hiding socket exceptions from the user of Connection and blocking execution until the call succeeds. This is done in one of two ways (controlled by the parameter block_until_success):
   - retrying indefinitely until the call succeeds, or
   - making no more than MAX_RETRIES attempts, with delays of RETRY_DELAY_SECONDS between them.
2. It can restore the state of used and watched tubes from before the connection was lost.

I have put examples (tests) in NetworkSafeConnection.mkd [2].
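To illustrate the two retry modes and the tube restoration described above, here is a minimal sketch. It is not the code from the pull request: `call_with_retry` and `restore_tubes` are hypothetical helper names, and `restore_tubes` only assumes a connection object with `use`, `watch` and `ignore` methods, roughly as beanstalkc's Connection offers them.

```python
import socket
import time

MAX_RETRIES = 3            # names taken from the description above
RETRY_DELAY_SECONDS = 0.1  # value chosen for illustration only


def call_with_retry(func, reconnect, block_until_success=False,
                    max_retries=MAX_RETRIES, delay=RETRY_DELAY_SECONDS):
    """Call func(); on a socket error, wait, reconnect and try again.

    With block_until_success=True the loop never gives up. Otherwise the
    last socket error is re-raised after max_retries failed attempts.
    """
    attempts = 0
    while True:
        try:
            return func()
        except socket.error:
            attempts += 1
            if not block_until_success and attempts >= max_retries:
                raise
            time.sleep(delay)
            try:
                reconnect()
            except socket.error:
                pass  # still down; the next func() call fails and we retry


def restore_tubes(conn, used, watched):
    """Replay the tube state on a fresh connection (hypothetical helper)."""
    conn.use(used)
    for tube in watched:
        conn.watch(tube)
    if 'default' not in watched:
        # A new connection watches 'default' until told otherwise.
        conn.ignore('default')
```

A wrapper class could route every command through `call_with_retry` and pass a `reconnect` callback that opens a new socket and then calls `restore_tubes` with the last known state.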

The only problem is that if a socket error happens while a job is reserved and not yet deleted by, say, worker1, then after the server restarts the job becomes available for reservation by other workers. At the same time, it can still be deleted by worker1 once the connection is restored.

Please let me know how you feel about such an extension of functionality. Do you think others can benefit from it?

Doerge commented 10 years ago

The scenario you describe is hard to solve with the current beanstalkd protocol [1]. The only way I can come up with is to call stats-job to retrieve 'time-left' and compare against that. If the job has timed out, don't send the delete.

I think the most 'sane' place to call stats-job would be in the Job-constructor. Thoughts?

[1] https://github.com/kr/beanstalkd/blob/v1.3/doc/protocol.txt
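A rough sketch of the stats-job check suggested above. It assumes a job object with `stats()` returning the stats-job dictionary (including the `state` and `time-left` fields) and a `delete()` method, roughly as beanstalkc's Job offers them; `still_reserved`, `delete_if_still_reserved` and `safety_margin` are hypothetical names. Note that the check only shows that the job is reserved by someone, not by whom.

```python
def still_reserved(stats, safety_margin=1):
    """Return True if stats-job says the job is reserved with time to spare.

    stats is the dictionary parsed from a stats-job reply; safety_margin
    (seconds) is an assumed cushion for the round trip to the server.
    """
    return (stats.get('state') == 'reserved'
            and stats.get('time-left', 0) > safety_margin)


def delete_if_still_reserved(job):
    """Send the delete only when the reservation has not timed out."""
    if still_reserved(job.stats()):
        job.delete()
        return True
    return False
```

There is still a window between the stats-job call and the delete in which the reservation can expire, so this narrows the race rather than closing it.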

earl commented 10 years ago

The delete behaviour is not really a problem but rather fundamentally correct as far as the protocol is concerned. Any job in "ready" state can be deleted by any worker. From the protocol spec: "A client can delete jobs that it has reserved, ready jobs, and jobs that are buried."

Doerge commented 10 years ago

Correct. What I meant is that it is rather hard for a client to know whether the job it is currently processing is still reserved on the server by the client itself. It also requires the stats-job call to figure out how much time the job has left, but even then it is not possible to determine whether you are actually the one who has reserved it.

To elaborate a bit on the problem described by ysmolski: Worker 1 reserves the job. Worker 1 is disconnected from the server before completion, preventing deletion of the job when done. The job times out on the server and is put back into the ready state. Worker 2 reserves the job. The network comes back for worker 1. Worker 1 issues the delete. Worker 2 completes and issues a delete, resulting in "NOT_FOUND\r\n" as the response, which raises the CommandFailed exception for no apparent reason to worker 2.

earl commented 10 years ago

You cannot delete jobs which are currently reserved by another worker, that's prohibited by the protocol / server. So in this scenario, once worker 1 reconnects the job is reserved by worker 2, so when worker 1 tries to delete the job, this delete will simply fail.
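The rule can be illustrated with a toy in-memory model of the scenario. This is not beanstalkd or beanstalkc code: `ToyServer`, `NotFound` and `run_scenario` are made-up names, and the model encodes only the delete rule quoted from the protocol (a client can delete jobs it has reserved itself, ready jobs, and buried jobs; buried jobs are left out here).

```python
class NotFound(Exception):
    """Stand-in for the server's NOT_FOUND reply."""


class ToyServer(object):
    def __init__(self):
        self.owner = {}  # job id -> reserving worker, or None when ready

    def put(self, jid):
        self.owner[jid] = None

    def reserve(self, worker):
        # A real server would block here; the toy just takes a ready job.
        for jid, owner in self.owner.items():
            if owner is None:
                self.owner[jid] = worker
                return jid
        raise NotFound('no ready job')

    def expire(self, jid):
        # The reservation timed out: the job goes back to ready.
        self.owner[jid] = None

    def delete(self, worker, jid):
        # Allowed only for ready jobs or jobs reserved by this worker.
        if jid not in self.owner or self.owner[jid] not in (None, worker):
            raise NotFound(jid)
        del self.owner[jid]


def run_scenario():
    server = ToyServer()
    server.put(1)
    server.reserve('worker1')    # worker 1 reserves, then loses its network
    server.expire(1)             # reservation times out on the server
    server.reserve('worker2')    # worker 2 picks the job up
    try:
        server.delete('worker1', 1)   # worker 1 is back and tries to delete
        late_delete = 'deleted'
    except NotFound:
        late_delete = 'NOT_FOUND'
    server.delete('worker2', 1)       # worker 2 finishes normally
    return late_delete, 1 in server.owner
```

In this model worker 1's late delete is rejected and worker 2's delete goes through, so a reconnecting client only has to treat a failed delete as "the job is no longer mine".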

Doerge commented 10 years ago

Ah! I did not know that. Thanks!


ysmolski commented 10 years ago

I see now. That's really nice. So my addition seems to be correct according to the protocol.

BTW, the tests in the documents I created pass.

Gonna do a pull request.

ysmolski commented 10 years ago

Pull request: https://github.com/earl/beanstalkc/pull/38

earl / beanstalkc

Network Safe Connection #37