Have `requestExists` to return `True` if a worker is working on a request, or have workers to stop working on dead requests

fpco / libraries

FP Complete libraries mega-repo

2 stars 0 forks source link

Have `requestExists` to return `True` if a worker is working on a request, or have workers to stop working on dead requests #176

Closed bitonic closed 8 years ago

bitonic commented 8 years ago

The current scenario can happen:

A request is submitted
A worker starts working on it
The request expires while the worker is working on it
Clients won't be able to wait on the response because the request is gone

It would be nice if requests did not expire while a worker is working on them.

bitonic commented 8 years ago

Alternatively, we should have workers to stop working on dead requests. However, it's not very easy to do that with the current design.

kantp commented 8 years ago

The workers already check periodically if the job got cancelled manually. I guess we could add

periodically checking if the request is still in the database, or has expired
or, periodically update the time to live while working on the job.

bitonic commented 8 years ago

Yes, I think that's probably the way to go, but the second option is kind of hard because we'd have to guarantee to always bump the TTL "in time", to avoid spurious cancels.

I think the best thing to do, for now, is to just have it to check if the request data is still there, and stop working if it isn't.

kantp commented 8 years ago

I've put some more thought into this, because I like option 2 better, and I think I have a design that is simple enough:

(optional) Before reading the request data, the worker bumps its TTL.
While working on the job, we update the TTL of the request data periodically. The time between updates should be a fraction of the jqcRequestExpiry.
If we fail to update the TTL at any time, we stop working on the job.

Step 1. prevents us from picking up a job that will expire immediately. Step 2. ensures that the job will not expire while we're working on it and have a connection to the database. Step 3. is a fall-back to option 1 (stop working on expired requests), in case the job does expire for some reason (if, for example, we didn't update the TTL quickly enough).

Step 1. is optional, because Step 3. makes the result of picking up a job that immediately expires harmless.

bitonic commented 8 years ago

I think trying to bump the TTL might be a good idea, but it can be seen as something on top of the existence check implemented in #177 , and as a "nice to have" rather than a guarantee, since it's going to be tricky to guarantee that the request key is not expired. So I'd go for merging #177 first, and then implement improvements on top of it.

kantp commented 8 years ago

You're right.

Also, not updating gives us a guarantee we won't block resources indefinitely on non-terminating jobs.