vine: Wait no wait

btovar commented 1 month ago

Return any completed task to the application without doing any work.

Post-change actions

Put an 'x' in the boxes that describe post-change actions that you have done. The more 'x' ticked, the faster your changes are accepted by maintainers.

[x] make test: Run local tests prior to pushing.
[x] make format: Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
[ ] make lint: Run lint on source code prior to pushing.
[ ] Manual Update: Did you update the manual to reflect your changes, if appropriate? This action should be done after your changes are approved but not merged.
[x] Type Labels: Select github labels for the type of this change: bug, enhancement, etc.
[x] Product Labels: Select github labels for the product affected: TaskVine, Makeflow, etc.
[x] PR RTM: Mark your PR as ready to merge.

Additional comments

This section is dedicated to changes that are ambitious or complex and require substantial discussions. Feel free to start the ball rolling.

dthain commented 1 month ago

I see the need for this and it makes sense, but a few things about the API:

Is wait_no_wait somehow different from wait_for_tag with a timeout of zero?
Would it make sense to have wait_no_wait without a tag?
Let's brainstorm some additional ideas for the name.

btovar commented 1 month ago

A timeout of zero has so far meant waiting at least one second, so I didn't want to mess with that. But yes, if backwards compatibility were not a concern, I would have preferred to use timeout=0 rather than add a call. If you think it is OK, we can use timeout=0, so that we don't have to come up with a new name.

btovar commented 1 month ago

It has to have a tag, just in case the daskvine manager is managing two dags at the same time. (Currently not possible as it waits for the queues to be empty, but we will need that for notebooks in the future.)

dthain commented 1 month ago

So the timeout value is funny because we want to put some approximate limit on waiting time. But once the manager begins to interact with a worker, certain actions cannot be interrupted (e.g. a file transfer), and so it's easy to take longer than the timeout value. I think the timeout value really means "max time to wait idle for a message to arrive." So, I can see several regimes for waiting:

1 - Do not wait for anything, only return a completed task if available.
2 - Do not wait idly for messages to arrive, but process any pending messages on sockets. (Requires calling link_wait at least once.)
3 - Wait idly for messages to arrive, up to N seconds.
4 - Wait idly forever until something happens.

Did I miss any cases?

I believe that case 2 corresponds to timeout==0, case 3 is timeout>0, and case 4 is timeout==VINE_WAIT_FOREVER.

Would it be better to change the meaning for timeout==0 or to introduce a new symbol for case 1?
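For reference, the four regimes listed above can be modeled with a toy manager. This is only a sketch and not the TaskVine implementation: `ToyManager`, its `inbox` queue (standing in for pending worker messages on sockets), and the mode names `no_wait`, `poll`, and `idle` are all hypothetical names made up for illustration.

```python
import queue


class ToyManager:
    """Toy model of the four waiting regimes (hypothetical, not TaskVine code)."""

    def __init__(self):
        self.completed = []         # tasks already retrieved, ready to hand back
        self.inbox = queue.Queue()  # stands in for messages pending on sockets

    def _pop(self):
        # Hand back one completed task, or None if there is none.
        return self.completed.pop(0) if self.completed else None

    def _drain(self):
        # Process messages that have already arrived, without blocking.
        while True:
            try:
                self.completed.append(self.inbox.get_nowait())
            except queue.Empty:
                return

    def wait(self, mode, seconds=None):
        # Case 1: do no work at all, only return an already completed task.
        if mode == "no_wait":
            return self._pop()
        # Case 2: process pending messages, but never wait idly.
        self._drain()
        if mode == "poll" or self.completed:
            return self._pop()
        # Case 3: wait idly up to `seconds`. Case 4: seconds=None waits forever.
        try:
            self.completed.append(self.inbox.get(timeout=seconds))
        except queue.Empty:
            pass
        return self._pop()
```

In this model, a message that has arrived but has not been processed is invisible to case 1 and visible to case 2, which is the distinction between the two regimes.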

btovar commented 1 month ago

Since we convert timeout=0 to 1, it loosely corresponds to case 2. From a user perspective case 2 is hard to explain without going into the particulars of the implementation, so I wouldn't make it an official case. My preference is having timeout=0 do what the wait_no_wait call is doing here, similar to WNOHANG.
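As a reminder of the WNOHANG behavior referred to here, a minimal POSIX-only sketch using Python's `os.waitpid` follows. The helper name `poll_then_wait` and the sleeping child process are made up for illustration; the point is only that the non-blocking call returns immediately with pid 0 when nothing has finished yet.

```python
import os
import subprocess
import sys


def poll_then_wait():
    # Start a child that stays alive for a short while.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.5)"]
    )
    # Non-blocking check: with WNOHANG, waitpid returns (0, 0) right away
    # if the child has not exited yet.
    polled_pid, _ = os.waitpid(child.pid, os.WNOHANG)
    # Blocking check: without WNOHANG, waitpid waits until the child exits
    # and returns the child's pid.
    reaped_pid, status = os.waitpid(child.pid, 0)
    return polled_pid, reaped_pid == child.pid, status


polled_pid, reaped, status = poll_then_wait()
print(polled_pid, reaped, status)
```

A wait with timeout=0 behaving this way would return a completed task if one is available and otherwise return at once with nothing.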

dthain commented 1 month ago

That makes sense to me.

cooperative-computing-lab / cctools

vine: Wait no wait #3815
