Vine: Function Calls Need More Consistent Method of Returning Results

btovar commented 1 month ago

The dask executor uses a combination of temporary outputs and regular outputs to compute the graph. Function calls using buffers assume that the results are used immediately when the task returns, but in the dask executor they may be used once the task has gone out of scope and garbage collected.

In #3824 I change function calls to not use buffers, as that should work for all cases. However, it may hurt a bit in performance. @tphung3 @BarrySlyDelgado @colinthomas-z80

dthain commented 1 month ago

@btovar could you update this with the current state of things? I believe you added some transformation re buffers and disk for function calls

btovar commented 3 weeks ago

Currently "task-mode" and "function-call-mode" do the same, which is to write inputs and outputs to files on disk. This means that FunctionCall does not use buffers anymore. This is the safe thing to do but may lose some of the performance of using buffers.

dthain commented 4 days ago

It seems to me that we have two different modes of operating that need to be supported: 1 - A standalone function call that will return a Python result to be used in-memory by the manager program. The current mode of operation seems to support this well. 2 - A function call that operates as part of a DAG (as in Dask). In this mode, the input and output file objects should be declared externally to the task and attached with add_input and add_output and then pruned and undeclared when no longer needed.

And I will throw out the following general constraint:

Whoever declares a file is responsible for undeclaring it. If the DAG processor declared the file, it should undeclare it. If the Task declared some file for internal use, then the Task should be responsible for undeclaring it.

Perhaps we need to reorganize the class hierarchy to better reflect that 1 is implemented as an extension around 2.

dthain commented 4 days ago

Pulling over brief comment from Ben in #3800:

The function calls return a buffer which is freed when the task goes out of scope. Either we need a way to specify their actual output file, or set lazy_transfers=True when using function calls.

cooperative-computing-lab / cctools

Vine: Function Calls Need More Consistent Method of Returning Results #3828