Open btovar opened 1 month ago
@btovar could you update this with the current state of things? I believe you added some transformation re buffers and disk for function calls
Currently "task-mode" and "function-call-mode" do the same thing: both write inputs and outputs to files on disk. This means that FunctionCall no longer uses buffers. This is the safe choice, but it may give up some of the performance that buffers provided.
It seems to me that we have two different modes of operating that need to be supported:
1 - A standalone function call that will return a Python result to be used in-memory by the manager program. The current mode of operation seems to support this well.
2 - A function call that operates as part of a DAG (as in Dask). In this mode, the input and output file objects should be declared externally to the task and attached with add_input and add_output, and then pruned and undeclared when no longer needed.
And I will throw out the following general constraint:
Perhaps we need to reorganize the class hierarchy to better reflect that 1 is implemented as an extension around 2.
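To make the "1 as an extension around 2" idea concrete, here is a minimal sketch in plain Python. It does not use the TaskVine API: `FileCall`, `ResultCall`, and their methods are hypothetical names invented for illustration, files are plain paths on local disk, and `add_input`/`add_output` are simplified to take a path only.

```python
import os
import pickle
import tempfile


class FileCall:
    """Mode 2 (hypothetical): inputs and outputs are files declared outside the call."""

    def __init__(self, fn):
        self.fn = fn
        self.inputs = []
        self.output = None

    def add_input(self, path):
        self.inputs.append(path)

    def add_output(self, path):
        self.output = path

    def run(self):
        # Load each pickled input, apply the function, pickle the result to the output file.
        args = []
        for path in self.inputs:
            with open(path, "rb") as f:
                args.append(pickle.load(f))
        with open(self.output, "wb") as f:
            pickle.dump(self.fn(*args), f)


class ResultCall(FileCall):
    """Mode 1 (hypothetical): declares its own files, returns the result in memory, cleans up."""

    def __init__(self, fn, *args):
        super().__init__(fn)
        self._workdir = tempfile.mkdtemp()
        for i, value in enumerate(args):
            path = os.path.join(self._workdir, f"in{i}")
            with open(path, "wb") as f:
                pickle.dump(value, f)
            self.add_input(path)
        self.add_output(os.path.join(self._workdir, "out"))

    def result(self):
        self.run()
        with open(self.output, "rb") as f:
            value = pickle.load(f)
        # The wrapper owns its files, so it removes them once the value is in memory.
        for path in self.inputs + [self.output]:
            os.remove(path)
        os.rmdir(self._workdir)
        return value


def add(a, b):
    return a + b


def double(x):
    return 2 * x


# Mode 1: standalone call, result returned to the caller in memory.
standalone = ResultCall(add, 2, 3).result()

# Mode 2: the caller declares the files and wires two calls into a small chain.
workdir = tempfile.mkdtemp()
src, mid, dst = (os.path.join(workdir, name) for name in ("src", "mid", "dst"))
with open(src, "wb") as f:
    pickle.dump(4, f)

first = FileCall(double)
first.add_input(src)
first.add_output(mid)
first.run()

second = FileCall(double)
second.add_input(mid)
second.add_output(dst)
second.run()

with open(dst, "rb") as f:
    chained = pickle.load(f)
```

In this sketch the standalone mode adds nothing but file bookkeeping on top of the file-based mode, which is the relationship the reorganized hierarchy would need to express. Pruning the intermediate files in the chained case is left to the caller.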
Pulling over brief comment from Ben in #3800:
The dask executor uses a combination of temporary outputs and regular outputs to compute the graph. Function calls using buffers assume that the results are used immediately when the task returns, but in the dask executor they may be used after the task has gone out of scope and been garbage collected.
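The lifetime problem can be illustrated with a loose analogy in plain Python. This is not TaskVine code: `BufferBackedCall` and `FileBackedCall` are hypothetical classes, and the weak reference stands in for a graph that remembers an output without keeping the producing task object alive.

```python
import gc
import os
import pickle
import tempfile
import weakref


class BufferBackedCall:
    """Hypothetical call whose pickled result lives in a buffer owned by the task object."""

    def __init__(self, fn, *args):
        self.output_buffer = pickle.dumps(fn(*args))


class FileBackedCall:
    """Hypothetical call whose pickled result is written to a file that outlives the task object."""

    def __init__(self, fn, *args, output_path):
        self.output_path = output_path
        with open(output_path, "wb") as f:
            pickle.dump(fn(*args), f)


def square(x):
    return x * x


# Buffer case: once the task object is collected, its buffer goes with it.
buffered = BufferBackedCall(square, 7)
buffered_handle = weakref.ref(buffered)
del buffered
gc.collect()
buffer_survived = buffered_handle() is not None

# File case: the output can still be read after the task object is gone.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "square.out")
filed = FileBackedCall(square, 7, output_path=path)
del filed
gc.collect()
with open(path, "rb") as f:
    late_result = pickle.load(f)
```

A consumer that runs later can only rely on the second form, which is why writing to files is the safe default for graph execution.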
In #3824 I changed function calls to not use buffers, as that should work in all cases. However, it may cost some performance. @tphung3 @BarrySlyDelgado @colinthomas-z80