ablaom closed this issue 1 month ago
This requirement is a very specific task. Not everyone uses multithreading/multiprocessing to perform this kind of operation. MLFlowClient.jl mirrors the capabilities of the original package, so, in my view, we should not implement a buffering solution here. This is something the user should take care of.
In the MLJ.jl context, our library MLJFlow.jl contains two POC workarounds, using Locks and Channels. They can be seen here: JuliaAI/MLJFlow.jl#36.
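For readers unfamiliar with the lock-based variant of these workarounds, here is a minimal sketch of the idea: serialize all calls to the MLflow service behind a single `ReentrantLock`, so parallel workers never interleave requests. The names `MLFLOW_LOCK`, `log_to_service`, and `threadsafe_log` are illustrative assumptions, not the actual MLJFlow.jl API.

```julia
# Sketch only: guard every service call with one global lock.
const MLFLOW_LOCK = ReentrantLock()

# Hypothetical stand-in for the real MLflow REST call.
function log_to_service(metrics::Dict)
    return length(metrics)   # pretend the service acknowledges the payload
end

# Workers call this instead of hitting the service directly; the lock
# ensures requests reach the service one at a time, in some serial order.
function threadsafe_log(metrics::Dict)
    lock(MLFLOW_LOCK) do
        log_to_service(metrics)
    end
end

# Example: many threads logging concurrently without interleaving requests.
Threads.@threads for i in 1:8
    threadsafe_log(Dict("step" => i))
end
```

The obvious cost of this approach is that workers block while the lock is held, which is exactly what the Channel-based alternatives try to avoid.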
I'm not 100% convinced. It seems to me that any other Julia software that wants to do mlflow logging will run into exactly the same issue if it uses parallelism. However, for now I'm happy to shelve the proposal in favour of the specific solutions you have worked out, thank you!
The context of this proposal is this synchronisation issue.
The main problem with logging in parallelized operations is simply this: requests are posted directly to an MLflow service without full information about the state of the service at the time the request is ultimately acted on. I propose we resolve this as follows:
- Instead of a client posting requests directly to an MLflow service, they are posted (`put!`) to a first-in-first-out queue (a Julia `Channel`). Requesting calls will return immediately, unless the queue is full. In this way, the performance of the parallel workload is not impacted.
- A single Julia `Task` dispatches requests (`take!`s) from the end of the queue. Whenever a request has the possibility of altering the service state (e.g., creating an experiment), the dispatcher waits for confirmation that the state change is complete before dispatching the next request.

I imagine that we can insert the queue (buffer) without breaking the user-facing interface of MLFlowClient.jl.
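The two points above can be sketched as follows. This is a toy illustration of the proposed design, not the actual POC: `LoggingRequest`, `post_to_mlflow`, and `start_dispatcher` are hypothetical names, and `post_to_mlflow` stands in for whatever HTTP call MLFlowClient.jl actually makes.

```julia
# A request as it would sit in the buffer.
struct LoggingRequest
    action::Symbol              # e.g. :log_metric, :create_experiment
    payload::Dict{String,Any}
    mutates_state::Bool         # true if the request alters service state
end

# Hypothetical stand-in for the synchronous MLflow REST call; returning
# from it models the service confirming the request.
post_to_mlflow(req::LoggingRequest) = req.action

function start_dispatcher(; buffer_size = 128)
    queue = Channel{LoggingRequest}(buffer_size)
    dispatcher = Threads.@spawn begin
        n = 0
        for req in queue        # take!s in FIFO order until the channel closes
            post_to_mlflow(req) # synchronous: a state-mutating request is
            n += 1              # confirmed before the next take! happens
        end
        n                       # number of requests dispatched
    end
    return queue, dispatcher
end

# Usage: workers put! requests and return immediately (unless the buffer
# is full); a single Task drains the queue in order.
queue, task = start_dispatcher()
put!(queue, LoggingRequest(:log_metric, Dict{String,Any}("loss" => 0.1), false))
close(queue)                    # lets the dispatcher drain and finish
wait(task)
```

Because only one `Task` ever talks to the service, ordering and state-confirmation concerns are handled in one place, while the parallel workload only ever touches the `Channel`.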
I have implemented a POC for this proposal and shared it with two maintainers, and can share with anyone else interested.