lvijnck opened this issue 3 months ago
Hi, I understand the need for such a feature, and it would be a great addition.
However, mlflow does not let external orchestrators define the run id themselves, and it seems really wrong to use the run name for this, because by design it is not guaranteed to be unique. This would require a lot of custom logic on kedro-mlflow's side, and I am not sure this is the correct approach.
I think before rushing into an implementation we should investigate how people handle this with orchestrators like Airflow, and possibly make such a request directly in the mlflow repo. I won't close the issue because it's worth keeping track of this feature request, but I don't see it being implemented as is.
Hi, I absolutely see your point. I think it's an ugly workaround, but I could not think of any other way. We use it on a daily basis now, and the `RUN_NAME` is injected directly from Argo Workflows, which makes it unique.
I do, however, think the plugin should support a setup like this; otherwise it is useless for pipelines that run in a distributed manner.
Description
We're leveraging Argo Workflows to orchestrate our pipeline, which results in each of the nodes being executed as an individual `kedro run -n NODE` invocation. With the vanilla setup of `kedro-mlflow`, this results in a new run id for each of the nodes, which is highly undesirable.
Context
Being able to run large pipelines in a distributed manner.
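To make the fan-out concrete, here is a minimal Python sketch of how an orchestrator might split a pipeline into one `kedro run -n NODE` process per node while passing every process the same run name through an environment variable. The `build_node_invocations` helper, the node names, and the workflow name are hypothetical; in Argo Workflows the shared value could come from a template variable such as `{{workflow.name}}`.

```python
import os
from typing import Dict, List, Tuple


def build_node_invocations(
    nodes: List[str], run_name: str, base_env: Dict[str, str]
) -> List[Tuple[List[str], Dict[str, str]]]:
    """Return one (argv, env) pair per node; every process shares RUN_NAME."""
    if not run_name:
        # Without a shared, workflow-unique name each node would start its own run.
        raise ValueError("run_name must be a non-empty, workflow-unique string")
    invocations = []
    for node in nodes:
        # Copy the base environment so each process gets an independent dict.
        env = dict(base_env)
        env["RUN_NAME"] = run_name
        invocations.append((["kedro", "run", "-n", node], env))
    return invocations


if __name__ == "__main__":
    # Hypothetical node names and workflow name, for illustration only.
    for argv, env in build_node_invocations(
        ["preprocess", "train", "evaluate"], "my-workflow-abc12", dict(os.environ)
    ):
        print(" ".join(argv), "RUN_NAME=" + env["RUN_NAME"])
```

Each entry could be handed to `subprocess.run(argv, env=env)` or, more realistically, mapped onto one Argo step per node.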
Possible Implementation
To overcome this limitation, we introduced an additional constraint that enforces uniqueness of the `run name` (code below). We've then implemented a hook that does the following:
- If a `run-name` is defined, verify whether a `run` with that name exists.
- If the `run` exists, set the `run id` to that run's `run id`.
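The hook logic above can be sketched in plain Python against a small in-memory stand-in for the tracking server, so it runs without MLflow installed. `InMemoryTracking`, `resolve_run_id`, and `DuplicateRunNameError` are hypothetical names, not kedro-mlflow or MLflow APIs. Against a real server, the lookup would typically be an `MlflowClient.search_runs` call with a filter on the `tags.mlflow.runName` tag, and the resolved id would then be passed to `mlflow.start_run(run_id=...)`.

```python
import uuid
from typing import Dict, List, Optional


class DuplicateRunNameError(RuntimeError):
    """Raised when more than one run shares the requested run name."""


class InMemoryTracking:
    """Hypothetical stand-in for a tracking server; run ids are assigned server-side."""

    def __init__(self) -> None:
        self.runs: Dict[str, str] = {}  # run_id -> run_name

    def create_run(self, run_name: str) -> str:
        # As with MLflow, the caller cannot choose the id; the server generates it.
        run_id = uuid.uuid4().hex
        self.runs[run_id] = run_name
        return run_id

    def find_runs_by_name(self, run_name: str) -> List[str]:
        # Run names are not unique by design, so this may return several ids.
        return [rid for rid, name in self.runs.items() if name == run_name]


def resolve_run_id(store: InMemoryTracking, run_name: Optional[str]) -> str:
    """Reuse the run called `run_name` if it exists, otherwise create it."""
    if not run_name:
        # No run name supplied: keep the default one-run-per-invocation behaviour.
        return store.create_run(run_name="")
    matches = store.find_runs_by_name(run_name)
    if len(matches) > 1:
        # The extra uniqueness constraint: refuse to guess when the name is ambiguous.
        raise DuplicateRunNameError(f"{len(matches)} runs named {run_name!r}")
    if matches:
        return matches[0]
    return store.create_run(run_name)


if __name__ == "__main__":
    store = InMemoryTracking()
    first = resolve_run_id(store, "my-workflow-abc12")   # first node creates the run
    second = resolve_run_id(store, "my-workflow-abc12")  # later nodes reuse it
    print(first == second, len(store.runs))  # prints: True 1
```

Note that the check-then-create step is not atomic: if two nodes start at the same moment, both may create a run with the same name, and the duplicate check would only flag it on the next lookup. Running a single "create run" step before the parallel nodes avoids that race.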
My suggestion would be to add a flag to the `mlflow` configuration that enables this behaviour.
Possible Alternatives
Supplying a static `run-id` is not possible, as this results in a `ResourceNotFoundError`. The API is also limited in that it does not allow creating a run with a specific `run-id`.
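The configuration flag suggested under Possible Implementation could be read roughly as follows. This is a sketch assuming a hypothetical `tracking.run.resume_by_name` key in the parsed `mlflow.yml` and the `RUN_NAME` environment variable injected by the orchestrator; neither the key nor the helper names exist in kedro-mlflow.

```python
from typing import Any, Mapping, Optional


def resume_by_name_enabled(mlflow_config: Mapping[str, Any]) -> bool:
    """Read the hypothetical `tracking.run.resume_by_name` flag, defaulting to False."""
    # Tolerate missing or empty (None) sections, as YAML often produces them.
    run_cfg = (mlflow_config.get("tracking") or {}).get("run") or {}
    value = run_cfg.get("resume_by_name", False)
    if not isinstance(value, bool):
        raise TypeError(f"resume_by_name must be a boolean, got {value!r}")
    return value


def run_name_to_resume(
    mlflow_config: Mapping[str, Any], env: Mapping[str, str]
) -> Optional[str]:
    """Return the run name to look up, or None to keep the default behaviour."""
    if not resume_by_name_enabled(mlflow_config):
        return None
    # Hypothetical: the orchestrator injects a workflow-unique name via RUN_NAME.
    return env.get("RUN_NAME") or None


if __name__ == "__main__":
    cfg = {"tracking": {"run": {"resume_by_name": True}}}
    print(run_name_to_resume(cfg, {"RUN_NAME": "my-workflow-abc12"}))
```

Keeping the flag off by default preserves the current one-run-per-invocation behaviour for users who do not run nodes in separate processes.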