Closed KumoLiu closed 1 month ago
cc @YuanTingHsieh @holgerroth
I tried adding a try-except block to the _set_experiment function in the MLflow handler. However, I'm not sure whether this achieves the desired behavior.
    def _set_experiment(self):
        experiment = self.experiment
        if not experiment:
            for attempt in range(3):
                try:
                    experiment = self.client.get_experiment_by_name(self.experiment_name)
                    if not experiment:
                        experiment_id = self.client.create_experiment(self.experiment_name)
                        experiment = self.client.get_experiment(experiment_id)
                    break
                except MlflowException as e:
                    if "RESOURCE_ALREADY_EXISTS" in str(e):
                        time.sleep(self.retry_delay)
                        continue
                    else:
                        raise e
@KumoLiu what if we add a line asking people to create this experiment first?
Something like a one-line snippet that uses MLflow to create the experiment?
Hi @YuanTingHsieh, thanks for the suggestion! The MLFlowHandler is included inside the bundle, and the issue here is that when two sites create the experiment at the same time, the second call throws this error. One possible solution is to wrap the experiment creation in a try-except and handle the error there. What do you think?
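
To make the race concrete, here is a minimal sketch of the "catch the error during creation" idea. It uses only the standard library. `FakeClient`, `ResourceAlreadyExists`, and `get_or_create_experiment` are hypothetical stand-ins invented for illustration; they are not the MLflow or MONAI API, and the real handler would catch `MlflowException` instead.

```python
import time


class ResourceAlreadyExists(Exception):
    """Hypothetical stand-in for an 'experiment already exists' error."""


class FakeClient:
    """Hypothetical in-memory tracking client (not the MLflow API)."""

    def __init__(self, lose_race_once=False):
        self.experiments = {}
        self.lose_race_once = lose_race_once

    def get_experiment_by_name(self, name):
        return self.experiments.get(name)

    def create_experiment(self, name):
        if self.lose_race_once:
            # Simulate another site creating the experiment between
            # our lookup and our create call.
            self.lose_race_once = False
            self.experiments[name] = {"name": name, "created_by": "other-site"}
        if name in self.experiments:
            raise ResourceAlreadyExists(f"RESOURCE_ALREADY_EXISTS: {name}")
        self.experiments[name] = {"name": name, "created_by": "this-site"}
        return name


def get_or_create_experiment(client, name, retries=3, retry_delay=0.0):
    """Look up an experiment, creating it if missing and retrying on a lost race."""
    last_error = None
    for _ in range(retries):
        try:
            experiment = client.get_experiment_by_name(name)
            if experiment is None:
                client.create_experiment(name)
                experiment = client.get_experiment_by_name(name)
            return experiment
        except ResourceAlreadyExists as err:
            # Another site won the race; wait, then look the experiment up again.
            last_error = err
            time.sleep(retry_delay)
    raise RuntimeError(f"could not get or create experiment {name!r}") from last_error


# Usage: the first create attempt loses the race, the retry finds the experiment.
racing_client = FakeClient(lose_race_once=True)
found = get_or_create_experiment(racing_client, "monai_nvflare")
```

In this sketch the "already exists" error is treated as a signal to look the experiment up again rather than as a failure, so the site that loses the race ends up using the experiment the other site created.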
It looks like there is an issue when running the MONAI real-world example. When site-2 starts evaluating, the monai_nvflare experiment already exists. We should handle this case.