Increase MLflow HTTP retry

Increase the number of retries and the timeout that MLflow uses for HTTP requests when the server is unavailable. This widens the window during which the server can be offline without the runs crashing.

Currently, the default values are:

MLFLOW_HTTP_REQUEST_MAX_RETRIES = 5
MLFLOW_HTTP_REQUEST_TIMEOUT = 120

I still need to test which values make sense. MLflow retries with exponential backoff, but the backoff is capped at a maximum, so beyond a certain point each extra retry only adds a fixed amount of waiting time. Ideally I want to set these values so that runs survive several hours of server downtime, to deal with system sessions.
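To get a rough feel for the numbers, here is a minimal sketch that estimates the total retry window. It assumes the wait before retry n is `backoff_factor * 2**(n - 1)` seconds, capped at `max_backoff`; the factor of 2 and the cap of 120 seconds are assumptions for illustration, not values confirmed in this PR, and the real schedule depends on the installed MLflow and urllib3 versions. The function names are hypothetical.

```python
def retry_window_seconds(max_retries, backoff_factor=2.0, max_backoff=120.0):
    """Estimate the total time spent waiting between retries.

    Assumes a capped exponential schedule (an assumption, see above):
    wait before retry n = min(backoff_factor * 2**(n - 1), max_backoff).
    """
    total = 0.0
    for attempt in range(1, max_retries + 1):
        total += min(backoff_factor * 2 ** (attempt - 1), max_backoff)
    return total


def retries_for_window(target_seconds, backoff_factor=2.0, max_backoff=120.0):
    """Smallest retry count whose estimated window covers target_seconds."""
    retries = 0
    while retry_window_seconds(retries, backoff_factor, max_backoff) < target_seconds:
        retries += 1
    return retries


if __name__ == "__main__":
    # Estimated window for the current default of 5 retries.
    print(retry_window_seconds(5))
    # Retries needed to cover roughly three hours of downtime.
    print(retries_for_window(3 * 3600))
```

Under these assumptions the cap dominates quickly, so covering hours of downtime needs a retry count in the tens or hundreds rather than a slightly larger default.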

I fetch both values from the config object with a fallback default, so they can be set there, but I don't add them to the default config because ideally users should not need to set them. They remain configurable for power users and debugging, while regular users never see them.
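The pattern described above can be sketched roughly as follows. The config keys `http_max_retries` and `http_request_timeout`, the function name, and the fallback values are hypothetical placeholders, not the names or numbers used in this PR; only the two `MLFLOW_HTTP_REQUEST_*` environment variables come from the text above. The sketch uses a plain mapping in place of the real config object.

```python
import os


def apply_mlflow_http_settings(config, default_max_retries=35, default_timeout=120):
    """Export MLflow HTTP retry settings, using hidden config overrides if present.

    `config` is any mapping-like object with a `.get(key, default)` method.
    The keys and fallback values below are illustrative assumptions.
    """
    max_retries = config.get("http_max_retries", default_max_retries)
    timeout = config.get("http_request_timeout", default_timeout)

    # Environment variables must be strings. Setting them before the first
    # MLflow request is the safest way to make sure they take effect.
    os.environ["MLFLOW_HTTP_REQUEST_MAX_RETRIES"] = str(max_retries)
    os.environ["MLFLOW_HTTP_REQUEST_TIMEOUT"] = str(timeout)
    return max_retries, timeout


if __name__ == "__main__":
    # Regular user: nothing in the config, so the fallbacks apply.
    apply_mlflow_http_settings({})
    # Power user: override one value without touching the default config.
    apply_mlflow_http_settings({"http_max_retries": 100})
```

Because the keys are absent from the default config, a user only encounters them when deliberately adding an override.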

Mitigates #110, but is not a real solution.

ecmwf/anemoi-training #111