Closed stefano-brambilla-venchi closed 2 weeks ago
Hi, sorry to hear that. Unfortunately, this will be really hard to debug: it may be a network error given your setup, but it is hardly possible to tell. Can you try
mlflow ui --backend-store-uri file:///path/to/mlruns --host 127.0.0.1 --port 5000
and see if the error still happens? If yes, this is a problem with mlflow / gunicorn / your network that I can't really fix. If no, this is something I should investigate on the kedro-mlflow side.
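To pin down when the UI stops responding, and whether it also fails when queried directly on the VM (bypassing the SSH tunnel), a small probe can be left running next to the server. The following is a minimal sketch, not part of MLflow or kedro-mlflow: the function names `probe` and `watch`, the 60-second interval, and the 10-second timeout are illustrative assumptions, and the URL in the usage note assumes the host and port from the command above.

```python
import http.client
import time
import urllib.error
import urllib.request
from datetime import datetime
from typing import Optional, Tuple


def probe(url: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Send one GET request and return (ok, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status code.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # Connection refused, timeout, reset, malformed response, ...
        return False, f"{type(exc).__name__}: {exc}"


def watch(url: str, interval: float = 60.0, max_checks: Optional[int] = None) -> None:
    """Probe the URL repeatedly and print one timestamped line per check."""
    checks = 0
    while max_checks is None or checks < max_checks:
        ok, detail = probe(url)
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} {'OK' if ok else 'FAIL'} {detail}", flush=True)
        checks += 1
        if max_checks is None or checks < max_checks:
            time.sleep(interval)
```

Calling `watch("http://127.0.0.1:5000")` in a terminal on the machine that runs the server gives a timestamped record of the first failure, which can then be compared with the timestamps in the server log.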
Hi @stefano-brambilla-venchi, did you get a chance to try the above suggestion? Are you still experiencing the error?
I am closing the issue since it is very likely not related to kedro-mlflow, but feel free to reopen it if you have more details.
Description
When I run
kedro mlflow ui
it initially works correctly, but after some time (usually one hour) I get a timeout and can no longer reach the UI.
Context
I am just using the service in a very standard kedro pipeline.
Steps to Reproduce
I simply run
kedro mlflow ui
and use it as intended. I reach the UI and use it without problems. After a random time, usually a couple of hours, the UI stops working and I must restart the service.
I noticed that when the UI stops working I usually get
[CRITICAL] WORKER TIMEOUT (pid:...) --> [ERROR] Error handling request (no URI read)
in the MLflow log. However, I still see the service alive on the port under other PIDs with the command
lsof -i :5000
Expected Result
The service should never go down.
Actual Result
This is an extract of my log:
Your Environment
Python 3.11.9, Kedro 0.19.5, MLflow 2.13.0, kedro-mlflow 0.12.2
My MLflow URI is localhost (127.0.0.1). I work on a remote Azure VM with SSH-tunnel port forwarding. The VM runs Ubuntu 22.0.4. The laptop from which I reach the port runs Windows 10. The SSH tunnel is set up through Visual Studio Code.
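Since a listening process can remain on the port while requests go unanswered, it may help to check the TCP level separately from the HTTP level. The sketch below is an illustration only: the helper name `port_accepts_connections` and the 5-second timeout are assumptions. Note that when run on the laptop, it would only test the local end of the SSH tunnel, so it is more informative when run on the VM itself.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # A successful connect only shows that something is listening;
        # it does not show that HTTP requests are being answered.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

If `port_accepts_connections("127.0.0.1", 5000)` returns `True` on the VM while HTTP requests to the same address time out, the listener is still up but requests are not being served, which would match the `WORKER TIMEOUT` message in the log.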
Does the bug also happen with the last version on master?
I have not tried.