Closed (piegamesde closed this issue 3 years ago)
Thank you for filing an issue. I rebooted the app, this should fix it. Sorry for the inconvenience, the app is now up and running again.
Thanks
I'm sorry, it looks like I broke it again … maybe there's a bug somewhere that causes it to crash?
Yes, seems like there is an issue with streamlit healthz... I will look into that later today or tomorrow, will reboot app for now.
Leaving the logs here too:
folder path already exists: ./Data/Models/R_50_FPN_3x-staves
folder path already exists: ./Data/Models/R_50_FPN_3x-system_measures-staves
folder path already exists: ./Data/Models/R_50_FPN_3x-system_measures-stave_measures-staves
folder path already exists: ./Data/Models/R_101_FPN_3x-system_measures
folder path already exists: ./Data/Models/R_101_FPN_3x-stave_measures
folder path already exists: ./Data/Models/R_101_FPN_3x-staves
folder path already exists: ./Data/Models/R_101_FPN_3x-system_measures-staves
folder path already exists: ./Data/Models/R_101_FPN_3x-system_measures-stave_measures-staves
folder path already exists: ./Data/Models/X_101_32x8d_FPN_3x-system_measures
folder path already exists: ./Data/Models/X_101_32x8d_FPN_3x-stave_measures
folder path already exists: ./Data/Models/X_101_32x8d_FPN_3x-staves
folder path already exists: ./Data/Models/X_101_32x8d_FPN_3x-system_measures-staves
folder path already exists: ./Data/Models/X_101_32x8d_FPN_3x-system_measures-stave_measures-staves
folder path already exists: ./Data/Models/R_50_FPN_3x-system_measures
[2021-04-02 15:06:56.971978] 2021-04-02 15:06:56.971 Loading checkpoint from ./Data/Models/R_101_FPN_3x-system_measures/model_0015599.pth
[manager] Error checking Streamlit healthz: Get "http://localhost:8501/healthz": dial tcp 127.0.0.1:8501: connect: connection refused
[manager] Error checking Streamlit healthz: Get "http://localhost:8501/healthz": dial tcp 127.0.0.1:8501: connect: connection refused
[manager] Error checking Streamlit healthz: Get "http://localhost:8501/healthz": dial tcp 127.0.0.1:8501: connect: connection refused
[manager] Error checking Streamlit healthz: Get "http://localhost:8501/healthz": dial tcp 127.0.0.1:8501: connect: connection refused
[manager] Error checking Streamlit healthz: Get "http://localhost:8501/healthz": dial tcp 127.0.0.1:8501: connect: connection refused
[manager] Error checking Streamlit healthz: Get "http://localhost:8501/healthz": dial tcp 127.0.0.1:8501: connect: connection refused
[manager] Error checking Streamlit healthz: Get "http://localhost:8501/healthz": dial tcp 127.0.0.1:8501: connect: connection refused
[manager] Error checking Streamlit healthz: Get "http://localhost:8501/healthz": dial tcp 127.0.0.1:8501: connect: connection refused
[manager] Error checking Streamlit healthz: Get "http://localhost:8501/healthz": dial tcp 127.0.0.1:8501: connect: connection refused
[manager] Error checking Streamlit healthz: Get "http://localhost:8501/healthz": dial tcp 127.0.0.1:8501: connect: connection refused
[manager] Streamlit server consistently failed status checks
[manager] Please fix the errors, push an update to the git repo, or reboot the app.
Would be helpful if you could provide a short log of what you did before the crash happened. Thanks
That's the thing, I don't remember having done anything special. I tried it out, loaded an image, played around with the settings and had a look at the results. I'll try again and pay attention to the last thing I did before it crashes (if it does again).
Okay, that was fast 😅
This time, I clicked "display original image", and it then crashed after a few seconds of running. But it was the first time I tried out that button, therefore it can't be the only cause for this.
These crashes stem from the limits of Streamlit's free hosting and are outside my control. My largest models are 750 MB each, which is already very close to what Streamlit can handle. If you use the model ensemble, which loads three of these large models into memory, the app just crashes. One solution would be to run the app locally, or to deploy it to a better service.
For reference, this is what one of the streamlit devs posted as answer to the same crash from other users:
https://docs.streamlit.io/en/stable/deploy_streamlit_app.html#resource-limits

Apps get up to 1 CPU, 800 MB of RAM, and 800 MB of dedicated storage in a shared execution environment.

Ultimately, Torch is a very large dependency, and language models are as well. While we’re working on figuring out the performance characteristics vs. cost tradeoff of the Streamlit sharing service, there isn’t a good answer to your problem right now other than using a bigger deployment instance (whether self-hosted or commercial service).
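To put numbers on it, here is a rough sketch of the arithmetic in Python. The 750 MB and 800 MB figures come from the comments above; the function names are made up for illustration, and treating checkpoint size as the memory footprint is an assumption (real usage is likely higher once Torch itself is loaded).

```python
# Figures taken from this thread; everything else is illustrative.
MODEL_SIZE_MB = 750  # size of one of the largest models
RAM_LIMIT_MB = 800   # RAM limit quoted from the Streamlit docs


def ensemble_fits(n_models, model_size_mb=MODEL_SIZE_MB, ram_limit_mb=RAM_LIMIT_MB):
    """Return True if n_models of the given size fit under the RAM limit.

    Assumption: memory use equals checkpoint size, which is a lower bound.
    """
    return n_models * model_size_mb <= ram_limit_mb


def max_models_in_memory(model_size_mb=MODEL_SIZE_MB, ram_limit_mb=RAM_LIMIT_MB):
    """Return how many models of the given size fit at once (same assumption)."""
    return ram_limit_mb // model_size_mb


if __name__ == "__main__":
    print("single model fits:", ensemble_fits(1))
    print("three-model ensemble fits:", ensemble_fits(3))
    print("models that fit at once:", max_models_in_memory())
```

So a single large model barely fits under this estimate, while the three-model ensemble needs 2250 MB, almost three times the limit.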
The limitation itself is not really a problem; the demo would still be very useful with fewer features. But it has no error handling, so it is easy to crash it by accident, taking it down for all users.
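One possible shape for such a guard, as a minimal sketch only: check the requested models against a memory budget before loading anything, and drop the ones that do not fit instead of letting the process get killed. The function name, the model names and the sizes below are hypothetical and not taken from the app's code; in a Streamlit app the returned message could be shown with `st.warning`.

```python
from typing import Dict, List, Optional, Tuple


def pick_models_within_budget(
    requested: List[str],
    sizes_mb: Dict[str, int],
    budget_mb: int,
) -> Tuple[List[str], Optional[str]]:
    """Greedily keep requested models, in order, while they fit the budget.

    Returns the models that may be loaded and a warning message
    (None if nothing had to be skipped).
    """
    selected: List[str] = []
    used_mb = 0
    for name in requested:
        size = sizes_mb[name]
        if used_mb + size > budget_mb:
            continue  # skip this model rather than risk an out-of-memory kill
        selected.append(name)
        used_mb += size

    dropped = [name for name in requested if name not in selected]
    if not dropped:
        return selected, None
    message = (
        f"Skipped {len(dropped)} model(s) to stay under {budget_mb} MB: "
        + ", ".join(dropped)
    )
    return selected, message


if __name__ == "__main__":
    # Hypothetical ensemble of three 750 MB models under an 800 MB budget.
    sizes = {"model_a": 750, "model_b": 750, "model_c": 750}
    models, warning = pick_models_within_budget(list(sizes), sizes, budget_mb=800)
    print(models)
    print(warning)
```

With those numbers the ensemble degrades to a single model plus a warning, which keeps the demo up for everyone else.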
The app worked fine, until it crashed, and now it always crashes and I can't get it to work anymore :(