Tigran01 closed this issue 5 months ago
When I explicitly set a memory limit for the container in the command, it worked at first, but then the same error appeared again.
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % docker run -d --name docker_torchserve -p 8080:8080 -p 8081:8081 --memory=6g docker_torchserve
304d1332e8c21b23d47ed2942f6732ee2e1de7e09e0b725f0a4be81d6fd93b2f
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve %
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
{ "status": "Healthy" }
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (52) Empty reply from server
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
(animated_drawings) tigran@Tigrans-MacBook-Air torchserve % curl http://localhost:8080/ping
curl: (56) Recv failure: Connection reset by peer
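To pin down the moment the container stops answering, the manual pings above could be replaced by a small loop. This is only a minimal sketch and not part of the original report: the function names `classify_curl_exit` and `watch_ping` are made up for illustration, and the URL is the one from the `curl` commands above. The labels cover the curl exit codes seen in the log (52 and 56) plus a few common ones.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; only curl itself is a real tool here.

# Map a curl exit status to a short label (52 and 56 are the codes seen above).
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    7)  echo "connection refused" ;;
    22) echo "HTTP error status" ;;
    52) echo "empty reply" ;;
    56) echo "connection reset" ;;
    *)  echo "curl error $1" ;;
  esac
}

# Ping TorchServe once per second and stop at the first failure.
# --fail makes curl exit with 22 on HTTP responses >= 400.
watch_ping() {
  local url="${1:-http://localhost:8080/ping}"
  local rc
  while true; do
    curl --silent --fail --max-time 5 --output /dev/null "$url"
    rc=$?
    if [ "$rc" -ne 0 ]; then
      echo "$(date '+%H:%M:%S') ping failed: $(classify_curl_exit "$rc")"
      return "$rc"
    fi
    sleep 1
  done
}
```

Once `watch_ping` returns, `docker inspect --format '{{.State.OOMKilled}}' docker_torchserve` reports whether Docker recorded an out-of-memory kill for the container, and `docker stats --no-stream docker_torchserve` shows its current memory use against the `--memory` limit.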
Also, I think it's worth mentioning that the issue started after I accidentally uninstalled Docker along with some other apps, then reinstalled it and set everything up again. Before that, it worked fine without needing to put a memory limit in the command.
Thanks for reporting this, @Tigran01. It's useful to know this started after you reinstalled. Can you check the torchserve logs inside the Docker container and see if there are any useful error messages there?
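One way to do that check, as a rough sketch: `docker logs` prints what the container wrote to stdout and stderr, and a small filter can narrow that down to warnings and errors. `filter_problems` is a hypothetical helper name, and the pattern assumes log levels are printed in the bracketed `[WARN ]` / `[ERROR]` style that appears in the TorchServe log lines quoted in this thread.

```shell
# Keep only WARN/ERROR lines from a TorchServe-style log read on stdin.
# "|| true" keeps the exit status at 0 when nothing matches.
filter_problems() {
  grep -E '\[(WARN|ERROR) *\]' || true
}

# Typical use (container name taken from the docker run command in this thread):
#   docker logs --tail 200 docker_torchserve 2>&1 | filter_problems
```

The `2>&1` matters because `docker logs` replays the container's stderr on stderr, so warnings written there would otherwise bypass the pipe.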
@hjessmith Weirdly enough, the curl: (56) Recv failure: Connection reset by peer error doesn't reproduce at the moment (I'll add a comment with logs if it does). However, when trying to get an animation, I get "Failed to get bounding box, please check if the 'docker_torchserve' is running and healthy <Response: [503]>". This happened while docker_torchserve was indeed running and healthy; I checked both before and after. Later on, the status became unhealthy. Below I've attached the logs for both cases: when the error occurred, and the unhealthy state.
ERROR "Failed to get bounding box, please check if the 'docker_torchserve' is running and healthy <Response: [503]>":
2024-06-25T17:56:36,136 [WARN ] W-9013-drawn_humanoid_detector_1.0-stderr MODEL_LOG - from torchvision import datasets, io, models, ops, transforms, utils
2024-06-25T17:56:36,136 [WARN ] W-9013-drawn_humanoid_detector_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.11/site-packages/torchvision/models/__init__.py", line 17, in
UNHEALTHY:
2024-06-25T18:00:08,991 [INFO ] pool-2-thread-20 ACCESS_LOG - /192.168.65.1:46332 "GET /ping HTTP/1.1" 500 30
2024-06-25T18:00:08,996 [INFO ] pool-2-thread-20 TS_METRICS - Requests5XX.Count:1.0|#Level:Host|#hostname:84dd413809ef,timestamp:1719338408
@hjessmith Not sure if this helps (I'm not that experienced with how this works), but my previous setup of AnimatedDrawings (before I uninstalled Docker) predated commit 461fe94825d189aca98f34f8085f3c724cf7be2f. I bumped into this issue, then discovered it had been fixed and pulled again.
The other differences with this run were the extreme CPU usage, which made my computer slow down and heat up (I used the same computer previously), and how quickly memory filled up this time. Hopefully this helps. Please let me know if there's something more helpful I can do.
What platform are you trying to run Animated Drawings on: a local machine or something in the cloud? One option is to avoid Docker entirely. There are instructions for doing this on macOS in the main README. Would something like that work for you?
I am running Docker locally on macOS, but running the actual scripts in a local Linux VM over a bridged network. I'm using Linux because I want headless rendering. Unfortunately, I can't run Docker inside the Linux VM as well, due to architectural limitations of the VM. I'll take a look at the macOS instructions now, but the thing is that I specifically want headless rendering.
@hjessmith Happy to say that I was finally able to fix the issue. The high CPU usage decreased significantly when I downgraded Docker (that's probably specific to my computer), but even then the "Failed to get bounding box, please check if the 'docker_torchserve' is running and healthy <Response: [503]>" error kept appearing because of the numpy version. From a glance at the logs, it seemed that torchvision==0.15.1 was installing the latest numpy as a dependency, which overrode the version pinned in the setup file.
I added
RUN pip install numpy==1.23.3
to the Dockerfile, after the line
RUN pip install torchvision==0.15.1 # solve torch version problem
and it worked just fine. (I guess it could also be 1.24.4 as in the setup file, but I previously had 1.23.3, so I just stuck with it.)
@hjessmith Confirmed that it works with
RUN pip install numpy==1.24.4
and doesn't work without it. I can make a pull request if you'd like.
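A quick way to confirm that the pin took effect after rebuilding the image is to ask the Python inside the container which numpy it actually imports. This is only a sketch: `check_numpy_pin` is a made-up helper name, and the usage line assumes that `python` is on the PATH inside the running `docker_torchserve` container.

```shell
# Compare an installed version string against the expected pin
# (defaults to 1.24.4, the version mentioned above).
check_numpy_pin() {
  local actual="$1"
  local expected="${2:-1.24.4}"
  if [ "$actual" = "$expected" ]; then
    echo "numpy $actual matches pin"
  else
    echo "numpy $actual does not match pin $expected"
    return 1
  fi
}

# Typical use against the running container (assumes python is on its PATH):
#   check_numpy_pin "$(docker exec docker_torchserve python -c 'import numpy; print(numpy.__version__)')"
```

If the reported version is newer than the pin, a later `pip install` step in the Dockerfile is probably still pulling in its own numpy, which is the behaviour described above for torchvision.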
That would be great. I'll happily merge that PR.
I keep getting curl: (56) Recv failure: Connection reset by peer.
I changed the memory/CPU settings in Docker, but I still encounter it. Any idea what could cause this?