lassoan closed this issue 3 years ago
The server stopped responding again. This time it has not recovered by itself. See logs here: https://pastebin.com/zsuqATe7
Stopping and starting the server fixed the issue.
These outages and the need for manual restarting of the server from time to time are quite inconvenient. Do you have any recommendation on how to fix this?
@YuanTingHsieh can you take a look at this issue?
@SachidanandAlle Thanks for tagging me.
@lassoan Thanks for the report. The AIAA server currently does have a bug, and it is related only to how AIAA uses the "grpc" communication protocol with the Triton server. The bug only occurs when the Triton server is restarting or not responding at a given moment.
A quick workaround on your side: if you are using docker-compose to start the server, change the Triton protocol variable in your docker-compose.env file
from TRITON_PROTO=grpc
to TRITON_PROTO=http
We will be including a few bug fixes in the next release (4.1), and I will make sure this one is fixed there.
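For anyone who wants to script the protocol switch, here is a minimal Python sketch. The helper name `set_triton_proto` is hypothetical and not part of AIAA. It assumes the env file holds plain `KEY=value` lines, and it accepts only the two values mentioned in this thread (`grpc` and `http`).

```python
import re


def set_triton_proto(env_text, proto="http"):
    """Return env_text with its TRITON_PROTO line set to `proto`.

    Hypothetical helper: it only edits text, it does not touch Docker.
    """
    if proto not in ("http", "grpc"):
        # Only the two values mentioned in this thread are accepted here.
        raise ValueError(f"unexpected protocol: {proto!r}")

    line = f"TRITON_PROTO={proto}"
    # Replace an existing TRITON_PROTO=... line, wherever it sits in the file.
    new_text, count = re.subn(r"(?m)^TRITON_PROTO=.*$", line, env_text)
    if count == 0:
        # No such line yet: append one, keeping the file newline-terminated.
        sep = "" if env_text.endswith("\n") or not env_text else "\n"
        new_text = env_text + sep + line + "\n"
    return new_text
```

To use it, read docker-compose.env, pass the contents through `set_triton_proto`, and write the result back. The containers would presumably need to be stopped and started again for the new value to take effect.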
Thanks a lot for your help, I've changed the protocol to http. I'll close the issue now (and reopen in case it occurs again).
Every few days, the AIAA server stops responding to model requests (http://perklabseg.asuscomm.com:5000/v1/models times out), while the server API tester (http://perklabseg.asuscomm.com:5000/) and logs (http://perklabseg.asuscomm.com:5000/logs/) keep working. It recovers by itself after about 5-10 minutes. See the logs here: https://pastebin.com/9kpDn9WW (search for >>>>> to see where the error happened and when the server recovered).
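The symptom above (the models endpoint times out while the other pages still respond) can be detected with a simple probe. The following is a rough Python sketch, not part of AIAA. The function name `is_responding`, the 10-second timeout, and the injectable `opener` parameter are all assumptions made for illustration.

```python
import urllib.request

# URL taken from the report above.
MODELS_URL = "http://perklabseg.asuscomm.com:5000/v1/models"


def is_responding(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds.

    `opener` defaults to urllib.request.urlopen and can be swapped out
    for offline testing.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses,
        # so timeouts, refused connections and HTTP errors all count as "down".
        return False
```

Calling `is_responding(MODELS_URL)` periodically, for example from a cron job, and logging each `False` result would show how often the outages occur and how long they last.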