awslabs / multi-model-server

Multi Model Server is a tool for serving neural net models for inference
Apache License 2.0

Scaling down workers to 0 and back up again throws exception #916

Closed maaquib closed 2 years ago

maaquib commented 4 years ago

The handling for a DELETE request should be different from that of a min_worker=0 request. Right now, both of these shut down the backend server. Scaling workers down to min_worker=0 and back up again throws an exception:

2020-05-18 22:50:59,674 [ERROR] W-9000-model_test_001 com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker error
com.amazonaws.ml.mms.wlm.WorkerInitializationException: Failed to connect to worker.
    at com.amazonaws.ml.mms.wlm.WorkerThread.connect(WorkerThread.java:355)
    at com.amazonaws.ml.mms.wlm.WorkerThread.run(WorkerThread.java:207)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:connect(..) failed: Connection refused: /home/model-server/tmp/.mms.sock.9000
    at io.netty.channel.unix.Socket.connect(..)(Unknown Source)
Caused by: io.netty.channel.unix.Errors$NativeConnectException: syscall:connect(..) failed: Connection refused
    ... 1 more

Steps for reproducing
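
The original steps were not captured here. As a rough sketch only, the scale-down and scale-up sequence described above could be issued against the management API like this. Everything specific is an assumption rather than something stated in this issue: the management address `http://localhost:8081` (the documented default port), the model name `model_test_001` (inferred from the worker thread name in the log), and the helper names `scale_request`, `reproduction_requests`, and `run`, which are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumption: the MMS management API is listening on its default port.
MANAGEMENT_URL = "http://localhost:8081"


def scale_request(model_name, min_worker, base_url=MANAGEMENT_URL):
    """Build the PUT request that sets min_worker for a registered model."""
    query = urlencode({"min_worker": min_worker})
    url = f"{base_url}/models/{quote(model_name, safe='')}?{query}"
    return Request(url, method="PUT")


def reproduction_requests(model_name, base_url=MANAGEMENT_URL):
    """Return the two requests: scale down to 0 workers, then back up to 1."""
    return [scale_request(model_name, n, base_url) for n in (0, 1)]


def run(model_name, base_url=MANAGEMENT_URL):
    """Send the requests in order and return the HTTP status codes."""
    statuses = []
    for req in reproduction_requests(model_name, base_url):
        with urlopen(req) as resp:
            statuses.append(resp.status)
    return statuses
```

Calling `run("model_test_001")` against a server with that model already registered sends both requests. If the behavior matches this report, the "Failed to connect to worker" error would show up in the server log after the second request, not in the HTTP responses themselves.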

maaquib commented 4 years ago

Duplicate of https://github.com/awslabs/multi-model-server/issues/895