awslabs / multi-model-server

Multi Model Server is a tool for serving neural net models for inference
Apache License 2.0

ONNX to .MAR converter test case fails. #936

Open quantum-fusion opened 4 years ago

quantum-fusion commented 4 years ago

I followed the "Create a .mar file from onnx model" steps from (https://github.com/awslabs/multi-model-server/blob/master/model-archiver/docs/convert_from_onnx.md), but they produce errors and the curl test case fails (curl -X POST http://127.0.0.1:8080/predictions/squeezenet -T kitten.jpg). The request returns no result because the server is throwing exceptions.

The model file in this example has the .onnx extension.

In order to convert the model with .onnx extension to an MXNet model, we would need to use the -c option of the model-archiver tool.

Now you can use the model-archiver command to output onnx-squeezenet.mar file.

cd multi-model-server/examples

model-archiver --model-name onnx-squeezenet --model-path onnx-squeezenet --handler mxnet_vision_service:handle -c -f

Now start the server:

cd multi-model-server

multi-model-server --start --model-store examples --models squeezenet=onnx-squeezenet.mar

% multi-model-server --start --model-store . --models squeezenet=onnx-squeezenet.mar

(base) (venv) MacBook-Pro:~/Keras quantum-fusion$ 2020-08-05 05:42:57,705 [INFO ] main com.amazonaws.ml.mms.ModelServer - MMS Home: /Users/h/venv/lib/python3.8/site-packages Current directory: /Users/h/keras Temp directory: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T/ Number of GPUs: 0 Number of CPUs: 16 Max heap size: 8192 M Python executable: /Users/h/venv/bin/python3 Config file: N/A Inference address: http://127.0.0.1:8080 Management address: http://127.0.0.1:8081 Model Store: /Users/h/keras Initial Models: squeezenet=onnx-squeezenet.mar Log dir: /Users/h/keras/logs Metrics dir: /Users/h/keras/logs Netty threads: 0 Netty client threads: 0 Default workers per model: 16 Blacklist Regex: N/A Maximum Response Size: 6553500 Maximum Request Size: 6553500 Preload model: false Prefer direct buffer: false 2020-08-05 05:42:57,708 [INFO ] main com.amazonaws.ml.mms.ModelServer - Loading initial models: onnx-squeezenet.mar preload_model: false 2020-08-05 05:42:57,910 [WARN ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-squeezenet 2020-08-05 05:42:57,999 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 --handler mxnet_vision_service:handle --model-path /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T/models/7f9abc4605fa6eceda10d13af1c9fe377c7513b0 --model-name squeezenet --preload-model false --tmp-dir /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T/ 2020-08-05 05:42:57,999 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:57,999 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 12699 2020-08-05 05:42:58,000 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started. 
2020-08-05 05:42:58,000 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.8.3 2020-08-05 05:42:58,000 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_MODEL_LOADED 2020-08-05 05:42:58,000 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model squeezenet loaded. 2020-08-05 05:42:58,001 [DEBUG] main com.amazonaws.ml.mms.wlm.ModelManager - updateModel: squeezenet, count: 16 2020-08-05 05:42:58,000 [INFO ] W-9000-squeezenet MMS_METRICS - W-9000-squeezenet.ms:103|#Level:Host|#hostname:Henrys-MacBook-Pro-2.local,timestamp:1596620578 2020-08-05 05:42:58,001 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,002 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,002 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,002 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,002 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,002 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,003 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,003 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,003 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,003 [DEBUG] W-9000-squeezenet 
com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,004 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,004 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,004 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,004 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,004 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,004 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change null -> WORKER_STARTED 2020-08-05 05:42:58,007 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: KQueueServerSocketChannel. 
2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: 
/var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,013 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000 2020-08-05 05:42:58,074 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://127.0.0.1:8080 2020-08-05 05:42:58,074 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Management server with: KQueueServerSocketChannel. 2020-08-05 05:42:58,076 [INFO ] main com.amazonaws.ml.mms.ModelServer - Management API bind to: http://127.0.0.1:8081 Model server started. 2020-08-05 05:42:58,078 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,086 [WARN ] pool-2-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet. 2020-08-05 05:42:58,089 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,092 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 
2020-08-05 05:42:58,095 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,098 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,102 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,104 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,108 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,116 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,125 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,131 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,137 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,143 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 
2020-08-05 05:42:58,152 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,156 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,177 [INFO ] W-9000-squeezenet-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /var/folders/60/l9x5s_456sx9nspbd1l90_fm0000gn/T//.mms.sock.9000. 2020-08-05 05:42:58,274 [INFO ] pool-2-thread-1 MMS_METRICS - CPUUtilization.Percent:100.0|#Level:Host|#hostname:Henrys-MacBook-Pro-2.local,timestamp:1596620578 2020-08-05 05:42:58,274 [INFO ] pool-2-thread-1 MMS_METRICS - DiskAvailable.Gigabytes:3220.6083984375|#Level:Host|#hostname:Henrys-MacBook-Pro-2.local,timestamp:1596620578 2020-08-05 05:42:58,274 [INFO ] pool-2-thread-1 MMS_METRICS - DiskUsage.Gigabytes:10.480144500732422|#Level:Host|#hostname:Henrys-MacBook-Pro-2.local,timestamp:1596620578 2020-08-05 05:42:58,275 [INFO ] pool-2-thread-1 MMS_METRICS - DiskUtilization.Percent:0.3|#Level:Host|#hostname:Henrys-MacBook-Pro-2.local,timestamp:1596620578 2020-08-05 05:42:58,275 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryAvailable.Megabytes:12688.4140625|#Level:Host|#hostname:Henrys-MacBook-Pro-2.local,timestamp:1596620578 2020-08-05 05:42:58,275 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUsed.Megabytes:18482.9296875|#Level:Host|#hostname:Henrys-MacBook-Pro-2.local,timestamp:1596620578 2020-08-05 05:42:58,275 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:61.3|#Level:Host|#hostname:Henrys-MacBook-Pro-2.local,timestamp:1596620578 2020-08-05 05:42:58,325 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ERROR:root:Backend worker process died. 
2020-08-05 05:42:58,325 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last): 2020-08-05 05:42:58,325 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/Users/h/venv/lib/python3.8/site-packages/mms/model_service_worker.py", line 174, in start_worker 2020-08-05 05:42:58,325 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - self.handle_connection(cl_socket) 2020-08-05 05:42:58,325 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/Users/h/venv/lib/python3.8/site-packages/mms/model_service_worker.py", line 143, in handle_connection 2020-08-05 0:42:58,325 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - result, code = self.load_model(msg) 2020-08-05 05:42:58,326 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/Users/h/venv/lib/python3.8/site-packages/mms/model_service_worker.py", line 109, in load_model 2020-08-05 05:42:58,326 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - self.service = self.model_loader.load(model_name, model_dir, handler, gpu, batch_size) 2020-08-05 05:42:58,326 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/Users/h/venv/lib/python3.8/site-packages/mms/model_loader.py", line 116, in load 2020-08-05 05:42:58,326 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - self.module = importlib.import_module(module_name) 2020-08-05 05:42:58,326 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/importlib/init.py", line 127, in import_module 2020-08-05 05:42:58,326 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - return _bootstrap._gcd_import(name[level:], package, level) 2020-08-05 05:42:58,326 [WARN ] W-9000-squeezenet-stderr 
com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "", line 1014, in _gcd_import 2020-08-05 05:42:58,327 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "", line 991, in _find_and_load 2020-08-05 05:42:58,327 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "", line 973, in _find_and_load_unlocked 2020-08-05 05:42:58,327 [INFO ] KQueueEventLoopGroup-4-8 com.amazonaws.ml.mms.wlm.WorkerThread - 9000-1154837e Worker disconnected. WORKER_STARTED 2020-08-05 05:42:58,327 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ModuleNotFoundError: No module named 'mxnet_vision_service' 2020-08-05 05:42:58,328 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died. java.lang.InterruptedException at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1668) at java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) at com.amazonaws.ml.mms.wlm.WorkerThread.runWorker(WorkerThread.java:148) at com.amazonaws.ml.mms.wlm.WorkerThread.run(WorkerThread.java:211) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) at java.base/java.lang.Thread.run(Thread.java:832) 2020-08-05 05:42:58,330 [WARN ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.BatchAggregator - Load model failed: squeezenet, error: Worker died. 
2020-08-05 05:42:58,330 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change WORKER_STARTED -> WORKER_STOPPED 2020-08-05 05:42:58,331 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Retry worker: 9000-1154837e in 1 seconds. 2020-08-05 05:42:58,334 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ERROR:root:Backend worker process died. 2020-08-05 05:42:58,335 [INFO ] KQueueEventLoopGroup-4-11 com.amazonaws.ml.mms.wlm.WorkerThread - 9000-74d81d8c Worker disconnected. WORKER_STARTED 2020-08-05 05:42:58,335 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last): 2020-08-05 05:42:58,335 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died. java.lang.InterruptedException at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1668) at java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) at com.amazonaws.ml.mms.wlm.WorkerThread.runWorker(WorkerThread.java:148) at com.amazonaws.ml.mms.wlm.WorkerThread.run(WorkerThread.java:211) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) at java.base/java.lang.Thread.run(Thread.java:832) 2020-08-05 05:42:58,335 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/Users/h/venv/lib/python3.8/site-packages/mms/model_service_worker.py", line 174, in start_worker 2020-08-05 05:42:58,335 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - self.handle_connection(cl_socket) 
2020-08-05 05:42:58,335 [WARN ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.BatchAggregator - Load model failed: squeezenet, error: Worker died. 2020-08-05 05:42:58,336 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/Users/h/venv/lib/python3.8/site-packages/mms/model_service_worker.py", line 143, in handle_connection 2020-08-05 05:42:58,336 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-squeezenet State change WORKER_STARTED -> WORKER_STOPPED 2020-08-05 05:42:58,336 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - result, code = self.load_model(msg) 2020-08-05 05:42:58,336 [INFO ] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Retry worker: 9000-74d81d8c in 1 seconds. 2020-08-05 05:42:58,336 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/Users/h/venv/lib/python3.8/site-packages/mms/model_service_worker.py", line 109, in load_model 2020-08-05 05:42:58,336 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - self.service = self.model_loader.load(model_name, model_dir, handler, gpu, batch_size) 2020-08-05 05:42:58,336 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/Users/h/venv/lib/python3.8/site-packages/mms/model_loader.py", line 116, in load 2020-08-05 05:42:58,336 [INFO ] KQueueEventLoopGroup-4-6 com.amazonaws.ml.mms.wlm.WorkerThread - 9000-e99a1472 Worker disconnected. WORKER_STARTED 2020-08-05 05:42:58,336 [WARN ] W-9000-squeezenet-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - self.module = importlib.import_module(module_name) 2020-08-05 05:42:58,336 [DEBUG] W-9000-squeezenet com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died. 
java.lang.InterruptedException at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1668) at java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) at com.amazonaws.ml.mms.wlm.WorkerThread.runWorker(WorkerThread.java:148) at com.amazonaws.ml.mms.wlm.WorkerThread.run(WorkerThread.java:211) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) at java.base/java.lang.Thread.run(Thread.java:832)
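
The Python traceback in the log above ends in "ModuleNotFoundError: No module named 'mxnet_vision_service'", which suggests that the worker could not find the handler module named in --handler inside the extracted model directory. The following is a minimal sketch of a check to run before archiving. It assumes the handler is given in module:function form and that module.py is expected to sit directly in the --model-path directory. The helper name handler_module_present is hypothetical and is not part of model-archiver.

```python
import os


def handler_module_present(model_dir: str, handler: str) -> bool:
    """Return True if the handler's module file exists directly in model_dir.

    `handler` uses the 'module:function' form passed to --handler.
    Assumption: the module is a single .py file at the top of model_dir.
    """
    # Keep only the module part, e.g. 'mxnet_vision_service'.
    module_name = handler.split(":", 1)[0]
    return os.path.isfile(os.path.join(model_dir, module_name + ".py"))
```

For example, handler_module_present("onnx-squeezenet", "mxnet_vision_service:handle") returning False would be consistent with the error above. In that case, copying the handler file into the model directory before running model-archiver may be worth trying.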

maaquib commented 4 years ago

Can you provide more details about your multi-model-server setup?

quantum-fusion commented 4 years ago

I discovered that if I have a custom ONNX file that I created, but do not have the params and json files, the MAR converter does not work.

model-archiver --model-name onnx-squeezenet --model-path onnx-squeezenet --handler mxnet_vision_service:handle -c -f

I have a file called model.onnx, but do not have the sym or params files. Do you know where to get these files or how to generate them? The archiver appears to generate the .MAR file without complaint, but there are errors when the server runs.

sym = './model.json'
params = './model.params'

quantum-fusion commented 4 years ago

The setup is the same as in (https://github.com/awslabs/multi-model-server). The failure occurs only when I download the .ONNX file from the Model Zoo and run the converter on it, as I would in real use.

model-archiver --model-name onnx-squeezenet --model-path onnx-squeezenet --handler mxnet_vision_service:handle -c -f

quantum-fusion commented 4 years ago

See the SqueezeNet model from the Model Zoo (https://github.com/onnx/models/tree/master/vision/classification/squeezenet/model)

quantum-fusion commented 4 years ago

I cannot assume that I have a .MAR file, so I need to use the converter for .ONNX formatted models from the Model Zoo.

quantum-fusion commented 4 years ago

@maaquib Can you please tell me how to generate the sym and params files? Do you also have an exporter Python script that you can provide for a custom ONNX model? I have been using the Microsoft-backed https://github.com/onnx/tensorflow-onnx. It only produces an ONNX model, not the sym and params files for use with your multi-model-server.

Please advise.
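
One possible way to produce the sym and params files asked about above is to import the ONNX model with MXNet and save it as a checkpoint. The sketch below assumes MXNet 1.x with the onnx package installed and uses mxnet.contrib.onnx.import_model and mx.model.save_checkpoint. The function names onnx_to_mxnet_checkpoint and checkpoint_paths are hypothetical helpers. Nothing in this thread confirms that multi-model-server accepts the resulting files as-is, and the import may fail for ONNX operators that MXNet does not support.

```python
def checkpoint_paths(prefix: str, epoch: int = 0):
    """Return the file names that mx.model.save_checkpoint writes."""
    return f"{prefix}-symbol.json", f"{prefix}-{epoch:04d}.params"


def onnx_to_mxnet_checkpoint(onnx_path: str, prefix: str, epoch: int = 0):
    """Import an ONNX model into MXNet and save symbol and params files.

    Assumption: MXNet 1.x and the onnx package are installed.
    """
    # Imported lazily so this module loads even without MXNet installed.
    import mxnet as mx
    from mxnet.contrib import onnx as onnx_mxnet

    # import_model returns the symbol plus argument and auxiliary params.
    sym, arg_params, aux_params = onnx_mxnet.import_model(onnx_path)
    mx.model.save_checkpoint(prefix, epoch, sym, arg_params, aux_params)
    return checkpoint_paths(prefix, epoch)
```

Calling onnx_to_mxnet_checkpoint("model.onnx", "model") would write model-symbol.json and model-0000.params next to the script. These names differ from the ./model.json and ./model.params paths quoted earlier in the thread, so renaming may be needed depending on what the handler expects.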