aws / sagemaker-tensorflow-serving-container

A TensorFlow Serving solution for use in SageMaker. This repo is now deprecated.
Apache License 2.0

ECR images outdated #206

Open agoblet opened 3 years ago

agoblet commented 3 years ago

Describe the bug
The images stored on ECR appear to be outdated. I would like to use the SAGEMAKER_GUNICORN_WORKERS feature introduced in 1.8. I deploy my SageMaker endpoint in eu-west-1 with the image 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:2.1.0-gpu. In the endpoint logs I do not see multiple gunicorn workers starting, and the logged startup command, INFO:__main__:gunicorn command: gunicorn -b unix:/tmp/gunicorn.sock -k gevent --chdir /sagemaker --pythonpath /opt/ml/model/code -e TFS_GRPC_PORT=12000 -e SAGEMAKER_MULTI_MODEL=False python_service:app, looks different from the command used in 1.8. I also see little CPU activity on the instance while making many requests.
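For reference, a minimal sketch of how such an endpoint might be deployed with the SageMaker Python SDK, passing SAGEMAKER_GUNICORN_WORKERS through the model environment. The artifact path, role, and worker count are placeholders, not values from the original report:

```python
# Hypothetical deployment sketch: deploy a TensorFlow Serving endpoint and request
# multiple gunicorn workers via SAGEMAKER_GUNICORN_WORKERS (a 1.8+ container feature).
import sagemaker
from sagemaker.tensorflow import TensorFlowModel

role = sagemaker.get_execution_role()  # assumes running inside SageMaker notebook/Studio

model = TensorFlowModel(
    model_data="s3://my-bucket/model.tar.gz",  # placeholder model artifact
    role=role,
    framework_version="2.1",                   # resolves to the tensorflow-inference 2.1 image
    env={"SAGEMAKER_GUNICORN_WORKERS": "4"},   # only honored if the image actually includes the 1.8 feature
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # Tesla T4, matching the GPU seen in the logs
)
```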

To reproduce

Expected behavior

Screenshots or logs


INFO:__main__:starting services
INFO:__main__:using default model name: model
INFO:__main__:tensorflow serving model config:
model_config_list: {
  config: {
    name: "model",
    base_path: "/opt/ml/model",
    model_platform: "tensorflow"
  }
}
INFO:__main__:nginx config:
load_module modules/ngx_http_js_module.so;
worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log  /dev/stderr error;
worker_rlimit_nofile 4096;
events {
  worker_connections 2048;
}
http {
  include /etc/nginx/mime.types;
  default_type application/json;
  access_log /dev/stdout combined;
  js_include tensorflow-serving.js;
  upstream tfs_upstream {
    server localhost:12001;
  }
  upstream gunicorn_upstream {
    server unix:/tmp/gunicorn.sock fail_timeout=1;
  }
  server {
    listen 8080 deferred;
    client_max_body_size 0;
    client_body_buffer_size 100m;
    subrequest_output_buffer_size 100m;
    set $tfs_version 2.1;
    set $default_tfs_model model;
    location /tfs {
        rewrite ^/tfs/(.*) /$1  break;
        proxy_redirect off;
        proxy_pass_request_headers off;
        proxy_set_header Content-Type 'application/json';
        proxy_set_header Accept 'application/json';
        proxy_pass http://tfs_upstream;
    }
    location /ping {
        js_content ping_without_model;
    }
    location /invocations {
        proxy_pass http://gunicorn_upstream/invocations;
    }
    location ~ ^/models/(.*)/invoke {
        js_content invocations;
    }
    location /models {
        proxy_pass http://gunicorn_upstream/models;
    }
    location / {
        return 404 '{"error": "Not Found"}';
    }
    keepalive_timeout 3;
  }
}
INFO:__main__:batching is enabled
WARNING:__main__:num_batch_threads defaulted to 8, the number of CPUs. Set SAGEMAKER_TFS_NUM_BATCH_THREADS to override default.
max_enqueued_batches defaulted to 8. Set SAGEMAKER_TFS_MAX_ENQUEUED_BATCHES to override default. Tuning this parameter may be necessary if out-of-memory errors occur.
INFO:__main__:batching config:
max_batch_size { value: 32 }
batch_timeout_micros { value: 1000000 }
num_batch_threads { value: 8 }
max_enqueued_batches { value: 8 }
INFO:__main__:tensorflow version info:
TensorFlow ModelServer: 2.1.0-rc1+dev.sha.56ab43f
TensorFlow Library: 2.1.0
INFO:__main__:tensorflow serving command: tensorflow_model_server --port=12000 --rest_api_port=12001 --model_config_file=/sagemaker/model-config.cfg --max_num_load_retries=0 --enable_batching=true --batching_parameters_file=/sagemaker/batching-config.cfg
INFO:__main__:started tensorflow serving (pid: 15)
INFO:__main__:installing packages from requirements.txt...
2021-04-18 09:17:16.694250: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
2021-04-18 09:17:16.694301: I tensorflow_serving/model_servers/server_core.cc:573]  (Re-)adding model: model
2021-04-18 09:17:16.794528: I tensorflow_serving/util/retrier.cc:46] Retrying of Reserving resources for servable: {name: model version: 0} exhausted max_num_retries: 0
2021-04-18 09:17:16.794552: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: model version: 0}
2021-04-18 09:17:16.794559: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: model version: 0}
2021-04-18 09:17:16.794566: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: model version: 0}
2021-04-18 09:17:16.794586: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /opt/ml/model/0
2021-04-18 09:17:17.191099: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-04-18 09:17:17.191150: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:264] Reading SavedModel debug info (if present) from: /opt/ml/model/0
2021-04-18 09:17:17.192358: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
2021-04-18 09:17:17.195306: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-04-18 09:17:17.459224: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:17.459806: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:1e.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2021-04-18 09:17:17.459818: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2021-04-18 09:17:17.459866: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:17.460440: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:17.460972: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
Collecting numpy
  Downloading numpy-1.19.5-cp36-cp36m-manylinux2010_x86_64.whl (14.8 MB)
Collecting opencv-python-headless
  Downloading opencv_python_headless-4.5.1.48-cp36-cp36m-manylinux2014_x86_64.whl (37.6 MB)
Installing collected packages: numpy, opencv-python-headless
2021-04-18 09:17:19.843425: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-18 09:17:19.843448: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2021-04-18 09:17:19.843454: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2021-04-18 09:17:19.845196: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:19.845770: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:19.846306: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:19.846833: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14125 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
2021-04-18 09:17:20.610839: E external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "BoxIntersectionOverUnion" device_type: "GPU"') for unknown op: BoxIntersectionOverUnion
2021-04-18 09:17:20.610880: E external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "BoxEncode" device_type: "GPU"') for unknown op: BoxEncode
2021-04-18 09:17:20.882372: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle.
Successfully installed numpy-1.19.5 opencv-python-headless-4.5.1.48
WARNING: You are using pip version 20.0.2; however, version 21.0.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
INFO:__main__:gunicorn command: gunicorn -b unix:/tmp/gunicorn.sock -k gevent --chdir /sagemaker --pythonpath /opt/ml/model/code -e TFS_GRPC_PORT=12000 -e SAGEMAKER_MULTI_MODEL=False python_service:app
INFO:__main__:gunicorn version info:
gunicorn (version 20.0.4)
INFO:__main__:started gunicorn (pid: 89)
[2021-04-18 09:17:22 +0000] [89] [INFO] Starting gunicorn 20.0.4
[2021-04-18 09:17:22 +0000] [89] [INFO] Listening at: unix:/tmp/gunicorn.sock (89)
INFO:__main__:gunicorn server is ready!
[2021-04-18 09:17:22 +0000] [89] [INFO] Using worker: gevent
[2021-04-18 09:17:22 +0000] [93] [INFO] Booting worker with pid: 93
INFO:__main__:nginx version info:
nginx version: nginx/1.18.0
built by gcc 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
built with OpenSSL 1.1.1  11 Sep 2018
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.18.0/debian/debuild-base/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'
INFO:__main__:started nginx (pid: 94)
2021-04-18 09:17:23.383629: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:152] Running initialization op on SavedModel bundle at path: /opt/ml/model/0
2021-04-18 09:17:24.514820: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 7720231 microseconds.
2021-04-18 09:17:24.763369: I tensorflow_serving/servables/tensorflow/saved_model_bundle_factory.cc:169] Wrapping session to perform batch processing
2021-04-18 09:17:24.763420: I tensorflow_serving/servables/tensorflow/bundle_factory_util.cc:153] Wrapping session to perform batch processing
2021-04-18 09:17:24.764123: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /opt/ml/model/0/assets.extra/tf_serving_warmup_requests
2021-04-18 09:17:24.772618: I tensorflow_serving/util/retrier.cc:46] Retrying of Loading servable: {name: model version: 0} exhausted max_num_retries: 0
2021-04-18 09:17:24.772644: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: model version: 0}
2021-04-18 09:17:24.778584: I tensorflow_serving/model_servers/server.cc:362] Running gRPC ModelServer at 0.0.0.0:12000 ...
[warn] getaddrinfo: address family for nodename not supported
2021-04-18 09:17:24.782789: I tensorflow_serving/model_servers/server.cc:382] Exporting HTTP/REST API at:localhost:12001 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...

System information
A description of your system. Please provide:

Additional context
Add any other context about the problem here.

agoblet commented 3 years ago

For now I have worked around this issue using gunicorn's WEB_CONCURRENCY environment variable. With that set, I can see more workers spawning in the logs, and my CPU utilization is also better.
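A hedged sketch of that workaround, assuming the same SDK-based deployment as above; WEB_CONCURRENCY is gunicorn's standard worker-count variable, and the value and placeholders shown are examples, not values from the report:

```python
# Workaround sketch: gunicorn reads WEB_CONCURRENCY to size its worker pool,
# so passing it through the container environment spawns multiple workers even
# when the (older) image does not recognize SAGEMAKER_GUNICORN_WORKERS.
from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(
    model_data="s3://my-bucket/model.tar.gz",                # placeholder model artifact
    role="arn:aws:iam::123456789012:role/MySageMakerRole",   # placeholder execution role
    framework_version="2.1",
    env={"WEB_CONCURRENCY": "4"},  # example value; tune to the instance's vCPU count
)

predictor = model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")
```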