aws / sagemaker-tensorflow-serving-container

A TensorFlow Serving solution for use in SageMaker. This repo is now deprecated.
Apache License 2.0

ECR images outdated #206

Open agoblet opened 3 years ago

agoblet commented 3 years ago

Describe the bug
The images stored on ECR appear to be outdated. I would like to use the SAGEMAKER_GUNICORN_WORKERS feature introduced in 1.8. I deploy my SageMaker endpoint in eu-west-1 with the image 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:2.1.0-gpu. In the endpoint logs I do not see multiple gunicorn workers starting, and the logged startup command, INFO:__main__:gunicorn command: gunicorn -b unix:/tmp/gunicorn.sock -k gevent --chdir /sagemaker --pythonpath /opt/ml/model/code -e TFS_GRPC_PORT=12000 -e SAGEMAKER_MULTI_MODEL=False python_service:app, looks different from the command used in 1.8. I also see little CPU activity on the instance while making many requests.
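For reference, a minimal sketch of how such an endpoint might be deployed with the SageMaker Python SDK, passing SAGEMAKER_GUNICORN_WORKERS through the model environment. The artifact path, role, and worker count are placeholders, not values from the original report:

```python
# Hypothetical deployment sketch: deploy a TensorFlow Serving endpoint and request
# multiple gunicorn workers via SAGEMAKER_GUNICORN_WORKERS (a 1.8+ container feature).
import sagemaker
from sagemaker.tensorflow import TensorFlowModel

role = sagemaker.get_execution_role()  # assumes running inside SageMaker notebook/Studio

model = TensorFlowModel(
    model_data="s3://my-bucket/model.tar.gz",  # placeholder model artifact
    role=role,
    framework_version="2.1",                   # resolves to the tensorflow-inference 2.1 image
    env={"SAGEMAKER_GUNICORN_WORKERS": "4"},   # only honored if the image actually includes the 1.8 feature
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # Tesla T4, matching the GPU seen in the logs
)
```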

To reproduce

Expected behavior

Screenshots or logs


INFO:__main__:starting services
INFO:__main__:using default model name: model
INFO:__main__:tensorflow serving model config:
model_config_list: {
  config: {
    name: "model",
    base_path: "/opt/ml/model",
    model_platform: "tensorflow"
  }
}
INFO:__main__:nginx config:
load_module modules/ngx_http_js_module.so;
worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log  /dev/stderr error;
worker_rlimit_nofile 4096;
events {
  worker_connections 2048;
}
http {
  include /etc/nginx/mime.types;
  default_type application/json;
  access_log /dev/stdout combined;
  js_include tensorflow-serving.js;
  upstream tfs_upstream {
    server localhost:12001;
  }
  upstream gunicorn_upstream {
    server unix:/tmp/gunicorn.sock fail_timeout=1;
  }
  server {
    listen 8080 deferred;
    client_max_body_size 0;
    client_body_buffer_size 100m;
    subrequest_output_buffer_size 100m;
    set $tfs_version 2.1;
    set $default_tfs_model model;
    location /tfs {
        rewrite ^/tfs/(.*) /$1  break;
        proxy_redirect off;
        proxy_pass_request_headers off;
        proxy_set_header Content-Type 'application/json';
        proxy_set_header Accept 'application/json';
        proxy_pass http://tfs_upstream;
    }
    location /ping {
        js_content ping_without_model;
    }
    location /invocations {
        proxy_pass http://gunicorn_upstream/invocations;
    }
    location ~ ^/models/(.*)/invoke {
        js_content invocations;
    }
    location /models {
        proxy_pass http://gunicorn_upstream/models;
    }
    location / {
        return 404 '{"error": "Not Found"}';
    }
    keepalive_timeout 3;
  }
}
INFO:__main__:batching is enabled
WARNING:__main__:num_batch_threads defaulted to 8, the number of CPUs. Set SAGEMAKER_TFS_NUM_BATCH_THREADS to override default.
max_enqueued_batches defaulted to 8. Set SAGEMAKER_TFS_MAX_ENQUEUED_BATCHES to override default. Tuning this parameter may be necessary if out-of-memory errors occur.
INFO:__main__:batching config:
max_batch_size { value: 32 }
batch_timeout_micros { value: 1000000 }
num_batch_threads { value: 8 }
max_enqueued_batches { value: 8 }
INFO:__main__:tensorflow version info:
TensorFlow ModelServer: 2.1.0-rc1+dev.sha.56ab43f
TensorFlow Library: 2.1.0
INFO:__main__:tensorflow serving command: tensorflow_model_server --port=12000 --rest_api_port=12001 --model_config_file=/sagemaker/model-config.cfg --max_num_load_retries=0 --enable_batching=true --batching_parameters_file=/sagemaker/batching-config.cfg
INFO:__main__:started tensorflow serving (pid: 15)
INFO:__main__:installing packages from requirements.txt...
2021-04-18 09:17:16.694250: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
2021-04-18 09:17:16.694301: I tensorflow_serving/model_servers/server_core.cc:573]  (Re-)adding model: model
2021-04-18 09:17:16.794528: I tensorflow_serving/util/retrier.cc:46] Retrying of Reserving resources for servable: {name: model version: 0} exhausted max_num_retries: 0
2021-04-18 09:17:16.794552: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: model version: 0}
2021-04-18 09:17:16.794559: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: model version: 0}
2021-04-18 09:17:16.794566: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: model version: 0}
2021-04-18 09:17:16.794586: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /opt/ml/model/0
2021-04-18 09:17:17.191099: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-04-18 09:17:17.191150: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:264] Reading SavedModel debug info (if present) from: /opt/ml/model/0
2021-04-18 09:17:17.192358: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
2021-04-18 09:17:17.195306: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-04-18 09:17:17.459224: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:17.459806: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:1e.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2021-04-18 09:17:17.459818: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2021-04-18 09:17:17.459866: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:17.460440: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:17.460972: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
Collecting numpy
  Downloading numpy-1.19.5-cp36-cp36m-manylinux2010_x86_64.whl (14.8 MB)
Collecting opencv-python-headless
  Downloading opencv_python_headless-4.5.1.48-cp36-cp36m-manylinux2014_x86_64.whl (37.6 MB)
Installing collected packages: numpy, opencv-python-headless
2021-04-18 09:17:19.843425: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-18 09:17:19.843448: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2021-04-18 09:17:19.843454: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2021-04-18 09:17:19.845196: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:19.845770: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:19.846306: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:19.846833: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14125 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
2021-04-18 09:17:20.610839: E external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "BoxIntersectionOverUnion" device_type: "GPU"') for unknown op: BoxIntersectionOverUnion
2021-04-18 09:17:20.610880: E external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "BoxEncode" device_type: "GPU"') for unknown op: BoxEncode
2021-04-18 09:17:20.882372: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle.
Successfully installed numpy-1.19.5 opencv-python-headless-4.5.1.48
WARNING: You are using pip version 20.0.2; however, version 21.0.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
INFO:__main__:gunicorn command: gunicorn -b unix:/tmp/gunicorn.sock -k gevent --chdir /sagemaker --pythonpath /opt/ml/model/code -e TFS_GRPC_PORT=12000 -e SAGEMAKER_MULTI_MODEL=False python_service:app
INFO:__main__:gunicorn version info:
gunicorn (version 20.0.4)
INFO:__main__:started gunicorn (pid: 89)
[2021-04-18 09:17:22 +0000] [89] [INFO] Starting gunicorn 20.0.4
[2021-04-18 09:17:22 +0000] [89] [INFO] Listening at: unix:/tmp/gunicorn.sock (89)
INFO:__main__:gunicorn server is ready!
[2021-04-18 09:17:22 +0000] [89] [INFO] Using worker: gevent
[2021-04-18 09:17:22 +0000] [93] [INFO] Booting worker with pid: 93
INFO:__main__:nginx version info:
nginx version: nginx/1.18.0
built by gcc 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
built with OpenSSL 1.1.1  11 Sep 2018
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.18.0/debian/debuild-base/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'
INFO:__main__:started nginx (pid: 94)
2021-04-18 09:17:23.383629: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:152] Running initialization op on SavedModel bundle at path: /opt/ml/model/0
2021-04-18 09:17:24.514820: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 7720231 microseconds.
2021-04-18 09:17:24.763369: I tensorflow_serving/servables/tensorflow/saved_model_bundle_factory.cc:169] Wrapping session to perform batch processing
2021-04-18 09:17:24.763420: I tensorflow_serving/servables/tensorflow/bundle_factory_util.cc:153] Wrapping session to perform batch processing
2021-04-18 09:17:24.764123: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /opt/ml/model/0/assets.extra/tf_serving_warmup_requests
2021-04-18 09:17:24.772618: I tensorflow_serving/util/retrier.cc:46] Retrying of Loading servable: {name: model version: 0} exhausted max_num_retries: 0
2021-04-18 09:17:24.772644: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: model version: 0}
2021-04-18 09:17:24.778584: I tensorflow_serving/model_servers/server.cc:362] Running gRPC ModelServer at 0.0.0.0:12000 ...
[warn] getaddrinfo: address family for nodename not supported
2021-04-18 09:17:24.782789: I tensorflow_serving/model_servers/server.cc:382] Exporting HTTP/REST API at:localhost:12001 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...

System information
A description of your system. Please provide:

Additional context
Add any other context about the problem here.

agoblet commented 3 years ago

For now I have worked around this issue using gunicorn's WEB_CONCURRENCY environment variable. With that set, I can see more workers spawning in the logs, and my CPU utilization is also better.
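A hedged sketch of that workaround, assuming the same SDK-based deployment as above; WEB_CONCURRENCY is gunicorn's standard worker-count variable, and the value and placeholders shown are examples, not values from the report:

```python
# Workaround sketch: gunicorn reads WEB_CONCURRENCY to size its worker pool,
# so passing it through the container environment spawns multiple workers even
# when the (older) image does not recognize SAGEMAKER_GUNICORN_WORKERS.
from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(
    model_data="s3://my-bucket/model.tar.gz",                # placeholder model artifact
    role="arn:aws:iam::123456789012:role/MySageMakerRole",   # placeholder execution role
    framework_version="2.1",
    env={"WEB_CONCURRENCY": "4"},  # example value; tune to the instance's vCPU count
)

predictor = model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")
```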