Describe the bug
TensorFlow Serving throws the following error when multiple TFS instances are enabled through the SAGEMAKER_TFS_INSTANCE_COUNT environment variable:
INFO:__main__:tensorflow version info:
TensorFlow ModelServer: 2.8.3-rc1+dev.sha.no_git
TensorFlow Library: 2.8.3
INFO:__main__:tensorflow serving command: tensorflow_model_server --port=9000 --rest_api_port=8501 --model_config_file=/sagemaker/model-config.cfg --max_num_load_retries=0 --per_process_gpu_memory_fraction=0.2667
INFO:__main__:started tensorflow serving (pid: 26)
Traceback (most recent call last):
  File "/sagemaker/serve.py", line 502, in <module>
    ServiceManager().start()
  File "/sagemaker/serve.py", line 483, in start
    self._start_tfs()
  File "/sagemaker/serve.py", line 326, in _start_tfs
    p = self._start_single_tfs(i)
  File "/sagemaker/serve.py", line 420, in _start_single_tfs
    self._tfs_grpc_ports[instance_id],
IndexError: list index out of range
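The traceback suggests that the list of gRPC ports holds fewer entries than the number of TFS instances requested. The sketch below only illustrates that failure mode; `pick_grpc_port`, the single-port list, and the loop are hypothetical stand-ins, not the actual code in serve.py.

```python
# Hypothetical stand-in for the port lookup in serve.py; not the real code.
def pick_grpc_port(grpc_ports, instance_id):
    """Return the gRPC port assigned to the given TFS instance."""
    return grpc_ports[instance_id]


grpc_ports = [9000]   # assumption: only one port was allocated
instance_count = 3    # matches SAGEMAKER_TFS_INSTANCE_COUNT='3' in the report

error_message = None
try:
    for instance_id in range(instance_count):
        pick_grpc_port(grpc_ports, instance_id)
except IndexError as exc:
    # The second instance (instance_id == 1) has no port to index into.
    error_message = str(exc)

print(error_message)
```

If this reading is right, allocating one gRPC port per instance before starting the TFS processes would avoid the error.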
To reproduce
from sagemaker.tensorflow.serving import TensorFlowModel

model_local_batch = TensorFlowModel(
    source_dir='sm-code-pb',
    entry_point='inference.py',
    model_data=model_data,
    role=role,
    framework_version='2.8',
    env={
        'SAGEMAKER_TFS_INSTANCE_COUNT': '3',  # number of TFS instances, 3 is good for 16G GPU mem
    },
)
instance_type = 'local_gpu'  # 'local' for CPU instance
predictor_local_batch = model_local_batch.deploy(initial_instance_count=1, instance_type=instance_type)
If SAGEMAKER_SAFE_PORT_RANGE is also passed in env, the error does not occur.
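As a minimal sketch of that workaround, the env dict can carry both variables. The helper `build_tfs_env` is hypothetical, and the range 9000-9999 with its `low-high` string format is an assumption; the report does not say which value was used.

```python
# Hypothetical helper for building the env dict passed to TensorFlowModel.
def build_tfs_env(instance_count, port_range=None):
    """Build container env vars for running multiple TFS instances."""
    env = {"SAGEMAKER_TFS_INSTANCE_COUNT": str(instance_count)}
    if port_range is not None:
        low, high = port_range
        # Assumed format: "<low>-<high>"
        env["SAGEMAKER_SAFE_PORT_RANGE"] = f"{low}-{high}"
    return env


# Assumed port range; pick one that is free on the host.
env = build_tfs_env(3, port_range=(9000, 9999))
print(env)
```

The resulting dict would be passed as the `env` argument in the reproduction snippet above.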
Expected behavior
Enable multiple TFS instances without passing SAGEMAKER_SAFE_PORT_RANGE manually.