aws / sagemaker-tensorflow-serving-container

A TensorFlow Serving solution for use in SageMaker. This repo is now deprecated.
Apache License 2.0

IndexError: list index out of range #229


Chen188 commented 1 year ago

Describe the bug TensorFlow Serving throws the following error when multiple TFS instances are enabled through the SAGEMAKER_TFS_INSTANCE_COUNT environment variable:

INFO:__main__:tensorflow version info:
TensorFlow ModelServer: 2.8.3-rc1+dev.sha.no_git
TensorFlow Library: 2.8.3
INFO:__main__:tensorflow serving command: tensorflow_model_server --port=9000 --rest_api_port=8501 --model_config_file=/sagemaker/model-config.cfg --max_num_load_retries=0    --per_process_gpu_memory_fraction=0.2667
INFO:__main__:started tensorflow serving (pid: 26)
Traceback (most recent call last):
  File "/sagemaker/serve.py", line 502, in <module>
    ServiceManager().start()
  File "/sagemaker/serve.py", line 483, in start
    self._start_tfs()
  File "/sagemaker/serve.py", line 326, in _start_tfs
    p = self._start_single_tfs(i)
  File "/sagemaker/serve.py", line 420, in _start_single_tfs
    self._tfs_grpc_ports[instance_id],
IndexError: list index out of range
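
The failure appears to come from how serve.py builds its port lists. Below is a simplified sketch of that logic, reconstructed from the traceback rather than copied from the container (the exact default ports are assumptions); it shows why any instance_id above 0 overruns the list when SAGEMAKER_SAFE_PORT_RANGE is unset:

import os

# Reconstruction of serve.py's port selection (assumed defaults, not the
# container's exact code). Without SAGEMAKER_SAFE_PORT_RANGE the port lists
# hold a single entry, so starting TFS instance 1 of 3 raises IndexError.
instance_count = int(os.environ.get('SAGEMAKER_TFS_INSTANCE_COUNT', '1'))

if 'SAGEMAKER_SAFE_PORT_RANGE' in os.environ:
    low, high = map(int, os.environ['SAGEMAKER_SAFE_PORT_RANGE'].split('-'))
    tfs_grpc_ports = list(range(low, low + instance_count))
    tfs_rest_ports = list(range(low + instance_count, low + 2 * instance_count))
else:
    tfs_grpc_ports = [9000]  # a single port, regardless of instance_count
    tfs_rest_ports = [8501]

for instance_id in range(instance_count):
    grpc_port = tfs_grpc_ports[instance_id]  # IndexError once instance_id reaches 1
    rest_port = tfs_rest_ports[instance_id]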

To reproduce

from sagemaker.tensorflow.serving import TensorFlowModel

# model_data and role are defined earlier in the notebook
model_local_batch = TensorFlowModel(
    source_dir='sm-code-pb', entry_point='inference.py',
    model_data=model_data,
    role=role,
    framework_version='2.8',
    env={
        'SAGEMAKER_TFS_INSTANCE_COUNT': '3',  # number of TFS instances; 3 fits in 16 GB of GPU memory
    },
)

instance_type = 'local_gpu'  # use 'local' for a CPU instance

predictor_local_batch = model_local_batch.deploy(initial_instance_count=1, instance_type=instance_type)

If SAGEMAKER_SAFE_PORT_RANGE is also passed in env, the issue goes away.
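
For reference, here is the env block with the workaround applied; the '2000-2999' value is only an example, any sufficiently wide range of free ports should work:

env={
    'SAGEMAKER_TFS_INSTANCE_COUNT': '3',
    'SAGEMAKER_SAFE_PORT_RANGE': '2000-2999',  # example range; pick unused ports
}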

Expected behavior Multiple TFS instances can be enabled without passing SAGEMAKER_SAFE_PORT_RANGE manually.
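
A minimal sketch of one possible fix (hypothetical; the variable names only mirror the traceback, this is not the maintainers' patch): derive the default port lists from the instance count instead of hard-coding a single port, so no manual env var is needed:

# Hypothetical fix sketch: give every TFS instance its own gRPC and REST
# port even when SAGEMAKER_SAFE_PORT_RANGE is not set.
tfs_grpc_ports = [9000 + i for i in range(instance_count)]
tfs_rest_ports = [8501 + i for i in range(instance_count)]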
