aws / sagemaker-huggingface-inference-toolkit

Apache License 2.0
238 stars 60 forks source link

logs not showing up in cloudwatch #61

Open ibegleris opened 2 years ago

ibegleris commented 2 years ago

When deploying a question answering model with later versions of python, transformers and pytorch the logs do not seem to be seen in cloudwatch. If I deploy the same model but with older python and module versions the logs can be seen in cloudwatch.

Recreate the issue

Case 1, No logs seen

Code


hub = {
  'HF_MODEL_ID':'distilbert-base-uncased-distilled-squad',
  'HF_TASK':'question-answering'
}

huggingface_model = HuggingFaceModel(
   env=hub,
   role=role,
   transformers_version="4.12.3",
   pytorch_version="1.9.1", 
   py_version="py38",
)
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge"
)

data = {
"inputs": {
    "question": "What is used for inference?",
    "context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference."
    }
}

# request
predictor.predict(data)

logs:


#015Downloading:   0%\|          \| 0.00/2.74k [00:00<?, ?B/s]#015Downloading: 100%\|██████████\| 2.74k/2.74k [00:00<00:00, 3.36MB/s]
--
#015Downloading:   0%\|          \| 0.00/451 [00:00<?, ?B/s]#015Downloading: 100%\|██████████\| 451/451 [00:00<00:00, 387kB/s]
#015Downloading:   0%\|          \| 0.00/265M [00:00<?, ?B/s]#015Downloading:   3%\|▎         \| 8.39M/265M [00:00<00:03, 83.9MB/s]#015Downloading:   6%\|▋         \| 17.2M/265M [00:00<00:02, 86.1MB/s]#015Downloading:  10%\|▉         \| 25.8M/265M [00:00<00:03, 79.6MB/s]#015Downloading:  13%\|█▎        \| 34.0M/265M [00:00<00:02, 80.7MB/s]#015Downloading:  16%\|█▌        \| 42.1M/265M [00:00<00:02, 74.5MB/s]#015Downloading:  19%\|█▊        \| 49.6M/265M [00:00<00:02, 72.9MB/s]#015Downloading:  21%\|██▏       \| 57.0M/265M [00:00<00:03, 68.6MB/s]#015Downloading:  24%\|██▍       \| 63.9M/265M [00:00<00:03, 63.6MB/s]#015Downloading:  27%\|██▋       \| 70.7M/265M [00:00<00:03, 64.9MB/s]#015Downloading:  29%\|██▉       \| 77.5M/265M [00:01<00:02, 65.5MB/s]#015Downloading:  32%\|███▏      \| 84.1M/265M [00:01<00:02, 62.5MB/s]#015Downloading:  34%\|███▍      \| 90.4M/265M [00:01<00:02, 61.2MB/s]#015Downloading:  37%\|███▋      \| 97.4M/265M [00:01<00:02, 63.8MB/s]#015Downloading:  39%\|███▉      \| 104M/265M [00:01<00:02, 63.0MB/s] #015Downloading:  42%\|████▏     \| 110M/265M [00:01<00:02, 60.5MB/s]#015Downloading:  44%\|████▍     \| 116M/265M [00:01<00:02, 58.9MB/s]#015Downloading:  46%\|████▌     \| 123M/265M [00:01<00:02, 60.0MB/s]#015Downloading:  48%\|████▊     \| 129M/265M [00:01<00:02, 60.1MB/s]#015Downloading:  51%\|█████     \| 135M/265M [00:02<00:02, 60.0MB/s]#015Downloading:  53%\|█████▎    \| 141M/265M [00:02<00:02, 60.3MB/s]#015Downloading:  55%\|█████▌    \| 147M/265M [00:02<00:01, 59.7MB/s]#015Downloading:  58%\|█████▊    \| 153M/265M [00:02<00:01, 59.8MB/s]#015Downloading:  60%\|█████▉    \| 159M/265M [00:02<00:01, 58.4MB/s]#015Downloading:  62%\|██████▏   \| 165M/265M [00:02<00:01, 60.3MB/s]#015Downloading:  65%\|██████▍   \| 171M/265M [00:02<00:01, 60.1MB/s]#015Downloading:  67%\|██████▋   \| 177M/265M [00:02<00:01, 59.4MB/s]#015Downloading:  69%\|██████▉   \| 183M/265M [00:02<00:01, 57.9MB/s]#015Downloading:  71%\|███████   \| 189M/265M [00:02<00:01, 56.6MB/s]#015Downloading:  73%\|███████▎  \| 195M/265M [00:03<00:01, 56.9MB/s]#015Downloading:  76%\|███████▌  \| 202M/265M [00:03<00:01, 61.1MB/s]#015Downloading:  79%\|███████▉  \| 210M/265M [00:03<00:00, 66.0MB/s]#015Downloading:  81%\|████████▏ \| 216M/265M [00:03<00:00, 62.2MB/s]#015Downloading:  84%\|████████▍ \| 223M/265M [00:03<00:00, 60.2MB/s]#015Downloading:  86%\|████████▌ \| 229M/265M [00:03<00:00, 58.8MB/s]#015Downloading:  88%\|████████▊ \| 235M/265M [00:03<00:00, 58.3MB/s]#015Downloading:  91%\|█████████ \| 241M/265M [00:03<00:00, 58.9MB/s]#015Downloading:  93%\|█████████▎\| 247M/265M [00:03<00:00, 57.4MB/s]#015Downloading:  95%\|█████████▌\| 252M/265M [00:04<00:00, 56.9MB/s]#015Downloading:  97%\|█████████▋\| 258M/265M [00:04<00:00, 57.9MB/s]#015Downloading: 100%\|█████████▉\| 264M/265M [00:04<00:00, 58.3MB/s]#015Downloading: 100%\|██████████\| 265M/265M [00:04<00:00, 62.1MB/s]
#015Downloading:   0%\|          \| 0.00/466k [00:00<?, ?B/s]#015Downloading:  18%\|█▊        \| 86.0k/466k [00:00<00:00, 567kB/s]#015Downloading:  92%\|█████████▏\| 430k/466k [00:00<00:00, 1.56MB/s]#015Downloading: 100%\|██████████\| 466k/466k [00:00<00:00, 1.52MB/s]
#015Downloading:   0%\|          \| 0.00/28.0 [00:00<?, ?B/s]#015Downloading: 100%\|██████████\| 28.0/28.0 [00:00<00:00, 20.7kB/s]
#015Downloading:   0%\|          \| 0.00/232k [00:00<?, ?B/s]#015Downloading:  16%\|█▌        \| 36.9k/232k [00:00<00:00, 242kB/s]#015Downloading:  82%\|████████▏ \| 190k/232k [00:00<00:00, 802kB/s] #015Downloading: 100%\|██████████\| 232k/232k [00:00<00:00, 757kB/s]
WARNING - Overwriting /.sagemaker/mms/models/distilbert-base-uncased-distilled-squad ...
Warning: MMS is using non-default JVM parameters: -XX:-UseContainerSupport
Model server started.

Case 2, logs visible

Code


hub = {
  'HF_MODEL_ID':'distilbert-base-uncased-distilled-squad',
  'HF_TASK':'question-answering'
}

huggingface_model = HuggingFaceModel(
   env=hub,
   role=role,
   transformers_version="4.6", 
   pytorch_version="1.7",
   py_version="py36",
)
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge"
)

data = {
"inputs": {
    "question": "What is used for inference?",
    "context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference."
    }
}

# request
predictor.predict(data)

logs:


This is an experimental beta features, which allows downloading model from the Hugging Face Hub on start up. It loads the model defined in the env var `HF_MODEL_ID`
--
#015Downloading:   0%\|          \| 0.00/2.74k [00:00<?, ?B/s]#015Downloading: 100%\|██████████\| 2.74k/2.74k [00:00<00:00, 2.74MB/s]
#015Downloading:   0%\|          \| 0.00/451 [00:00<?, ?B/s]#015Downloading: 100%\|██████████\| 451/451 [00:00<00:00, 473kB/s]
#015Downloading:   0%\|          \| 0.00/265M [00:00<?, ?B/s]#015Downloading:   3%\|▎         \| 8.17M/265M [00:00<00:03, 81.7MB/s]#015Downloading:   6%\|▌         \| 16.5M/265M [00:00<00:03, 82.9MB/s]#015Downloading:   9%\|▉         \| 24.8M/265M [00:00<00:03, 75.2MB/s]#015Downloading:  12%\|█▏        \| 32.4M/265M [00:00<00:03, 66.9MB/s]#015Downloading:  15%\|█▍        \| 39.8M/265M [00:00<00:03, 69.1MB/s]#015Downloading:  18%\|█▊        \| 46.8M/265M [00:00<00:03, 66.9MB/s]#015Downloading:  20%\|██        \| 53.6M/265M [00:00<00:03, 64.6MB/s]#015Downloading:  23%\|██▎       \| 60.2M/265M [00:00<00:03, 65.1MB/s]#015Downloading:  25%\|██▌       \| 67.3M/265M [00:00<00:02, 66.7MB/s]#015Downloading:  29%\|██▉       \| 76.5M/265M [00:01<00:02, 74.3MB/s]#015Downloading:  32%\|███▏      \| 85.2M/265M [00:01<00:02, 78.0MB/s]#015Downloading:  35%\|███▌      \| 93.0M/265M [00:01<00:02, 76.5MB/s]#015Downloading:  39%\|███▊      \| 103M/265M [00:01<00:01, 82.1MB/s] #015Downloading:  42%\|████▏     \| 111M/265M [00:01<00:01, 82.6MB/s]#015Downloading:  45%\|████▍     \| 119M/265M [00:01<00:01, 76.1MB/s]#015Downloading:  48%\|████▊     \| 127M/265M [00:01<00:01, 75.6MB/s]#015Downloading:  51%\|█████     \| 135M/265M [00:01<00:01, 76.9MB/s]#015Downloading:  54%\|█████▍    \| 143M/265M [00:01<00:01, 71.3MB/s]#015Downloading:  57%\|█████▋    \| 150M/265M [00:02<00:01, 70.4MB/s]#015Downloading:  59%\|█████▉    \| 157M/265M [00:02<00:01, 68.8MB/s]#015Downloading:  62%\|██████▏   \| 165M/265M [00:02<00:01, 70.2MB/s]#015Downloading:  65%\|██████▍   \| 172M/265M [00:02<00:01, 68.7MB/s]#015Downloading:  67%\|██████▋   \| 179M/265M [00:02<00:01, 64.9MB/s]#015Downloading:  70%\|██████▉   \| 185M/265M [00:02<00:01, 62.4MB/s]#015Downloading:  72%\|███████▏  \| 191M/265M [00:02<00:01, 61.1MB/s]#015Downloading:  75%\|███████▍  \| 198M/265M [00:02<00:01, 63.2MB/s]#015Downloading:  78%\|███████▊  \| 206M/265M [00:02<00:00, 67.1MB/s]#015Downloading:  81%\|████████  \| 214M/265M [00:03<00:00, 71.8MB/s]#015Downloading:  83%\|████████▎ \| 222M/265M [00:03<00:00, 72.3MB/s]#015Downloading:  86%\|████████▌ \| 229M/265M [00:03<00:00, 69.9MB/s]#015Downloading:  89%\|████████▉ \| 236M/265M [00:03<00:00, 71.1MB/s]#015Downloading:  92%\|█████████▏\| 245M/265M [00:03<00:00, 76.8MB/s]#015Downloading:  95%\|█████████▌\| 253M/265M [00:03<00:00, 75.0MB/s]#015Downloading:  98%\|█████████▊\| 261M/265M [00:03<00:00, 69.5MB/s]#015Downloading: 100%\|██████████\| 265M/265M [00:03<00:00, 70.8MB/s]
#015Downloading:   0%\|          \| 0.00/466k [00:00<?, ?B/s]#015Downloading:   6%\|▌         \| 28.7k/466k [00:00<00:02, 191kB/s]#015Downloading:  45%\|████▍     \| 209k/466k [00:00<00:00, 783kB/s] #015Downloading: 100%\|██████████\| 466k/466k [00:00<00:00, 1.23MB/s]
#015Downloading:   0%\|          \| 0.00/28.0 [00:00<?, ?B/s]#015Downloading: 100%\|██████████\| 28.0/28.0 [00:00<00:00, 19.5kB/s]
#015Downloading:   0%\|          \| 0.00/232k [00:00<?, ?B/s]#015Downloading:  18%\|█▊        \| 41.0k/232k [00:00<00:00, 273kB/s]#015Downloading:  88%\|████████▊ \| 205k/232k [00:00<00:00, 750kB/s] #015Downloading: 100%\|██████████\| 232k/232k [00:00<00:00, 762kB/s]
WARNING - Overwriting /.sagemaker/mms/models/distilbert-base-uncased-distilled-squad ...
2022-03-23 14:31:16,829 [INFO ] main com.amazonaws.ml.mms.ModelServer -
MMS Home: /opt/conda/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 1
Max heap size: 3234 M
Python executable: /opt/conda/bin/python3.6
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Preload model: false
Prefer direct buffer: false
2022-03-23 14:31:16,921 [WARN ] W-9000-distilbert-base-uncased-d com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-distilbert-base-uncased-d
2022-03-23 14:31:17,049 [INFO ] W-9000-distilbert-base-uncased-d-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_huggingface_inference_toolkit.handler_service --model-path /.sagemaker/mms/models/distilbert-base-uncased-distilled-squad --model-name distilbert-base-uncased-distilled-squad --preload-model false --tmp-dir /home/model-server/tmp
2022-03-23 14:31:17,051 [INFO ] W-9000-distilbert-base-uncased-d-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000
2022-03-23 14:31:17,051 [INFO ] W-9000-distilbert-base-uncased-d-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 35
2022-03-23 14:31:17,051 [INFO ] W-9000-distilbert-base-uncased-d-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
2022-03-23 14:31:17,051 [INFO ] W-9000-distilbert-base-uncased-d-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.6.13
2022-03-23 14:31:17,052 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model distilbert-base-uncased-distilled-squad loaded.
2022-03-23 14:31:17,057 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2022-03-23 14:31:17,074 [INFO ] W-9000-distilbert-base-uncased-d com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
2022-03-23 14:31:17,151 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080
Model server started.
2022-03-23 14:31:17,175 [INFO ] W-9000-distilbert-base-uncased-d-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.
2022-03-23 14:31:17,176 [WARN ] pool-2-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.
2022-03-23 14:31:18,810 [INFO ] W-9000-distilbert-base-uncased-d-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model distilbert-base-uncased-distilled-squad loaded io_fd=daebe6fffed0ff6d-00000017-00000001-e136dd8f47306764-937c7c58
2022-03-23 14:31:18,816 [INFO ] W-9000-distilbert-base-uncased-d com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1563
2022-03-23 14:31:18,818 [WARN ] W-9000-distilbert-base-uncased-d com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-distilbert-base-uncased-d-1
2022-03-23 14:31:19,265 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 31
2022-03-23 14:31:24,202 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 0
2022-03-23 14:31:29,202 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 1
2022-03-23 14:31:34,202 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 1
2022-03-23 14:31:39,201 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 0
2022-03-23 14:31:44,201 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 0
2022-03-23 14:31:49,202 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 1
2022-03-23 14:31:54,202 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 0
2022-03-23 14:31:57,949 [INFO ] W-distilbert-base-uncased-d-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Preprocess time - 0.028371810913085938 ms
2022-03-23 14:31:57,949 [INFO ] W-distilbert-base-uncased-d-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Predict time - 1648045917948.6992 ms
2022-03-23 14:31:57,949 [INFO ] W-9000-distilbert-base-uncased-d com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 70
2022-03-23 14:31:57,949 [INFO ] W-distilbert-base-uncased-d-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Postprocess time - 0.07843971252441406 ms
2022-03-23 14:31:57,950 [INFO ] W-9000-distilbert-base-uncased-d ACCESS_LOG - /169.254.178.2:51978 "POST /invocations HTTP/1.1" 200 74
2022-03-23 14:31:59,202 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 1
2022-03-23 14:32:04,201 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 0
2022-03-23 14:32:09,202 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 1
2022-03-23 14:32:14,202 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 1
2022-03-23 14:32:19,202 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 1
2022-03-23 14:32:24,201 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 0
2022-03-23 14:47:01,381 [INFO ] W-distilbert-base-uncased-d-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Preprocess time - 0.029802322387695312 ms
2022-03-23 14:47:01,381 [INFO ] W-distilbert-base-uncased-d-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Predict time - 1648046821381.0054 ms
2022-03-23 14:47:01,381 [INFO ] W-9000-distilbert-base-uncased-d com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 61
2022-03-23 14:47:01,381 [INFO ] W-9000-distilbert-base-uncased-d ACCESS_LOG - /169.254.178.2:51978 "POST /invocations HTTP/1.1" 200 62
2022-03-23 14:47:01,381 [INFO ] W-distilbert-base-uncased-d-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Postprocess time - 0.08940696716308594 ms
2022-03-23 14:47:02,644 [INFO ] W-distilbert-base-uncased-d-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Preprocess time - 0.027418136596679688 ms
2022-03-23 14:47:02,644 [INFO ] W-9000-distilbert-base-uncased-d com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 61
2022-03-23 14:47:02,644 [INFO ] W-distilbert-base-uncased-d-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Predict time - 1648046822644.274 ms
2022-03-23 14:47:02,645 [INFO ] W-9000-distilbert-base-uncased-d ACCESS_LOG - /169.254.178.2:51978 "POST /invocations HTTP/1.1" 200 62
2022-03-23 14:47:02,645 [INFO ] W-distilbert-base-uncased-d-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Postprocess time - 0.08702278137207031 ms
2022-03-23 14:47:03,714 [INFO ] W-distilbert-base-uncased-d-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Preprocess time - 0.02288818359375 ms
2022-03-23 14:47:03,715 [INFO ] W-distilbert-base-uncased-d-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Predict time - 1648046823714.5115 ms
2022-03-23 14:47:03,715 [INFO ] W-9000-distilbert-base-uncased-d com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 61
2022-03-23 14:47:03,715 [INFO ] W-distilbert-base-uncased-d-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Postprocess time - 0.08249282836914062 ms
2022-03-23 14:47:03,715 [INFO ] W-9000-distilbert-base-uncased-d ACCESS_LOG - /169.254.178.2:51978 "POST /invocations HTTP/1.1" 200 61
2022-03-23 14:47:04,201 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:51978 "GET /ping HTTP/1.1" 200 0
philschmid commented 2 years ago

When deploying a question answering model with later versions of python, transformers and pytorch the logs do not seem to be seen in cloudwatch. If I deploy the same model but with older python and module versions the logs can be seen in cloudwatch.

You say with an older version you see logs but not with a newer one? The Cases you attached are vice versa. What is correct?

ibegleris commented 2 years ago

Yeah sorry, fixed it. I can see the logs with the old versions but not with the new ones.

philschmid commented 2 years ago

@ibegleris i got in touch with the SageMaker Team. I ll come back when i know something

satishpasumarthi commented 2 years ago

Looks like the logs are present in HF DLC with PT 1.8 as well.

philschmid commented 2 years ago

I can confirm that with transformers_version = '4.17' and pytorch_version = '1.10' the logs appearing again.

satishpasumarthi commented 2 years ago

Update: The issue is with the HF PT 1.9 containers not having latest version of sagemaker-inference. We are working internally to resolve this.