aws / sagemaker-pytorch-inference-toolkit

Toolkit for allowing inference and serving with PyTorch on SageMaker. Dockerfiles used for building SageMaker PyTorch containers are at https://github.com/aws/deep-learning-containers.
Apache License 2.0

BERT model loading not working with pytorch 1.3.1-eia container #87

Open rucha3 opened 3 years ago

rucha3 commented 3 years ago

I have a custom Python file for inference in which I have implemented the functions model_fn, input_fn, predict_fn, and output_fn. I saved the model as a TorchScript module using torch.jit.trace and torch.jit.save, and I load it with torch.jit.load. My model_fn implementation is as follows:

import torch
import os
import logging

logger = logging.getLogger()
# SageMaker sets this environment variable when an Elastic Inference accelerator is attached.
is_ei = os.getenv("SAGEMAKER_INFERENCE_ACCELERATOR_PRESENT") == "true"
logger.warning(f"Elastic Inference enabled: {is_ei}")

def model_fn(model_dir):
    model_path = os.path.join(model_dir, "model_best.pt")
    try:
        # Load the traced TorchScript module on the CPU.
        loaded_model = torch.jit.load(model_path, map_location=torch.device('cpu'))
        loaded_model.eval()
        return loaded_model
    except Exception as e:
        logger.exception(f"Exception in model_fn: {e}")
        return None
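
For context, the traced artifact that model_fn loads was produced roughly like this (a minimal sketch; the model class and example inputs are placeholders for my actual fine-tuned BERT model):

import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Placeholder model and example input; the real model is a fine-tuned BERT classifier.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", torchscript=True)
model.eval()

encoded = tokenizer.encode_plus("example input", return_tensors="pt")
example_inputs = (encoded["input_ids"], encoded["attention_mask"])

# Trace and save the TorchScript module that model_fn later loads.
traced = torch.jit.trace(model, example_inputs)
torch.jit.save(traced, "model_best.pt")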

This implementation works perfectly in the container with PyTorch 1.5. But in the container with torch 1.3.1, the worker exits abruptly while loading the pretrained model, without any logs. The only lines I see are:

algo-1-nvqf7_1  | 2020-11-30 07:17:15,392 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.BatchAggregator - Load model failed: model, error: Worker died.
algo-1-nvqf7_1  | 2020-11-30 07:17:15,393 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Retry worker: 9000-44f1cd64 in 1 seconds.

The worker dies and tries to restart, and this repeats until I stop the container.

The model I am using was trained with PyTorch 1.5, but since Elastic Inference only supports PyTorch up to 1.3.1, I am using this container.

Things I have tried:

  1. The same code with the same model works outside the container with PyTorch 1.3.1, so I don't think PyTorch version compatibility is the issue.
  2. I tried the DEBUG and NOTSET log levels, but didn't get any more information about why model loading fails.
  3. I tried loading the original model instead of the traced one. Again, this works in 1.5 but not in 1.3.1; it fails at the same point, while loading the pretrained BERT model.
  4. I tried this setup on a SageMaker notebook instance with a GPU accelerator, using the SageMaker PyTorchModel deploy() function with framework_version set to 1.3.1 (see the sketch after this list). I also tried the 1.3.1 container without EIA. The behaviour is the same everywhere.
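
The deployment in (4) looked roughly like this (a sketch; the model artifact path, role, entry point, and py_version are placeholders for my actual setup):

from sagemaker.pytorch import PyTorchModel

# Placeholder model artifact, role, and entry point.
pytorch_model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",
    role="my-sagemaker-role",
    entry_point="inference.py",
    framework_version="1.3.1",
    py_version="py3",
)

# Attaching an EI accelerator selects the 1.3.1-eia serving container.
predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.2xlarge",
    accelerator_type="ml.eia2.medium",
)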

Am I doing something wrong or missing something crucial from the documentation? Any help would be much appreciated.

Logs from the torch 1.3.1-eia container:

algo-1-nvqf7_1  | 2020-11-30 07:17:14,333 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
algo-1-nvqf7_1  | MMS Home: /opt/conda/lib/python3.6/site-packages
algo-1-nvqf7_1  | Current directory: /
algo-1-nvqf7_1  | Temp directory: /home/model-server/tmp
algo-1-nvqf7_1  | Number of GPUs: 0
algo-1-nvqf7_1  | Number of CPUs: 8
algo-1-nvqf7_1  | Max heap size: 6972 M
algo-1-nvqf7_1  | Python executable: /opt/conda/bin/python
algo-1-nvqf7_1  | Config file: /etc/sagemaker-mms.properties
algo-1-nvqf7_1  | Inference address: http://0.0.0.0:8080
algo-1-nvqf7_1  | Management address: http://0.0.0.0:8080
algo-1-nvqf7_1  | Model Store: /.sagemaker/mms/models
algo-1-nvqf7_1  | Initial Models: ALL
algo-1-nvqf7_1  | Log dir: /logs
algo-1-nvqf7_1  | Metrics dir: /logs
algo-1-nvqf7_1  | Netty threads: 0
algo-1-nvqf7_1  | Netty client threads: 0
algo-1-nvqf7_1  | Default workers per model: 1
algo-1-nvqf7_1  | Blacklist Regex: N/A
algo-1-nvqf7_1  | Maximum Response Size: 6553500
algo-1-nvqf7_1  | Maximum Request Size: 6553500
algo-1-nvqf7_1  | Preload model: false
algo-1-nvqf7_1  | Prefer direct buffer: false
algo-1-nvqf7_1  | 2020-11-30 07:17:14,391 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-model
algo-1-nvqf7_1  | 2020-11-30 07:17:14,481 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_pytorch_serving_container.handler_service --model-path /.sagemaker/mms/models/model --model-name model --preload-model false --tmp-dir /home/model-server/tmp
algo-1-nvqf7_1  | 2020-11-30 07:17:14,482 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000
algo-1-nvqf7_1  | 2020-11-30 07:17:14,482 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 51
algo-1-nvqf7_1  | 2020-11-30 07:17:14,482 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
algo-1-nvqf7_1  | 2020-11-30 07:17:14,483 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.6.6
algo-1-nvqf7_1  | 2020-11-30 07:17:14,483 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model model loaded.
algo-1-nvqf7_1  | 2020-11-30 07:17:14,487 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
algo-1-nvqf7_1  | 2020-11-30 07:17:14,496 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
algo-1-nvqf7_1  | 2020-11-30 07:17:14,544 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080
algo-1-nvqf7_1  | 2020-11-30 07:17:14,545 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.
algo-1-nvqf7_1  | Model server started.
algo-1-nvqf7_1  | 2020-11-30 07:17:14,547 [WARN ] pool-2-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.
algo-1-nvqf7_1  | 2020-11-30 07:17:14,962 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - PyTorch version 1.3.1 available.
algo-1-nvqf7_1  | 2020-11-30 07:17:15,314 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Lock 140580224398952 acquired on /root/.cache/torch/transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084.lock
algo-1-nvqf7_1  | 2020-11-30 07:17:15,315 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/torch/transformers/tmpcln39mxo
algo-1-nvqf7_1  | 2020-11-30 07:17:15,344 [WARN ] W-9000-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 
algo-1-nvqf7_1  | 2020-11-30 07:17:15,349 [WARN ] W-9000-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Downloading:   0%|          | 0.00/232k [00:00<?, ?B/s]
algo-1-nvqf7_1  | 2020-11-30 07:17:15,349 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - storing https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt in cache at /root/.cache/torch/transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
algo-1-nvqf7_1  | 2020-11-30 07:17:15,349 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - creating metadata file for /root/.cache/torch/transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
algo-1-nvqf7_1  | 2020-11-30 07:17:15,349 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Lock 140580224398952 released on /root/.cache/torch/transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084.lock
algo-1-nvqf7_1  | 2020-11-30 07:17:15,350 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /root/.cache/torch/transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
algo-1-nvqf7_1  | 2020-11-30 07:17:15,378 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Created tokenizer
algo-1-nvqf7_1  | 2020-11-30 07:17:15,378 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Elastic Inference enabled: True
algo-1-nvqf7_1  | 2020-11-30 07:17:15,378 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - inside model fn
algo-1-nvqf7_1  | 2020-11-30 07:17:15,379 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - /.sagemaker/mms/models/model
algo-1-nvqf7_1  | 2020-11-30 07:17:15,379 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - /.sagemaker/mms/models/model/model.pt
algo-1-nvqf7_1  | 2020-11-30 07:17:15,379 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ['model.pt', 'model.tar.gz', 'code', 'model_tn_best.pth', 'MAR-INF']
algo-1-nvqf7_1  | 2020-11-30 07:17:15,379 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Loading torch script
algo-1-nvqf7_1  | 2020-11-30 07:17:15,392 [INFO ] epollEventLoopGroup-4-1 com.amazonaws.ml.mms.wlm.WorkerThread - 9000-44f1cd64 Worker disconnected. WORKER_STARTED
algo-1-nvqf7_1  | 2020-11-30 07:17:15,392 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.BatchAggregator - Load model failed: model, error: Worker died.
algo-1-nvqf7_1  | 2020-11-30 07:17:15,393 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Retry worker: 9000-44f1cd64 in 1 seconds.
algo-1-nvqf7_1  | 2020-11-30 07:17:16,065 [INFO ] W-9000-model ACCESS_LOG - /172.18.0.1:45110 "GET /ping HTTP/1.1" 200 8
tracycxw commented 3 years ago

@rucha3 A model trained with torch 1.5 will not work with the 1.3.1-eia image. A 1.5.1-eia container is under development and should be out around mid-January :)

rucha3 commented 3 years ago

@tracycxw Thanks for your response. I also tried saving it as a traced model with torch.jit under PyTorch 1.3.1, but that didn't work either. Is that expected?

tracycxw commented 3 years ago

@rucha3 The following links might help; can you check these first and see if anything is missing? https://aws.amazon.com/blogs/machine-learning/fine-tuning-a-pytorch-bert-model-and-deploying-it-with-amazon-elastic-inference-on-amazon-sagemaker/

https://github.com/aws-samples/amazon-sagemaker-bert-pytorch/blob/master/bert-sm-python-SDK.ipynb

I had some issues reproducing the error. If you still see it, would you be willing to provide more details of your setup and the code you use to train, save, and load the model? You can send the info to "amazon-ei-feedback@amazon.com".

  1. What is your workflow? Training the model on SM, then deploying it to an SM endpoint using EI?
  2. What is your instance type?
  3. Are you training the model on SM? If so, which instance type did you choose?
  4. What is your EIA type?

rucha3 commented 3 years ago

@tracycxw Here are the details of my setup:

  1. Training the model on SM and deploying locally on an SM notebook instance with an attached EI accelerator
  2. Notebook instance: ml.m5.2xlarge
  3. Training instance: ml.g4dn.2xlarge
  4. EIA type: ml.eia2.medium

So, to make sure I was doing everything right, this time I followed the exact notebook that you shared: https://github.com/aws-samples/amazon-sagemaker-bert-pytorch/blob/master/bert-sm-python-SDK.ipynb.

I used an ml.m5.2xlarge notebook instance with an ml.eia2.medium EIA. The only change was that instead of training and deploying on a different instance, I trained and deployed on the local instance with a local SM session. I was able to reproduce the same error: the worker dies while loading the model. It works fine when the accelerator is not used for deployment.
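
The local deploy call was along these lines (a sketch; the special accelerator type below is what the SageMaker SDK documents for using the notebook's attached accelerator in local mode, and the artifact/role/entry point are placeholders):

from sagemaker.pytorch import PyTorchModel

# Placeholder artifact, role, and entry point; local mode runs the container on the notebook instance.
pytorch_model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",
    role="my-sagemaker-role",
    entry_point="deploy_ei.py",
    framework_version="1.3.1",
    py_version="py3",
)

# 'local' instance type plus the special accelerator type uses the EI accelerator
# attached to this notebook instance.
predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type="local",
    accelerator_type="local_sagemaker_notebook",
)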

Here is my notebook: https://github.com/rucha3/pytorch-bert-eia-demo/blob/main/bert_example.ipynb. I have retained the outputs of some cells to show the failure logs.

tracycxw commented 3 years ago

@rucha3 This might be due to a transformers version issue. There are some requirements here: https://github.com/aws-samples/amazon-sagemaker-bert-pytorch/blob/master/code/requirements.txt Can you try !pip install -r code/requirements.txt at the beginning? I tried training and deploying locally and didn't see the worker die in this case. I hope this solves your model-loading issue.

Further, if you have issues running inference with the accelerator, you might want to check your torch version. For now, you need to make sure you trace the model using torch 1.3.1.
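
For reference, the blog post linked above runs the traced model inside the EI-enabled build's optimized-execution context in predict_fn, roughly like this (a sketch; how you unpack input_data depends on your input_fn):

import torch

def predict_fn(input_data, model):
    # Assumption: input_data is the (input_ids, attention_mask) pair produced by input_fn.
    input_ids, attention_mask = input_data
    with torch.no_grad():
        # In the EI-enabled PyTorch 1.3.1 build, this context targets the attached accelerator.
        with torch.jit.optimized_execution(True, {"target_device": "eia:0"}):
            return model(input_ids, attention_mask)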

rucha3 commented 3 years ago

@tracycxw Thanks a lot, using the latest transformers version worked! Initially I was using version 2.11.0 because that's what the model was trained with. What finally worked was using the latest transformers version (4.1.1) both for tracing and in my inference script inside the container. I didn't have to trace it with PyTorch 1.3.1 either; 1.5.0 worked just fine. Thanks again.
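
For anyone hitting this later, the fix amounted to pinning the newer transformers release in code/requirements.txt so the serving container installs it (the pin below is simply the version that worked in this thread):

# code/requirements.txt
transformers==4.1.1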