bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
https://bentoml.com
Apache License 2.0

[bentoML service on inf1] RuntimeError: forward() is missing value for argument 'tensor' #1655

Closed: Matthieu-Tinycoaching closed 2 years ago

Matthieu-Tinycoaching commented 3 years ago

Hello,

I tried to save a BentoML service for a RoBERTa model on an inf1 instance with this code:

from typing import List
from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import JsonInput
from bentoml.frameworks.pytorch import PytorchModelArtifact
from bentoml.service.artifacts.common import PickleArtifact
import torch
import torch_neuron
import os

# one core per worker
os.environ['NEURONCORE_GROUP_SIZES'] = '1'

@env(requirements_txt_file='./requirements.txt',
    docker_base_image="bentoml/model-server:0.12.0-slim-py36")
@artifacts([PytorchModelArtifact('model'), PickleArtifact('tokenizer')])
class StsBTCustomInf1PytorchService(BentoService):
    @api(input=JsonInput(), batch=True)
    def predict(self, list_sentences: List[str]):
        tokenizer = self.artifacts.tokenizer
        encoded_input = tokenizer(list_sentences, padding='max_length', truncation=True, max_length=128, return_tensors='pt')
        with torch.no_grad():
            #Compute token embeddings
            model_output = self.artifacts.model(**encoded_input)
        return model_output.numpy()

# Instantiate the StsBTCustomInf1PytorchService class defined above
from transformers import AutoTokenizer

sts_pt = StsBTCustomInf1PytorchService()

# Pack a TorchScript Model
model = torch.jit.load('stsb-xlm-r-multilingual-custom-neuron-data-parallel-pad128-b1.pt')
model.eval()
sts_pt.pack('model', model)

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/stsb-xlm-r-multilingual", use_fast=True)
sts_pt.pack('tokenizer', tokenizer)

# Save BentoService
sts_pt.save()

However, when starting the BentoML service on the inf1 instance with: bentoml serve-gunicorn --enable-microbatch $saved_path --workers 1 --port=5000

and testing API inference with the following code:

import requests
import json
import numpy as np
import torch

API_url = "http://ec2-xx-xx-xx-xx.eu-west-3.compute.amazonaws.com:5000/predict"
sentences1 = ["Navigateur Web : Ce logiciel permet d'accéder à des pages web depuis votre ordinateur. Il en existe plusieurs téléchargeables gratuitement comme Google Chrome ou Mozilla. Certains sont même déjà installés comme Safari sur Mac OS et Edge sur Microsoft."]
response1 = requests.post(API_url, json=sentences1[0])

I got the following error log:

[2021-06-04 09:01:18,711] INFO - Starting BentoML proxy in production mode..
[2021-06-04 09:01:18,712] INFO - Starting BentoML API server in production mode..
[2021-06-04 09:01:18,880] INFO - Running micro batch service on :5000
[2021-06-04 09:01:18 +0000] [1998] [INFO] Starting gunicorn 20.1.0
[2021-06-04 09:01:18 +0000] [1998] [INFO] Listening at: http://0.0.0.0:5000 (1998)
[2021-06-04 09:01:18 +0000] [1998] [INFO] Using worker: aiohttp.worker.GunicornWebWorker
[2021-06-04 09:01:18 +0000] [2005] [INFO] Starting gunicorn 20.1.0
[2021-06-04 09:01:18 +0000] [2005] [INFO] Listening at: http://0.0.0.0:48079 (2005)
[2021-06-04 09:01:18 +0000] [2005] [INFO] Using worker: sync
[2021-06-04 09:01:18 +0000] [2006] [INFO] Booting worker with pid: 2006
[2021-06-04 09:01:18 +0000] [2007] [INFO] Booting worker with pid: 2007
[2021-06-04 09:01:18,910] INFO - Micro batch enabled for API `predict` max-latency: 20000 max-batch-size 4000
[2021-06-04 09:01:18,910] INFO - Your system nofile limit is 8192, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file descriptors for the server process, or launch more microbatch instances to accept more concurrent connection.
[2021-06-04 09:01:19,788] INFO - Using user specified docker base image: `bentoml/model-server:0.12.0-slim-py36`, user must make sure that the base image either has Python 3.6 or conda installed.
[2021-06-04 09:01:20,818] WARNING - BentoML by default does not include spacy and torchvision package when using PytorchModelArtifact. To make sure BentoML bundle those packages if they are required for your model, either import those packages in BentoService definition file or manually add them via `@env(pip_packages=['torchvision'])` when defining a BentoService
[2021-06-04 09:04:08,791] ERROR - Error caught in API function:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/test_bentoml_inf1_fresh/lib/python3.6/site-packages/bentoml/service/inference_api.py", line 177, in wrapped_func
    return self._user_func(*args, **kwargs)
  File "/home/ubuntu/bentoml/repository/StsBTCustomInf1PytorchService/20210604083513_74AEBF/StsBTCustomInf1PytorchService/sts_transformer_pt_inf1_batchTrue_custom_requirement_file.py", line 33, in predict
    model_output = self.artifacts.model(**encoded_input)
  File "/home/ubuntu/anaconda3/envs/test_bentoml_inf1_fresh/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
RuntimeError: forward() is missing value for argument 'tensor'. Declaration: forward(__torch__.torch_neuron.convert.AwsNeuronGraphModule self, Tensor tensor, Tensor tensor0) -> (Tensor)

Could anyone help me regarding this problem?

bojiang commented 3 years ago
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/test_bentoml_inf1_fresh/lib/python3.6/site-packages/bentoml/service/inference_api.py", line 177, in wrapped_func
    return self._user_func(*args, **kwargs)
  File "/home/ubuntu/bentoml/repository/StsBTCustomInf1PytorchService/20210604083513_74AEBF/StsBTCustomInf1PytorchService/sts_transformer_pt_inf1_batchTrue_custom_requirement_file.py", line 33, in predict
    model_output = self.artifacts.model(**encoded_input)
  File "/home/ubuntu/anaconda3/envs/test_bentoml_inf1_fresh/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
RuntimeError: forward() is missing value for argument 'tensor'. Declaration: forward(__torch__.torch_neuron.convert.AwsNeuronGraphModule self, Tensor tensor, Tensor tensor0) -> (Tensor)

As the error stack points out, it's caused by this line of the user-defined API function:

 model_output = self.artifacts.model(**encoded_input)

Not a bug in BentoML.

bojiang commented 3 years ago

Apparently, it is because the original model expects users to call it with self.artifacts.model(tensor=..., tensor0=...), but **encoded_input fails to provide these two arguments.
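
For what it's worth, a minimal sketch of how to confirm this: per the error message, the Neuron-converted graph recorded generic positional argument names (tensor, tensor0) rather than input_ids/attention_mask, and you can inspect the traced signature directly (the model path here is the one from the original post):

import torch
import torch_neuron  # registers the Neuron ops needed to load the model

# Load the traced Neuron model saved in the original post
model = torch.jit.load('stsb-xlm-r-multilingual-custom-neuron-data-parallel-pad128-b1.pt')

# Print the traced signature; per the error above, this should show
# something like forward(self, Tensor tensor, Tensor tensor0) -> (Tensor)
print(model.forward.schema)

# So the call must be positional (or use the generic names), e.g.:
# model(encoded_input['input_ids'], encoded_input['attention_mask'])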

Matthieu-Tinycoaching commented 3 years ago

Hi @bojiang thanks

I tried the same code on other AWS instances with classic TorchScript (not torch_neuron as here) and it always worked perfectly.

Normally, **encoded_input should provide these two tensor arguments, which are input_ids and attention_mask.

bojiang commented 3 years ago

We can use self.artifacts.model(encoded_input['input_ids'], encoded_input['attention_mask']) for now.
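
A minimal sketch of the predict method with this workaround applied (same service definition as in the original post):

    @api(input=JsonInput(), batch=True)
    def predict(self, list_sentences: List[str]):
        tokenizer = self.artifacts.tokenizer
        encoded_input = tokenizer(list_sentences, padding='max_length', truncation=True, max_length=128, return_tensors='pt')
        with torch.no_grad():
            # Pass the tensors positionally so they match the traced
            # signature forward(self, tensor, tensor0)
            model_output = self.artifacts.model(encoded_input['input_ids'], encoded_input['attention_mask'])
        return model_output.numpy()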

I tried the same code on other AWS instances with classic TorchScript (not torch_neuron as here) and it always worked perfectly.

Oh, so you mean the original model (before being packed as a BentoML artifact) takes input_ids and attention_mask as keyword arguments? If that's true, it would be a bug in bentoml.frameworks.pytorch.PytorchModelArtifact. Could you please help check the original model's kwargs?

Matthieu-Tinycoaching commented 3 years ago

@bojiang thanks for the workaround, it works.

Yes, my original predict method with the TorchScript model uses the following code:

        tokenizer = self.artifacts.tokenizer
        encoded_input = tokenizer(list_sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')
        with torch.no_grad():
            model_output = self.artifacts.model(**encoded_input)

with encoded_input being a dictionary with two fields, input_ids and attention_mask, which are torch tensors. What should it look like normally?
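
For illustration, a quick standalone check of what that dictionary contains (shapes assume the padding='max_length', max_length=128 settings above):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/stsb-xlm-r-multilingual", use_fast=True)
encoded_input = tokenizer(["a test sentence"], padding='max_length', truncation=True, max_length=128, return_tensors='pt')

print(list(encoded_input.keys()))            # ['input_ids', 'attention_mask']
print(encoded_input['input_ids'].shape)      # torch.Size([1, 128])
print(encoded_input['attention_mask'].shape) # torch.Size([1, 128])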

Also, I have two related problems with TorchScript compilation via the torch_neuron package. It seems everything is static, including the batch size and the padding length. Whenever I want to use BentoML's micro-batching, since I compiled the model with a batch size of 1, each inference with a batch size automatically determined by BentoML gives an error like this:

[2021-06-04 10:05:45,400] ERROR - Error caught in API function:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/test_bentoml_inf1_fresh/lib/python3.6/site-packages/bentoml/service/inference_api.py", line 177, in wrapped_func
    return self._user_func(*args, **kwargs)
  File "/home/ubuntu/bentoml/repository/StsBTCustomInf1PytorchService/20210604095716_503730/StsBTCustomInf1PytorchService/sts_transformer_pt_inf1_batchTrue_custom_requirement_file.py", line 41, in predict
    model_output = self.artifacts.model(tensor=encoded_input['input_ids'], tensor0=encoded_input['attention_mask'])
  File "/home/ubuntu/anaconda3/envs/test_bentoml_inf1_fresh/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/torch_neuron/convert.py", line 38, in forward
    _18 = torch.embedding(CONSTANTS.c5, _14, 1, False, False)
    _19 = [torch.add(_17, _18, alpha=1), _10, tensor0]
    _20 = ops.neuron.forward_1(_19, CONSTANTS.c6, CONSTANTS.c7, CONSTANTS.c8)
          ~~~~~~~~~~~~~~~~~~~~ <--- HERE
    return _20

Traceback of TorchScript, original code (most recent call last):
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py(309): neuron_function
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py(779): trace
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py(313): create_runnable
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py(194): trace
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/convert.py(448): _convert_item
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/graph.py(186): run_op
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/graph.py(176): __call__
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/convert.py(365): compile_fused_operators
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/convert.py(121): trace
torchserve_sts_transformer_torchscript_cpu_pad128_b1.py(76): <module>
RuntimeError: 
    Incorrect tensor shape at input tensor #0: received 3 128 768, expected 1 128 768.
    Incorrect tensor shape at input tensor #1: received 3 1 1 128, expected 1 1 1 128.
    Incorrect tensor shape at input tensor #2: received 3 128, expected 1 128.

The same is observable with padding: whenever I use padding=True instead of padding='max_length', the call to the model expects an input tensor of size max_length. This is strange since the problem doesn't exist with the original TorchScript. Would you have any advice regarding this?

Thanks!

bojiang commented 3 years ago

I have compiled the model with a batch_size of 1

received 3 128 768, expected 1 128 768

I assume the first dimension is the batch dimension.

Generally, for an NLP tokenizer, the padding option pads along the sequence-length dimension, rather than the batch dimension.
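
As a hedged sketch of how to cope with the static shapes: keep padding='max_length' so the sequence dimension always matches the compiled length, and feed the micro-batch to the model one example at a time, since it was compiled with batch size 1. (predict_fixed_shape below is a hypothetical helper for illustration, not a BentoML API.)

import torch

def predict_fixed_shape(model, tokenizer, list_sentences):
    # Pad every request to the fixed sequence length the model was
    # compiled with (128), regardless of the actual sentence lengths
    encoded = tokenizer(list_sentences, padding='max_length', truncation=True, max_length=128, return_tensors='pt')
    outputs = []
    with torch.no_grad():
        # The Neuron graph was traced with batch size 1, so feed one
        # example at a time; slicing with i:i+1 keeps the batch dim
        for i in range(encoded['input_ids'].shape[0]):
            out = model(encoded['input_ids'][i:i+1], encoded['attention_mask'][i:i+1])
            outputs.append(out)
    return torch.cat(outputs, dim=0).numpy()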

Matthieu-Tinycoaching commented 3 years ago

@bojiang yes, the first dimension is the batch dimension, which is expected to be 1, the value used at TorchScript compilation time.

I didn't show you the padding error, but padding corresponds to the 128 (max_length) in 1 128 768. When using padding=True in the tokenizer, I get the same kind of error, but on a different dimension:

Incorrect tensor shape at input tensor #0: received 1 57 768, expected 1 128 768.

So the batch and padding dimensions are two different things, but both seem to be static in the Neuron TorchScript compiler.

Any advice on how to solve this?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

aarnphm commented 2 years ago

Closing this for now since it is not related to BentoML. You can DM me or bojiang if you still need help :smile: