Closed: Matthieu-Tinycoaching closed this issue 2 years ago
Hello,
I tried to save a BentoML service of a roBERTa model on an inf1 instance with this code:
However, when trying to start the BentoML service on the inf1 instance with:
bentoml serve-gunicorn --enable-microbatch $saved_path --workers 1 --port=5000
and trying to test API inference with the following code:
I got the following error log:
Could anyone help me regarding this problem?
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/test_bentoml_inf1_fresh/lib/python3.6/site-packages/bentoml/service/inference_api.py", line 177, in wrapped_func
return self._user_func(*args, **kwargs)
File "/home/ubuntu/bentoml/repository/StsBTCustomInf1PytorchService/20210604083513_74AEBF/StsBTCustomInf1PytorchService/sts_transformer_pt_inf1_batchTrue_custom_requirement_file.py", line 33, in predict
model_output = self.artifacts.model(**encoded_input)
File "/home/ubuntu/anaconda3/envs/test_bentoml_inf1_fresh/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
RuntimeError: forward() is missing value for argument 'tensor'. Declaration: forward(__torch__.torch_neuron.convert.AwsNeuronGraphModule self, Tensor tensor, Tensor tensor0) -> (Tensor)
Just as the error stack pointed out, it's caused by this line of the user API function:
model_output = self.artifacts.model(**encoded_input)
Not a bug of bentoml.
Apparently, it is because the original model expects users to call it with self.artifacts.model(tensor=..., tensor0=...). But **encoded_input failed to provide these two arguments.
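To illustrate the mismatch, here is a minimal, self-contained stand-in (a toy module, not the actual neuron-converted model; shapes and names are assumptions for illustration) showing why unpacking the tokenizer's dict as keyword arguments fails once forward()'s arguments have been renamed to tensor/tensor0:
import torch

class TracedLike(torch.nn.Module):
    # Toy stand-in for the neuron-traced model: its forward() arguments
    # carry the generic names 'tensor' and 'tensor0' after conversion.
    def forward(self, tensor, tensor0):
        return tensor + tensor0

model = TracedLike()
encoded_input = {
    "input_ids": torch.ones(1, 128, dtype=torch.long),
    "attention_mask": torch.ones(1, 128, dtype=torch.long),
}

# model(**encoded_input) raises a TypeError here: forward() has no
# 'input_ids'/'attention_mask' keyword arguments.
# A positional call (or tensor=/tensor0=) matches the traced signature:
out = model(encoded_input["input_ids"], encoded_input["attention_mask"])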
Hi @bojiang, thanks.
I tried the same code on other AWS instances with classical TorchScript (not torch_neuron as here) and it always worked perfectly.
Normally **encoded_input should provide these two tensor arguments, which are input_ids and attention_mask.
We can use self.artifacts.model(encoded_input['input_ids'], encoded_input['attention_mask']) for now.
I tried the same code on other AWS instances with classical TorchScript (not torch_neuron as here) and it always worked perfectly.
Oh, so you mean the original model (before being packed as the bentoml artifact) takes input_ids and attention_mask as keyword arguments?
If that's true, it would be a bug of bentoml.frameworks.pytorch.PytorchArtifact.
Could you please help check the original model's kwargs?
@bojiang thanks for the patch, it works.
Yes, my original predict method with the torchscript model uses the following code:
tokenizer = self.artifacts.tokenizer
encoded_input = tokenizer(list_sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')
with torch.no_grad():
    model_output = self.artifacts.model(**encoded_input)
with encoded_input a dictionary made of the two fields input_ids and attention_mask, which are torch tensors. What should it be normally?
Also, I have two related problems with torchscript compilation from the torch_neuron package. It seems that everything is static, such as the batch_size and the padding length. Whenever I want to use micro-batching from BentoML, since I have compiled the model with a batch_size of 1, at each inference with a batch size automatically determined by BentoML I get an error like this:
[2021-06-04 10:05:45,400] ERROR - Error caught in API function:
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/test_bentoml_inf1_fresh/lib/python3.6/site-packages/bentoml/service/inference_api.py", line 177, in wrapped_func
return self._user_func(*args, **kwargs)
File "/home/ubuntu/bentoml/repository/StsBTCustomInf1PytorchService/20210604095716_503730/StsBTCustomInf1PytorchService/sts_transformer_pt_inf1_batchTrue_custom_requirement_file.py", line 41, in predict
model_output = self.artifacts.model(tensor=encoded_input['input_ids'], tensor0=encoded_input['attention_mask'])
File "/home/ubuntu/anaconda3/envs/test_bentoml_inf1_fresh/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/torch_neuron/convert.py", line 38, in forward
_18 = torch.embedding(CONSTANTS.c5, _14, 1, False, False)
_19 = [torch.add(_17, _18, alpha=1), _10, tensor0]
_20 = ops.neuron.forward_1(_19, CONSTANTS.c6, CONSTANTS.c7, CONSTANTS.c8)
~~~~~~~~~~~~~~~~~~~~ <--- HERE
return _20
Traceback of TorchScript, original code (most recent call last):
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py(309): neuron_function
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py(779): trace
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py(313): create_runnable
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py(194): trace
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/convert.py(448): _convert_item
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/graph.py(186): run_op
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/graph.py(176): __call__
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/convert.py(365): compile_fused_operators
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/convert.py(121): trace
torchserve_sts_transformer_torchscript_cpu_pad128_b1.py(76): <module>
RuntimeError:
Incorrect tensor shape at input tensor #0: received 3 128 768, expected 1 128 768.
Incorrect tensor shape at input tensor #1: received 3 1 1 128, expected 1 1 1 128.
Incorrect tensor shape at input tensor #2: received 3 128, expected 1 128.
The same is observable with padding: whenever I want to use padding=True instead of padding='max_length', the call to the model expects an input tensor of size max_length. This is strange since with the original torchscript the problem doesn't exist. Would you have any advice regarding this?
Thanks!
I have compiled the model with a batch_size of 1
received 3 128 768, expected 1 128 768
Assume the first dimension is the batch dimension. Generally, for an NLP tokenizer, the padding option pads along the sentence-length dimension, rather than the batch dimension.
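For reference, a small sketch of the difference between the two padding modes (the checkpoint name and sentences are placeholders, assuming the transformers library is installed):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
batch = ["a short sentence", "another slightly longer sentence"]

dynamic = tokenizer(batch, padding=True, return_tensors="pt")
static = tokenizer(batch, padding="max_length", max_length=128, return_tensors="pt")
print(dynamic["input_ids"].shape)  # padded to the longest sentence in the batch
print(static["input_ids"].shape)   # always torch.Size([2, 128])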
@bojiang yes, the first dimension is the batch dimension, which is expected to be 1, the value used for torchscript compilation.
I didn't show you the error with padding, but it is represented by the 128 (max_length) in 1 128 768. So when using the option padding=True in the tokenizer, it gave me the same kind of result but in another dimension:
Incorrect tensor shape at input tensor #0: received 1 57 768, expected 1 128 768.
So the batch and padding dimensions are two different things, but both seem to be static in the neuron torchscript compiler.
Any advice on how to solve it?
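The thread itself offers no fix, but one common workaround (an assumption on my part, not advice from this thread) is to pin the tokenizer output to the shapes the model was compiled with: always pad to max_length and loop over the micro-batch one item at a time. A hedged sketch:
import torch

def predict_fixed_shape(model, tokenizer, sentences, max_length=128):
    # The neuron-compiled graph only accepts the shapes it was traced with
    # (batch_size=1, sequence length max_length), so pad every request to
    # max_length and run the compiled model one sentence at a time.
    outputs = []
    for sentence in sentences:
        encoded = tokenizer(
            [sentence],
            padding="max_length",
            truncation=True,
            max_length=max_length,
            return_tensors="pt",
        )
        with torch.no_grad():
            outputs.append(model(encoded["input_ids"], encoded["attention_mask"]))
    return outputs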
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Closing this for now since it is not related to BentoML. You can DM me or bojiang if you still need help :smile: