aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow, and integrated with your favorite AWS services.
https://aws.amazon.com/machine-learning/neuron/

RuntimeError: Inconsistent batch sizes found on inputs #405

Closed: aamirbutt closed this issue 2 years ago

aamirbutt commented 2 years ago

Hi, I am trying to test dynamic batching with PyTorch, and I get the following error:

Traceback (most recent call last):
  File "trace_msmarco.py", line 52, in <module>
    results = model_neuron(*infer_inputs)
  File "/home/ubuntu/myenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/home/ubuntu/myenv/lib/python3.6/site-packages/torch_neuron/decorators.py(347): forward
/home/ubuntu/myenv/lib/python3.6/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/home/ubuntu/myenv/lib/python3.6/site-packages/torch/nn/modules/module.py(1051): _call_impl
/home/ubuntu/myenv/lib/python3.6/site-packages/torch_neuron/graph.py(546): __call__
/home/ubuntu/myenv/lib/python3.6/site-packages/torch_neuron/graph.py(205): run_op
/home/ubuntu/myenv/lib/python3.6/site-packages/torch_neuron/graph.py(194): __call__
/home/ubuntu/myenv/lib/python3.6/site-packages/torch_neuron/convert.py(220): forward
/home/ubuntu/myenv/lib/python3.6/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/home/ubuntu/myenv/lib/python3.6/site-packages/torch/nn/modules/module.py(1051): _call_impl
/home/ubuntu/myenv/lib/python3.6/site-packages/torch/jit/_trace.py(959): trace_module
/home/ubuntu/myenv/lib/python3.6/site-packages/torch/jit/_trace.py(744): trace
/home/ubuntu/myenv/lib/python3.6/site-packages/torch_neuron/convert.py(186): trace
trace_msmarco.py(30): <module>
RuntimeError: Inconsistent batch sizes found on inputs

Here is the full code.

from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
)
import torch
import torch_neuron

modelName = 'cross-encoder/ms-marco-TinyBERT-L-2-v2'

print("Neuron tracing model: " + modelName)

# Build a batch-1 example input for tracing
BATCH_SIZE = 1
question = ['How many people live in Berlin?'] * BATCH_SIZE
context = ["Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."] * BATCH_SIZE

model = AutoModelForSequenceClassification.from_pretrained(modelName, return_dict=False)
tokenizer = AutoTokenizer.from_pretrained(modelName)
features = tokenizer(question, context, max_length=4, padding='max_length', truncation=True, return_tensors="pt")
inputs = features['input_ids'], features['attention_mask'], features['token_type_ids']

# Trace with dynamic batching enabled
model_neuron = torch.neuron.trace(model,
                                  example_inputs=inputs,
                                  strict=False,
                                  dynamic_batch_size=True)

# Run inference with a different batch size (2 instead of 1)
newQ = ['How many people live in Berlin?'] * 2
newC = ["Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."] * 2
newFeatures = tokenizer(newQ, newC, max_length=4, padding='max_length', truncation=True, return_tensors="pt")

infer_inputs = newFeatures['input_ids'], newFeatures['attention_mask'], newFeatures['token_type_ids']
results = model_neuron(*infer_inputs)  # raises "Inconsistent batch sizes found on inputs"
print(results)

My torch_neuron version is 1.9.1.2.0.392.0. Any idea what needs to be done here?

aws-taylor commented 2 years ago

Hello @aamirbutt,

We are investigating the issue. In the meantime, could you run python -m pip freeze and include the results so that we can better reproduce the issue?
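If it is easier to attach as a file, here is a minimal sketch for capturing the same output from within the environment itself (the pip-freeze.txt filename is just illustrative):

import subprocess
import sys

# Capture the environment's pinned packages, equivalent to running
# `python -m pip freeze` in a shell (written for Python 3.6 compatibility).
frozen = subprocess.run(
    [sys.executable, "-m", "pip", "freeze"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
).stdout

with open("pip-freeze.txt", "w") as f:
    f.write(frozen)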

Regards, Taylor

aamirbutt commented 2 years ago

FWIW, I was able to solve it by providing a custom graph-builder function:

def graph_builder(node):
    # Exclude LSTM ops and prim:: ops from the Neuron-compiled subgraphs;
    # nodes for which this returns False are left out of the partition
    # and run on CPU instead.
    if "LSTM" in node.name or node.name.startswith('prim'):
        return False
    else:
        return True

model_neuron = torch.neuron.trace(model,
                                  example_inputs=inputs,
                                  strict=False,
                                  subgraph_builder_function=graph_builder,
                                  dynamic_batch_size=True)
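With this trace, the batch-2 inference from the original script runs (a minimal sketch reusing the infer_inputs tuple built earlier; the exact logits depend on the model):

# Re-run the inference that previously failed; the retraced model now
# accepts a batch size different from the one used at trace time.
results = model_neuron(*infer_inputs)
print(results)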
aws-diamant commented 2 years ago

Closing per the solution above.