aws-neuron / aws-neuron-sdk


BLIP2 Support on Inf2 #763

Open KhalilGuetari opened 1 year ago

KhalilGuetari commented 1 year ago

Hello,

I am struggling to trace the BLIP2 model from the transformers library with torch_neuronx so that it runs on an inf2 instance. The model I ultimately want to trace is the XXL version, but it doesn't fit on a single Neuron device, so I am starting with a smaller model, the flan-t5-xl version, on an inf2.8xlarge.

Tracing completes, but an error is raised when loading the traced model.

Trace Model Code:

import PIL
import torch
import torch_neuronx
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Load model
model_path = 'Salesforce/blip2-flan-t5-xl'
processor = Blip2Processor.from_pretrained(model_path)
model = Blip2ForConditionalGeneration.from_pretrained(model_path, torchscript=True, return_dict=False)
model.eval()

# Example of input
img = PIL.Image.new(mode='RGB', size=(720, 1280))
inputs = processor(images=[img], text="", return_tensors='pt')
input_ids = torch.LongTensor([[1]])
attention_mask = torch.LongTensor([[1]])
decoder_ids = torch.LongTensor([[0]])
inputs_tuple = (inputs['pixel_values'], input_ids, attention_mask, decoder_ids)

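# Reference forward pass on CPU with the example inputs
out_cpu = model(*inputs_tuple)

# Compile the model for Inferentia2 using the same example inputs
model_neuron = torch_neuronx.trace(model, inputs_tuple, compiler_args='--target inf2 --enable-saturate-infinity')

# Save the traced model as a TorchScript file
filename = 'blip2_neuron_xl.pt'
torch.jit.save(model_neuron, filename)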

Inference Code:

import torch
from PIL import Image
from transformers import Blip2Processor

model_path = 'Salesforce/blip2-flan-t5-xl'
processor = Blip2Processor.from_pretrained(model_path)

loaded_model = torch.jit.load("blip2_neuron_xl.pt")
print('Model loaded successfully')

img_path = './test_img.jpg'
img = Image.open(img_path).convert('RGB')
inputs = processor(images=img, text="", return_tensors='pt')
input_ids = torch.LongTensor([[1]])
attention_mask = torch.LongTensor([[1]])
decoder_ids = torch.LongTensor([[1]])
inputs_tuple = (inputs['pixel_values'], input_ids, attention_mask, decoder_ids)

out = loaded_model(*inputs_tuple)

print(out)

Error Message:

Traceback (most recent call last):
  File "/home/ec2-user/blip2-medium-neuronx/inference.py", line 11, in <module>
    loaded_model = torch.jit.load("blip2_neuron_xl.pt")
  File "/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/jit/_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError: Unknown opcode for unpickling at 0xe: 14
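
Since files written by torch.jit.save are zip archives, one cheap thing to rule out before digging into the unpickling error is a truncated or damaged file on disk. The following is only a minimal sketch using the Python standard library; the helper name check_torchscript_archive is hypothetical, and nothing in this thread establishes that file corruption is the cause of the error above.

```python
import os
import zipfile


def check_torchscript_archive(path):
    """Basic integrity check of a TorchScript file (hypothetical helper).

    Assumption: the file was written by torch.jit.save, which produces a
    zip archive. This only verifies the container, not the pickled contents.
    """
    if not os.path.isfile(path):
        return "missing file"
    if not zipfile.is_zipfile(path):
        return "not a zip archive"
    with zipfile.ZipFile(path) as zf:
        # testzip() reads every member and returns the first one with a bad CRC
        bad_member = zf.testzip()
        if bad_member is not None:
            return f"corrupt member: {bad_member}"
        members = zf.infolist()
        total_bytes = sum(info.file_size for info in members)
    return f"ok: {len(members)} members, {total_bytes} bytes uncompressed"


# File name taken from the scripts above
print(check_torchscript_archive("blip2_neuron_xl.pt"))
```

If the archive passes this check, the file itself is intact at the container level and the problem more likely lies in how its contents were serialized or are being deserialized.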

I also ran torch_neuronx.analyze() for more information:

{
    "torch_neuronx_version": "1.13.1.1.11.0",
    "neuronx_cc_version": "2.10.0.35+3817a0c8c",
    "support_percentage": "99.94%",
    "supported_operators": {
        "aten::size": 427,
        "aten::_convolution": 1,
        "aten::flatten": 1,
        "aten::transpose": 418,
        "aten::expand": 2,
        "aten::to": 209,
        "aten::cat": 3,
        "aten::slice": 62,
        "aten::add": 377,
        "aten::layer_norm": 111,
        "aten::linear": 686,
        "aten::floor_divide": 39,
        "aten::Int": 45,
        "aten::reshape": 78,
        "aten::permute": 152,
        "aten::select": 119,
        "aten::matmul": 258,
        "aten::mul": 340,
        "aten::softmax": 129,
        "aten::dropout": 332,
        "aten::gelu": 99,
        "aten::ones": 4,
        "aten::unsqueeze": 21,
        "aten::rsub": 5,
        "aten::view": 361,
        "aten::div": 22,
        "aten::contiguous": 90,
        "aten::embedding": 4,
        "aten::pow": 122,
        "aten::mean": 122,
        "aten::rsqrt": 122,
        "aten::arange": 5,
        "aten::sub": 2,
        "aten::gt": 1,
        "aten::abs": 1,
        "aten::lt": 2,
        "aten::log": 2,
        "aten::min": 3,
        "aten::where": 2,
        "aten::add_": 73,
        "aten::type_as": 72,
        "aten::repeat": 1,
        "aten::le": 1,
        "aten::neg": 1,
        "aten::zeros": 1
    },
    "unsupported_operators": [
        {
            "kind": "aten::full_like",
            "failureAt": "Lowering to HLO",
            "call": "/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(430): _relative_position_bucket\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(443): compute_bias\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(543): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(601): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(694): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(1094): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(1680): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/blip_2/modeling_blip_2.py(1767): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/jit/_trace.py(976): trace_module\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/jit/_trace.py(759): trace\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch_neuronx/xla_impl/analyze.py(1097): analyze\n/home/ec2-user/blip2-medium-neuronx/trace_model.py(38): <module>\n",
            "opGraph": "graph(%relative_position_if_large.1 : Long(33, 33, strides=[33, 1], requires_grad=0, device=cpu),\n      %neuron_35560 : int,\n      %neuron_35567 : int,\n      %neuron_35544 : int,\n      %neuron_35566 : Device,\n      %neuron_35550 : bool,\n      %neuron_35551 : NoneType):\n  %neuron_35696 : Long(33, 33, strides=[33, 1], requires_grad=0, device=cpu) = aten::full_like(%relative_position_if_large.1, %neuron_35560, %neuron_35567, %neuron_35544, %neuron_35566, %neuron_35550, %neuron_35551)\n  return (%neuron_35696)\n"
        },
        {
            "kind": "aten::zeros_like",
            "failureAt": "Lowering to HLO",
            "call": "/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(416): _relative_position_bucket\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(443): compute_bias\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(543): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(601): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(694): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(1094): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(1717): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/blip_2/modeling_blip_2.py(1767): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/jit/_trace.py(976): trace_module\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/jit/_trace.py(759): trace\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch_neuronx/xla_impl/analyze.py(1097): analyze\n/home/ec2-user/blip2-medium-neuronx/trace_model.py(38): <module>\n",
            "opGraph": "graph(%relative_position.5 : Long(1, 1, strides=[1, 1], requires_grad=0, device=cpu),\n      %neuron_35567 : int,\n      %neuron_35544 : int,\n      %neuron_35566 : Device,\n      %neuron_35550 : bool,\n      %neuron_35551 : NoneType):\n  %neuron_37886 : Long(1, 1, strides=[1, 1], requires_grad=0, device=cpu) = aten::zeros_like(%relative_position.5, %neuron_35567, %neuron_35544, %neuron_35566, %neuron_35550, %neuron_35551)\n  return (%neuron_37886)\n"
        },
        {
            "kind": "aten::full_like",
            "failureAt": "Lowering to HLO",
            "call": "/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(430): _relative_position_bucket\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(443): compute_bias\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(543): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(601): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(694): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(1094): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py(1717): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/transformers/models/blip_2/modeling_blip_2.py(1767): forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/jit/_trace.py(976): trace_module\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch/jit/_trace.py(759): trace\n/home/ec2-user/aws_neuron_venv_pytorch/lib/python3.9/site-packages/torch_neuronx/xla_impl/analyze.py(1097): analyze\n/home/ec2-user/blip2-medium-neuronx/trace_model.py(38): <module>\n",
            "opGraph": "graph(%relative_position_if_large.5 : Long(1, 1, strides=[1, 1], requires_grad=0, device=cpu),\n      %neuron_35541 : int,\n      %neuron_35567 : int,\n      %neuron_35544 : int,\n      %neuron_35566 : Device,\n      %neuron_35550 : bool,\n      %neuron_35551 : NoneType):\n  %neuron_37897 : Long(1, 1, strides=[1, 1], requires_grad=0, device=cpu) = aten::full_like(%relative_position_if_large.5, %neuron_35541, %neuron_35567, %neuron_35544, %neuron_35566, %neuron_35550, %neuron_35551)\n  return (%neuron_37897)\n"
        }
    ]
}
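
The report above is long mainly because each unsupported operator carries a full call stack. As a rough sketch, assuming the report is available as a Python dict with the same keys as shown above, the entries can be collapsed to one line per (operator, failure stage, innermost frame). The helper name summarize_unsupported and the shortened /venv/... paths in the sample are illustrative, not part of torch_neuronx.

```python
from collections import Counter


def summarize_unsupported(report):
    """Count unsupported ops by (kind, failure stage, innermost frame).

    Hypothetical helper: expects a dict shaped like the report above.
    """
    summary = Counter()
    for entry in report.get("unsupported_operators", []):
        # The "call" string lists frames innermost first, one per line
        frames = [ln.strip() for ln in entry.get("call", "").split("\n") if ln.strip()]
        innermost = frames[0] if frames else "<no call stack>"
        # Drop the environment-specific prefix for readability
        innermost = innermost.split("site-packages/", 1)[-1]
        summary[(entry.get("kind"), entry.get("failureAt"), innermost)] += 1
    return summary


# Trimmed stand-in for the report above (paths shortened for illustration)
sample_report = {
    "support_percentage": "99.94%",
    "unsupported_operators": [
        {"kind": "aten::full_like", "failureAt": "Lowering to HLO",
         "call": "/venv/site-packages/transformers/models/t5/modeling_t5.py(430): _relative_position_bucket\n/venv/site-packages/transformers/models/t5/modeling_t5.py(443): compute_bias\n"},
        {"kind": "aten::zeros_like", "failureAt": "Lowering to HLO",
         "call": "/venv/site-packages/transformers/models/t5/modeling_t5.py(416): _relative_position_bucket\n/venv/site-packages/transformers/models/t5/modeling_t5.py(443): compute_bias\n"},
        {"kind": "aten::full_like", "failureAt": "Lowering to HLO",
         "call": "/venv/site-packages/transformers/models/t5/modeling_t5.py(430): _relative_position_bucket\n/venv/site-packages/transformers/models/t5/modeling_t5.py(443): compute_bias\n"},
    ],
}

summary = summarize_unsupported(sample_report)
for (kind, stage, frame), count in summary.items():
    print(f"{count}x {kind} failed at '{stage}' in {frame}")
```

Read this way, all three unsupported operators in the report originate in T5's _relative_position_bucket (lines 416 and 430 of modeling_t5.py), and all fail at the "Lowering to HLO" stage.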

How should I trace the model? BLIP2 uses T5 to generate text, so the problem may come from that part. Could you please provide help or guidance? Thanks!

aws-donkrets commented 1 year ago

Hi KhalilGuetari, the "Unknown opcode for unpickling at ..." error appears to come from the PyTorch framework code. I've forwarded your request to an internal team that may be more familiar with that code.