aws-neuron / aws-neuron-samples

Example code for AWS Neuron SDK developers building inference and training applications

Neuron Core Inference support for TrOCR #7

Closed aman-cc closed 1 year ago

aman-cc commented 1 year ago

I'm trying to run inference for TrOCR on an Inf1 instance. I'm able to compile and save the model as per the notebook, but model execution is currently happening on the CPU and the Neuron cores are unutilized. Please provide a way to make the inference use the Neuron cores.

import torch
import torch.neuron
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-small-handwritten") 
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-small-handwritten").eval()

max_length = 32
# Example inputs matching the shapes used when tracing the encoder and decoder
input_ids = torch.zeros([1, max_length], dtype=torch.int64)
attention_mask = torch.zeros([1, max_length], dtype=torch.int64)
encoder_hidden_states = torch.rand([1, 578, 384])
pad_size = torch.as_tensor(0)

xenc = torch.rand(1, 3, 384, 384).float()
xdec = (input_ids, attention_mask, encoder_hidden_states, pad_size)

# Attach the compiled Neuron graphs loaded from disk
model.encoder.forward_neuron = torch.jit.load('troc_encoder_neuron.pt')
model.decoder.forward_neuron = torch.jit.load('troc_decoder_neuron.pt')

generated_ids = model.generate(xenc, pad_token_id=model.config.decoder.eos_token_id)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
%matplotlib inline
import os
import sys
import cv2
import urllib.request
import matplotlib.pyplot as plt
import time
if '..' not in sys.path: sys.path.append('..')

def load_sample_imgE():
    if not os.path.exists("text.jpg"):
        urllib.request.urlretrieve("https://fki.tic.heia-fr.ch/static/img/a01-122-02.jpg", "text.jpg")
    return cv2.imread("text.jpg")

max_len = 32
img = load_sample_imgE()

# Warm-up iterations before timing
for i in range(10):
    pixel_values = processor(img, max_length=max_length, padding='max_length',
                             truncation=True, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values, pad_token_id=model.config.decoder.eos_token_id, max_length=max_len)
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Timed throughput measurement
t1 = time.time()
num_inf = 100
for i in range(num_inf):
    pixel_values = processor(img, max_length=max_length, padding='max_length',
                             truncation=True, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values, pad_token_id=model.config.decoder.eos_token_id, max_length=max_len)
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
t2 = time.time()
print(f"Inf/sec: {num_inf/(t2-t1):.2f}")

print(generated_text)
plt.figure(figsize=(10,5))
plt.imshow(img)
(screenshot attached)
aws-mvaria commented 1 year ago

Hello, we have reproduced the issue and will get back to you shortly with more information. Thanks.

aws-rishyraj commented 1 year ago

Hi @aman-cc,

The original issue is that, in your provided script, the loaded Neuron modules were never called: the script is missing the redefined forward functions for the encoder and decoder (reference the sample notebook). After including those forward functions in the script, I see the Neuron cores being utilized.
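For illustration, a minimal sketch of what that redefinition can look like for the encoder, assuming the pattern from the sample notebook (the function name and output handling here are assumptions, not the exact notebook code, and the decoder is handled analogously):

from transformers.modeling_outputs import BaseModelOutput

# Hypothetical sketch: dispatch encoder calls to the traced Neuron module so
# that model.generate() runs the compiled graph instead of the original forward.
def encoder_forward(self, pixel_values, **kwargs):
    hidden = self.forward_neuron(pixel_values)  # executes on the NeuronCores
    return BaseModelOutput(last_hidden_state=hidden)

# Bind the function as a method on the encoder instance
model.encoder.forward = encoder_forward.__get__(model.encoder)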

However, I also noticed the latency was more or less the same as on CPU for an inf1.6xlarge (around 240ms). Since this is an encoder/decoder model, I believe that Inf2 would be a better fit. Using an inf2.8xlarge, I get a latency of 19ms. Furthermore, you would only need to wrap the model (assuming you want to trace model.generate()) and trace it with torch_neuronx.trace(); there is no need to redefine the forward functions for the encoder and decoder.

Something like:

import os
import urllib.request

import cv2
import torch
import torch_neuronx
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

class ModelWrapper(torch.nn.Module):
    def __init__(self, trocr):
        super().__init__()
        self.trocr = trocr

    def forward(self, x):
        return self.trocr.generate(x)

def load_sample_imgE():
    if not os.path.exists("text.jpg"):
        urllib.request.urlretrieve('https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg', "text.jpg")
    return cv2.imread("text.jpg")
# ...
processor = TrOCRProcessor.from_pretrained('microsoft/trocr-base-printed')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-base-printed')
model = ModelWrapper(model)

max_len = 32
img = load_sample_imgE()
pixel_values = processor(img, max_length=max_len, padding='max_length',
                         truncation=True, return_tensors="pt").pixel_values

# Trace the wrapped generate() call and save the compiled model
model_neuron = torch_neuronx.trace(model, pixel_values)
filename = 'trocr.pt'
torch.jit.save(model_neuron, filename)
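Once saved, a minimal usage sketch (assuming the filename and processor above) for reloading the traced wrapper and running inference:

# Hypothetical usage sketch: reload the traced wrapper and run it end to end.
model_neuron = torch.jit.load('trocr.pt')
generated_ids = model_neuron(pixel_values)  # generate() now runs on the NeuronCores
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])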

For more information on torch-neuron (Inf1) vs. torch-neuronx (Trn1/Inf2), check here, and for torch-neuronx setup, go here.
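As a quick orientation, the two front-ends expose similarly shaped tracing entry points; a sketch, assuming a model is already defined (the example input here is a dummy for illustration):

import torch
example_inputs = torch.rand(1, 3, 384, 384)  # dummy pixel values for illustration

# torch-neuron, for Inf1 instances
import torch.neuron
neuron_model = torch.neuron.trace(model, example_inputs)

# torch-neuronx, for Trn1/Inf2 instances (use one or the other per instance type)
import torch_neuronx
neuronx_model = torch_neuronx.trace(model, example_inputs)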

Please try this option out and let us know if it works for you!