Closed aman-cc closed 1 year ago
Hello, we have reproduced the issue and will get back to you shortly with more information. Thanks.
Hi @aman-cc,
The original issue was that, in your provided script, the loaded Neuron module wasn't being called by the forward function because the re-defined forward methods for the encoder and decoder were missing (see the sample notebook). After including those forward functions in the script, I see the Neuron cores being utilized.
However, I also noticed that latency was more or less the same as on CPU for an inf1.6xlarge (around 240 ms). Since this is an encoder/decoder model, I believe inf2 would be a better fit. Using an inf2.8xlarge, I get a latency of 19 ms. Furthermore, you would only need to wrap the model (assuming you want to trace `model.generate()`) and trace that with `torch_neuronx.trace()`; there is no need to redefine the forward function for the encoder and decoder.
Something like:
```python
import os
import urllib.request

import cv2
import torch
import torch_neuronx
from transformers import TrOCRProcessor, VisionEncoderDecoderModel


class ModelWrapper(torch.nn.Module):
    def __init__(self, trocr):
        super().__init__()
        self.trocr = trocr

    def forward(self, x):
        return self.trocr.generate(x)


def load_sample_imgE():
    if not os.path.exists("text.jpg"):
        urllib.request.urlretrieve('https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg', "text.jpg")
    return cv2.imread("text.jpg")

# ...
processor = TrOCRProcessor.from_pretrained('microsoft/trocr-base-printed')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-base-printed')
model = ModelWrapper(model)

max_length = 32
img = load_sample_imgE()
pixel_values = processor(img, max_length=max_length, padding='max_length',
                         truncation=True, return_tensors="pt").pixel_values

model_neuron = torch_neuronx.trace(model, pixel_values)

filename = 'trocr.pt'
torch.jit.save(model_neuron, filename)
```
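The same wrap-then-trace pattern can be sanity-checked on CPU with plain `torch.jit.trace` (whose interface `torch_neuronx.trace` mirrors). `TinyModel` below is a hypothetical stand-in for the TrOCR model, not part of the original script:

```python
import io
import torch


class TinyModel(torch.nn.Module):
    """Hypothetical stand-in for VisionEncoderDecoderModel: its
    generate() is just a linear layer here, for illustration only."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def generate(self, x):
        return self.linear(x)


class ModelWrapper(torch.nn.Module):
    # Same pattern as above: expose generate() through forward()
    # so the tracer can capture the generation call.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        return self.model.generate(x)


wrapped = ModelWrapper(TinyModel()).eval()
example = torch.rand(1, 4)

# On inf2 this call would be torch_neuronx.trace(wrapped, example).
traced = torch.jit.trace(wrapped, example)

# Round-trip through jit.save / jit.load, as in the script above.
buf = io.BytesIO()
torch.jit.save(traced, buf)
buf.seek(0)
reloaded = torch.jit.load(buf)
assert torch.allclose(traced(example), reloaded(example))
```

The traced module can then be called like any TorchScript model; on Neuron hardware the captured ops run on the Neuron cores instead of the CPU.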
For more information on `torch-neuron` (inf1) vs `torch-neuronx` (trn1/inf2), check here, and for `torch-neuronx` setup, go here.
Please try this option out and let us know if it works for you!
I'm trying to do inference for TrOCR on an Inf1 instance. I'm able to compile and save the model as per the notebook, but model execution is currently happening on the CPU; the Neuron cores are unutilized. Please provide a way for the inference to make use of the Neuron cores.