NetEase-FuXi / EETQ

Easy and Efficient Quantization for Transformers
Apache License 2.0

EETQ-quantized TrOCR gives nonsense output #28

Closed: donjuanpond closed this issue 3 months ago

donjuanpond commented 3 months ago

Hello! I'm using EETQ through HuggingFace Transformers to quantize my TrOCR (vision encoder decoder) model. It is meant to generate text output from an image input, transcribing whatever text is shown in the image. I tried to quantize the model through EETQ to speed up inference using the following code:

```python
from transformers import EetqConfig, TrOCRProcessor, VisionEncoderDecoderModel

# ... some other code here ...

eetq_config = EetqConfig("int8")
recognizer = VisionEncoderDecoderModel.from_pretrained(recognizer_path, quantization_config=eetq_config).to('cuda')
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
```

When I run this quantized model, I get very strange results: it labels all the text in images as just the word "to". For example, an image that should have been transcribed as "3042846 JG-002" ends up as "to to to to to to to to to", and so on. What is causing this problem, and how can I fix it?

dtlzhuangz commented 3 months ago

I can quantize the model via `eet_quantize`, though I could not get it working through transformers, and the result looks correct:


```python
from PIL import Image
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from eetq import eet_quantize

# load image from the IAM database
image = Image.open("/path/to/").convert("RGB")

processor = TrOCRProcessor.from_pretrained('microsoft/trocr-base-handwritten')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-base-handwritten').to(torch.float16).cuda()

# quantize in place, keeping the decoder's output projection unquantized
eet_quantize(model, exclude=['output_projection'])

pixel_values = processor(images=image, return_tensors="pt").pixel_values.cuda()
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```
donjuanpond commented 3 months ago

Ok, it looks like the `exclude=['output_projection']` part of your code, together with quantizing via the `eetq` package instead of at load time with the config, fixes the output. Thank you for your help!
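
For anyone hitting the same symptom, here is the working combination applied to my own recognizer. This is a minimal sketch: it assumes `recognizer_path` points at a fine-tuned TrOCR-style `VisionEncoderDecoderModel`, and that the decoder's LM head is named `output_projection` as in the stock TrOCR decoder.

```python
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from eetq import eet_quantize

# Load the fine-tuned checkpoint in fp16 on the GPU; eet_quantize works in place.
recognizer = VisionEncoderDecoderModel.from_pretrained(recognizer_path).to(torch.float16).cuda()
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")

# Quantize everything except the decoder's output projection (the LM head);
# quantizing that layer to int8 is what produced the repeated "to" output here.
eet_quantize(recognizer, exclude=['output_projection'])
```

Keeping the final projection in fp16 costs very little memory, but it seems to matter a lot for generation quality.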