google-research / pix2struct


The pre-trained checkpoint generates very short output #38

Open Richar-Du opened 1 year ago

Richar-Du commented 1 year ago

Thanks for your awesome work!

I want to use the model to generate the HTML of an image, so I chose the pre-trained checkpoint without fine-tuning. However, the generated output is very short. For example, the following code only generates <img_src=image> without any detailed structure.

from PIL import Image
import torch
from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration

device = torch.device("cuda")
processor = Pix2StructProcessor.from_pretrained("google/pix2struct-large")
model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-large").to(device)

img_path = 'biography.png'
image = Image.open(img_path)

# Disable VQA mode so the processor does not expect a question/header text.
processor.image_processor.is_vqa = False

inputs = processor(images=image, return_tensors="pt").to(device)
generated_ids = model.generate(**inputs, max_length=1000)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)

My transformers version is 4.28.0. Do you know how to solve this problem? Thanks in advance :)
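For reference, a minimal variation of the generation call with explicit decoding parameters. This is only a sketch to rule out the model emitting end-of-sequence immediately, not a confirmed fix; it reuses the processor, model, and inputs defined above, and min_new_tokens / num_beams are just guesses.

# Sketch only: reuses `processor`, `model`, and `inputs` from the snippet above.
# Forcing a minimum number of new tokens and beam search helps check whether the
# model is simply emitting EOS right away; it does not change what the checkpoint learned.
generated_ids = model.generate(
    **inputs,
    max_new_tokens=1000,
    min_new_tokens=50,
    num_beams=4,
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])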

nbroad1881 commented 1 year ago

You should probably upgrade transformers; see https://github.com/huggingface/transformers/issues/22903
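A quick, generic way to confirm which version is actually installed (standard transformers API, nothing Pix2Struct-specific):

# Print the installed transformers version; upgrade with e.g. `pip install -U transformers`
# if it predates the fixes discussed in the linked issue.
import transformers
print(transformers.__version__)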

Richar-Du commented 1 year ago

I have updated transformers to 4.30.2, but it still doesn't work. The input to the processor is an image, and I want to use pix2struct-large to generate its corresponding HTML. However, the generated text is now just '<>'.

@nbroad1881 @younesbelkada

HeimingX commented 1 year ago

Hi, I also ran into the same problem. I took a screenshot of the left subfigure of Figure 1 in the pix2struct paper, and the pix2struct-large model also only outputs '<>'. This is far from what I expected, and I am quite confused. I look forward to a response from the authors. Thanks a lot.

PS: my transformers version is 4.31.0.

ChenDelong1999 commented 1 year ago

+1

nbroad1881 commented 1 year ago

@kentonl, is there a prompt for pretraining?

luukvankooten commented 1 year ago

+1

Alexwangziyu commented 1 year ago

+1