AILab-CVC / SEED-X

Multimodal Models in Real World
Other
359 stars 17 forks

Error from image editing inference script. #6

Open j-min opened 3 months ago

j-min commented 3 months ago

Hi, thanks for sharing the code and checkpoints!

Following the README, I tried running the provided inference script for image editing, described here: https://github.com/AILab-CVC/SEED-X?tab=readme-ov-file#inference-with-the-editing-model-seed-x-edit

# For image editing
python3 src/inference/eval_img2edit_seed_x_edit.py

However, I got the following error. Any guidance would be appreciated. Thanks!

...
Init agent mdoel Done
init vae
init unet
missing keys:  0 unexpected keys: 0
Init adapter done
Init adapter pipe done
Traceback (most recent call last):
  File "...SEED-X/src/inference/eval_img2edit_seed_x_edit.py", line 138, in <module>
    output = agent_model.generate(tokenizer=tokenizer,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...SEED-X/src/models/mllm/seed_x.py", line 173, in generate
    input_embeds[ids_cmp_mask] = image_embeds_lm[embeds_cmp_mask].view(-1, dim)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^
RuntimeError: shape mismatch: value tensor of shape [128, 5120] cannot be broadcast to indexing result of shape [0, 5120]
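
The traceback says the boolean mask ids_cmp_mask selects 0 token positions while 128 rows of image embeddings are being written into them. The sketch below is a hypothetical pure-Python model of that invariant, not the SEED-X code: it assumes the mask covers the positions between each begin-of-image id and its matching end-of-image id. The names build_image_mask and check_fill and the token ids are illustrative only.

```python
from typing import List, Sequence


def build_image_mask(input_ids: Sequence[int], boi_id: int, eoi_id: int) -> List[bool]:
    """Mark every position strictly between a begin-of-image id and its end-of-image id.

    Hypothetical helper modeling how a placeholder mask could be built;
    it is not taken from the SEED-X repository.
    """
    mask = [False] * len(input_ids)
    boi_positions = [i for i, tok in enumerate(input_ids) if tok == boi_id]
    eoi_positions = [i for i, tok in enumerate(input_ids) if tok == eoi_id]
    # Pair begin/end markers in order; if no begin marker is found, nothing is marked.
    for start, end in zip(boi_positions, eoi_positions):
        for j in range(start + 1, end):
            mask[j] = True
    return mask


def check_fill(mask: Sequence[bool], num_embed_rows: int) -> None:
    """Mirror the shape requirement behind `input_embeds[mask] = image_embeds`."""
    selected = sum(mask)
    if selected != num_embed_rows:
        raise RuntimeError(
            f"shape mismatch: {num_embed_rows} embedding rows "
            f"for {selected} masked positions"
        )
```

With boi_id=10 and eoi_id=11, the ids [1, 10, 12, 12, 12, 11, 2] give three masked positions. If the tokenizer instead splits the begin-of-image token into two other ids, no position matches boi_id, the mask stays all False, and check_fill reports 0 masked positions, the same pattern as the [0, 5120] indexing result above.
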

geyuying commented 3 months ago

Hi, which version of transformers are you using?

j-min commented 3 months ago

Oh, thanks for pointing that out. I was using transformers==4.40.1, and installing transformers==4.30.2 made it work! Do you know what exactly breaks in the newer version, and is there a migration guide for later versions?

ChocoWu commented 3 months ago

It seems to be a problem with the tokenizer: <img> is tokenized into two IDs instead of one. You can check this yourself.
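
One way to run that check is to encode each special token on its own and flag any that do not map to exactly one ID. This is a minimal sketch: split_tokens is a hypothetical helper, and it only assumes the tokenizer exposes an encode(text, add_special_tokens=...) method, as Hugging Face tokenizers do.

```python
from typing import Dict, Iterable, List


def split_tokens(tokenizer, tokens: Iterable[str]) -> Dict[str, List[int]]:
    """Return {token: ids} for every token that does not encode to exactly one id.

    `tokenizer` only needs an `encode(text, add_special_tokens=...)` method;
    an empty result means every token stayed atomic.
    """
    split: Dict[str, List[int]] = {}
    for tok in tokens:
        # Skip BOS/EOS so only the token's own ids are counted.
        ids = list(tokenizer.encode(tok, add_special_tokens=False))
        if len(ids) != 1:
            split[tok] = ids
    return split
```

With a real tokenizer loaded via transformers.AutoTokenizer.from_pretrained(path), where path is a placeholder for the tokenizer directory the SEED-X script loads, split_tokens(tokenizer, ["<img>"]) should return an empty dict when <img> is treated as a single token. Running it under both transformers versions is a quick way to see whether the tokenization changed.
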

huangjch526 commented 1 month ago

May I ask which versions of accelerate, deepspeed, and tokenizers your pip list shows?
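
To collect those versions (plus transformers) in one place for a bug report, a small standard-library script works as an alternative to scanning pip list. This is a sketch; installed_versions is a hypothetical helper name, and it reports None for packages that are not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    report = installed_versions(["transformers", "tokenizers", "accelerate", "deepspeed"])
    for pkg, ver in report.items():
        print(f"{pkg}: {ver or 'not installed'}")
```
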