Alpha-VLLM / Lumina-mGPT

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
https://arxiv.org/abs/2408.02657

Floating point exception(core dumped) #27

Closed: JacobYuan7 closed this issue 1 month ago

JacobYuan7 commented 2 months ago

Hi, many thanks for your great work. I am trying to use the code for simple inference:

from inference_solver import FlexARInferenceSolver
from PIL import Image

inference_solver = FlexARInferenceSolver(
    model_path="Alpha-VLLM/Lumina-mGPT-7B-512",
    precision="bf16",
    target_size=512,
)

q1 = "Describe the image in detail. <|image|>"

images = [Image.open("image.png")]
qas = [[q1, None]]

generated = inference_solver.generate(
    images=images,
    qas=qas,
    max_gen_len=8192,
    temperature=1.0,
    logits_processor=inference_solver.create_logits_processor(cfg=4.0, image_top_k=2000),
)

a1 = generated[0]

However, running this produces the error below. Do you have any ideas on how to solve it? I'd appreciate any thoughts. Many thanks.

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 2/2 [00:03<00:00,  1.87s/it]
Some weights of ChameleonForConditionalGeneration were not initialized from the model checkpoint at Alpha-VLLM/Lumina-mGPT-7B-512 and are newly initialized: ['model.vqmodel.encoder.conv_in.bias', 'model.vqmodel.encoder.conv_in.weight', 'model.vqmodel.encoder.conv_out.bias', 'model.vqmodel.encoder.conv_out.weight', 'model.vqmodel.encoder.down.0.block.0.conv1.bias', 'model.vqmodel.encoder.down.0.block.0.conv1.weight', 'model.vqmodel.encoder.down.0.block.0.conv2.bias', 'model.vqmodel.encoder.down.0.block.0.conv2.weight', 'model.vqmodel.encoder.down.0.block.0.norm1.bias', 'model.vqmodel.encoder.down.0.block.0.norm1.weight', 'model.vqmodel.encoder.down.0.block.0.norm2.bias', 'model.vqmodel.encoder.down.0.block.0.norm2.weight', 'model.vqmodel.encoder.down.0.block.1.conv1.bias', 'model.vqmodel.encoder.down.0.block.1.conv1.weight', 'model.vqmodel.encoder.down.0.block.1.conv2.bias', 'model.vqmodel.encoder.down.0.block.1.conv2.weight', 'model.vqmodel.encoder.down.0.block.1.norm1.bias', 'model.vqmodel.encoder.down.0.block.1.norm1.weight', 'model.vqmodel.encoder.down.0.block.1.norm2.bias', 'model.vqmodel.encoder.down.0.block.1.norm2.weight', 'model.vqmodel.encoder.down.0.downsample.conv.bias', 'model.vqmodel.encoder.down.0.downsample.conv.weight', 'model.vqmodel.encoder.down.1.block.0.conv1.bias', 'model.vqmodel.encoder.down.1.block.0.conv1.weight', 'model.vqmodel.encoder.down.1.block.0.conv2.bias', 'model.vqmodel.encoder.down.1.block.0.conv2.weight', 'model.vqmodel.encoder.down.1.block.0.norm1.bias', 'model.vqmodel.encoder.down.1.block.0.norm1.weight', 'model.vqmodel.encoder.down.1.block.0.norm2.bias', 'model.vqmodel.encoder.down.1.block.0.norm2.weight', 'model.vqmodel.encoder.down.1.block.1.conv1.bias', 'model.vqmodel.encoder.down.1.block.1.conv1.weight', 'model.vqmodel.encoder.down.1.block.1.conv2.bias', 'model.vqmodel.encoder.down.1.block.1.conv2.weight', 'model.vqmodel.encoder.down.1.block.1.norm1.bias', 'model.vqmodel.encoder.down.1.block.1.norm1.weight', 'model.vqmodel.encoder.down.1.block.1.norm2.bias', 'model.vqmodel.encoder.down.1.block.1.norm2.weight', 'model.vqmodel.encoder.down.1.downsample.conv.bias', 'model.vqmodel.encoder.down.1.downsample.conv.weight', 'model.vqmodel.encoder.down.2.block.0.conv1.bias', 'model.vqmodel.encoder.down.2.block.0.conv1.weight', 'model.vqmodel.encoder.down.2.block.0.conv2.bias', 'model.vqmodel.encoder.down.2.block.0.conv2.weight', 'model.vqmodel.encoder.down.2.block.0.nin_shortcut.bias', 'model.vqmodel.encoder.down.2.block.0.nin_shortcut.weight', 'model.vqmodel.encoder.down.2.block.0.norm1.bias', 'model.vqmodel.encoder.down.2.block.0.norm1.weight', 'model.vqmodel.encoder.down.2.block.0.norm2.bias', 'model.vqmodel.encoder.down.2.block.0.norm2.weight', 'model.vqmodel.encoder.down.2.block.1.conv1.bias', 'model.vqmodel.encoder.down.2.block.1.conv1.weight', 'model.vqmodel.encoder.down.2.block.1.conv2.bias', 'model.vqmodel.encoder.down.2.block.1.conv2.weight', 'model.vqmodel.encoder.down.2.block.1.norm1.bias', 'model.vqmodel.encoder.down.2.block.1.norm1.weight', 'model.vqmodel.encoder.down.2.block.1.norm2.bias', 'model.vqmodel.encoder.down.2.block.1.norm2.weight', 'model.vqmodel.encoder.down.2.downsample.conv.bias', 'model.vqmodel.encoder.down.2.downsample.conv.weight', 'model.vqmodel.encoder.down.3.block.0.conv1.bias', 'model.vqmodel.encoder.down.3.block.0.conv1.weight', 'model.vqmodel.encoder.down.3.block.0.conv2.bias', 'model.vqmodel.encoder.down.3.block.0.conv2.weight', 'model.vqmodel.encoder.down.3.block.0.norm1.bias', 
'model.vqmodel.encoder.down.3.block.0.norm1.weight', 'model.vqmodel.encoder.down.3.block.0.norm2.bias', 'model.vqmodel.encoder.down.3.block.0.norm2.weight', 'model.vqmodel.encoder.down.3.block.1.conv1.bias', 'model.vqmodel.encoder.down.3.block.1.conv1.weight', 'model.vqmodel.encoder.down.3.block.1.conv2.bias', 'model.vqmodel.encoder.down.3.block.1.conv2.weight', 'model.vqmodel.encoder.down.3.block.1.norm1.bias', 'model.vqmodel.encoder.down.3.block.1.norm1.weight', 'model.vqmodel.encoder.down.3.block.1.norm2.bias', 'model.vqmodel.encoder.down.3.block.1.norm2.weight', 'model.vqmodel.encoder.down.3.downsample.conv.bias', 'model.vqmodel.encoder.down.3.downsample.conv.weight', 'model.vqmodel.encoder.down.4.block.0.conv1.bias', 'model.vqmodel.encoder.down.4.block.0.conv1.weight', 'model.vqmodel.encoder.down.4.block.0.conv2.bias', 'model.vqmodel.encoder.down.4.block.0.conv2.weight', 'model.vqmodel.encoder.down.4.block.0.nin_shortcut.bias', 'model.vqmodel.encoder.down.4.block.0.nin_shortcut.weight', 'model.vqmodel.encoder.down.4.block.0.norm1.bias', 'model.vqmodel.encoder.down.4.block.0.norm1.weight', 'model.vqmodel.encoder.down.4.block.0.norm2.bias', 'model.vqmodel.encoder.down.4.block.0.norm2.weight', 'model.vqmodel.encoder.down.4.block.1.conv1.bias', 'model.vqmodel.encoder.down.4.block.1.conv1.weight', 'model.vqmodel.encoder.down.4.block.1.conv2.bias', 'model.vqmodel.encoder.down.4.block.1.conv2.weight', 'model.vqmodel.encoder.down.4.block.1.norm1.bias', 'model.vqmodel.encoder.down.4.block.1.norm1.weight', 'model.vqmodel.encoder.down.4.block.1.norm2.bias', 'model.vqmodel.encoder.down.4.block.1.norm2.weight', 'model.vqmodel.encoder.mid.attn_1.k.bias', 'model.vqmodel.encoder.mid.attn_1.k.weight', 'model.vqmodel.encoder.mid.attn_1.norm.bias', 'model.vqmodel.encoder.mid.attn_1.norm.weight', 'model.vqmodel.encoder.mid.attn_1.proj_out.bias', 'model.vqmodel.encoder.mid.attn_1.proj_out.weight', 'model.vqmodel.encoder.mid.attn_1.q.bias', 'model.vqmodel.encoder.mid.attn_1.q.weight', 'model.vqmodel.encoder.mid.attn_1.v.bias', 'model.vqmodel.encoder.mid.attn_1.v.weight', 'model.vqmodel.encoder.mid.block_1.conv1.bias', 'model.vqmodel.encoder.mid.block_1.conv1.weight', 'model.vqmodel.encoder.mid.block_1.conv2.bias', 'model.vqmodel.encoder.mid.block_1.conv2.weight', 'model.vqmodel.encoder.mid.block_1.norm1.bias', 'model.vqmodel.encoder.mid.block_1.norm1.weight', 'model.vqmodel.encoder.mid.block_1.norm2.bias', 'model.vqmodel.encoder.mid.block_1.norm2.weight', 'model.vqmodel.encoder.mid.block_2.conv1.bias', 'model.vqmodel.encoder.mid.block_2.conv1.weight', 'model.vqmodel.encoder.mid.block_2.conv2.bias', 'model.vqmodel.encoder.mid.block_2.conv2.weight', 'model.vqmodel.encoder.mid.block_2.norm1.bias', 'model.vqmodel.encoder.mid.block_2.norm1.weight', 'model.vqmodel.encoder.mid.block_2.norm2.bias', 'model.vqmodel.encoder.mid.block_2.norm2.weight', 'model.vqmodel.encoder.norm_out.bias', 'model.vqmodel.encoder.norm_out.weight', 'model.vqmodel.post_quant_conv.bias', 'model.vqmodel.post_quant_conv.weight', 'model.vqmodel.quant_conv.bias', 'model.vqmodel.quant_conv.weight', 'model.vqmodel.quantize.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
VQModel loaded from ./ckpts/chameleon/tokenizer/vqgan.ckpt
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:8710 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
inference.sh: line 2: 172132 Floating point exception(core dumped) python inference_test.py
ChrisLiu6 commented 2 months ago

This looks more like a hardware/environment problem, so unfortunately I may not be able to help much from our side.
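One quick sanity check worth running: a floating point exception (SIGFPE, "core dumped") is raised in native code rather than in Python, so the usual first step is to compare torch/CUDA/transformers versions and GPU bf16 support against a machine where inference works. The snippet below is only a minimal sketch, not part of this repo; it uses standard torch/transformers calls only, and the diagnosis itself is an assumption.

import torch
import transformers

# Version mismatches are the most common source of native crashes;
# compare these against a known-good setup.
print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("transformers:", transformers.__version__)

print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    # The failing script requests precision="bf16"; confirm the GPU supports it.
    print("bf16 supported:", torch.cuda.is_bf16_supported())
    # A tiny bf16 op exercises the same kernels. If this also dies with a
    # core dump, the torch/CUDA install is broken, not Lumina-mGPT.
    x = torch.randn(8, 8, device="cuda", dtype=torch.bfloat16)
    print("bf16 matmul ok:", torch.isfinite((x @ x).float()).all().item())

If the tiny bf16 matmul also crashes, reinstalling a PyTorch build that matches your CUDA driver is worth trying; if it passes, a mismatched transformers version is the next common suspect.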