Ahnsun / merlin

[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
https://ahnsun.github.io/merlin/

how to launch local inference #3

Open cyj95 opened 2 months ago

cyj95 commented 2 months ago

I tried running

python -m mmgpt.engine.serve.cli --model-path ./mmgpt/engine/serve/merlin --image-file ./mmgpt/engine/serve/examples/waterview.jpg

but I get this error:

  File "/merlin-main/mmgpt/engine/serve/cli.py", line 53, in main
    image_tensor = image_processor.preprocess(image, return_tensors='pt')['pixel_values'].half().cuda()
AttributeError: 'NoneType' object has no attribute 'preprocess'
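For context, the traceback only shows that image_processor was None when cli.py reached line 53; the thread does not say why it was never set. A minimal sketch of a guard that would surface this earlier with a clearer message follows. The helper name preprocess_image is hypothetical and not part of the Merlin code, and the sketch assumes only the preprocess call visible in the traceback.

```python
def preprocess_image(image_processor, image):
    """Run image_processor.preprocess, failing early if no processor was loaded.

    `preprocess_image` is a hypothetical helper for illustration; it is not
    part of the Merlin repository.
    """
    if image_processor is None:
        # Assumption: the model loader returned no image processor.
        # The thread does not explain the cause, so only report the symptom.
        raise RuntimeError(
            "image_processor is None: the model loader did not return an "
            "image processor, so the image cannot be preprocessed"
        )
    # Same call as in the traceback, without the .half().cuda() device step.
    return image_processor.preprocess(image, return_tensors="pt")["pixel_values"]
```

Calling the helper with a real processor behaves like the original line up to the device transfer; calling it with None raises a RuntimeError instead of the less informative AttributeError.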

Ahnsun commented 1 month ago

Sorry, due to time constraints we have not yet organized the code to support multi-round, multi-frame video demos. At this stage we do support single-round dialogue, which you can run with the command below. We also provide some example cases that you can follow to chat.

CUDA_VISIBLE_DEVICES=0 torchrun --master_port=23425 mmgpt/engine/eval/eval_box.py \
    --model_name_or_path /path/to/merlin-weights \
    --vision_tower /path/to/clip-vit-large-patch14-448 \
    --image_size 448 \
    --model_max_length 4096 \
    --image_aspect_ratio resize \
    --projector conv \
    --conv_stride 2 \
    --bf16 True \
    --output_dir ./output
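If it is more convenient to launch the evaluation from Python, the same invocation can be assembled as an argument list. This is only a sketch: build_eval_command is a hypothetical helper, the flag names and values are copied from the command in this reply, and the two paths remain placeholders to fill in.

```python
import os
import subprocess


def build_eval_command(weights, vision_tower, output_dir="./output", master_port=23425):
    """Return the torchrun argument list for eval_box.py (hypothetical helper).

    Flags and their values mirror the command quoted in this thread.
    """
    return [
        "torchrun", f"--master_port={master_port}",
        "mmgpt/engine/eval/eval_box.py",
        "--model_name_or_path", weights,
        "--vision_tower", vision_tower,
        "--image_size", "448",
        "--model_max_length", "4096",
        "--image_aspect_ratio", "resize",
        "--projector", "conv",
        "--conv_stride", "2",
        "--bf16", "True",
        "--output_dir", output_dir,
    ]


def run_eval(weights, vision_tower, gpu="0"):
    """Launch the evaluation on one GPU; raises CalledProcessError on failure."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu}
    cmd = build_eval_command(weights, vision_tower)
    subprocess.run(cmd, env=env, check=True)
```

For example, run_eval("/path/to/merlin-weights", "/path/to/clip-vit-large-patch14-448") would start the same process as the shell command, assuming it is called from the repository root.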