Yuliang-Liu / Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
MIT License

What is the difference between monkey and monkey-chat? #102

Closed. w-qhai closed this issue 2 months ago

w-qhai commented 2 months ago

In inference.py:

    parser.add_argument("--model_path", type=str, default="echo840/Monkey-Chat") #echo840/Monkey-Chat  echo840/Monkey
    ...
    if question == "Generate the detailed caption in English:" and "Monkey-Chat" not in checkpoint:
        query = f'<img>{img_path}</img> Generate the detailed caption in English: ' #detailed caption
    else:
        query = f'<img>{img_path}</img> {question} Answer: ' #VQA

Is it just the prompt that is different?

echo840 commented 2 months ago

Yes, during the training of Monkey, we used different prompts for detailed captioning and VQA. However, for Monkey-Chat training, we standardized the prompts.
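
For illustration, the branch quoted from inference.py can be sketched as a standalone helper. This is a minimal sketch, not code from the repository: the function name `build_query` and the constant `CAPTION_PROMPT` are hypothetical, and only the two prompt templates and the `"Monkey-Chat" not in checkpoint` check come from the snippet in the question.

```python
# Hypothetical constant; the string itself is taken from the quoted snippet.
CAPTION_PROMPT = "Generate the detailed caption in English:"


def build_query(img_path: str, question: str, checkpoint: str) -> str:
    """Hypothetical helper mirroring the branch quoted from inference.py.

    Only the original Monkey checkpoint gets the dedicated caption prompt;
    every other case uses the VQA-style template ending in 'Answer: '.
    """
    if question == CAPTION_PROMPT and "Monkey-Chat" not in checkpoint:
        # Detailed-caption prompt, used when the checkpoint is not Monkey-Chat
        return f"<img>{img_path}</img> {CAPTION_PROMPT} "
    # Standardized VQA-style prompt
    return f"<img>{img_path}</img> {question} Answer: "


if __name__ == "__main__":
    for ckpt in ("echo840/Monkey", "echo840/Monkey-Chat"):
        print(ckpt, "->", repr(build_query("demo.jpg", CAPTION_PROMPT, ckpt)))
```

With a Monkey-Chat checkpoint, the caption question falls through to the VQA-style template, so it receives the same `Answer: ` suffix as any other question.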