parser.add_argument("--model_path", type=str, default="echo840/Monkey-Chat")  # echo840/Monkey-Chat or echo840/Monkey
...
if question == "Generate the detailed caption in English:" and "Monkey-Chat" not in checkpoint:
    query = f'<img>{img_path}</img> Generate the detailed caption in English: '  # detailed caption
else:
    query = f'<img>{img_path}</img> {question} Answer: '  # VQA
Yes. When training Monkey, we used different prompts for detailed captioning and VQA. For Monkey-Chat training, we standardized the prompts, which is why the caption-specific branch only applies when the checkpoint is not Monkey-Chat.
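To make the difference concrete, here is a minimal sketch of the prompt selection as a standalone function. The names `build_query` and `CAPTION_PROMPT` are hypothetical and not part of the repository; the logic simply mirrors the branch quoted from inference.py, and `demo.jpg` is a placeholder path.

```python
# Hypothetical helper that mirrors the prompt branch quoted from inference.py.
CAPTION_PROMPT = "Generate the detailed caption in English:"


def build_query(img_path: str, question: str, checkpoint: str) -> str:
    """Build the text query for one image and one question."""
    # Monkey (non-Chat) was trained with a dedicated caption prompt,
    # so the caption request is sent without the trailing "Answer: ".
    if question == CAPTION_PROMPT and "Monkey-Chat" not in checkpoint:
        return f"<img>{img_path}</img> {CAPTION_PROMPT} "
    # Everything else, including captioning with Monkey-Chat,
    # uses the VQA-style format.
    return f"<img>{img_path}</img> {question} Answer: "


if __name__ == "__main__":
    for ckpt in ("echo840/Monkey", "echo840/Monkey-Chat"):
        print(ckpt, "->", repr(build_query("demo.jpg", CAPTION_PROMPT, ckpt)))
```

With the same caption question, only the checkpoint name changes the query: Monkey gets the bare caption prompt, while Monkey-Chat gets the question followed by `Answer: `.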
In inference.py, is it just the prompt that is different?