ByungKwanLee / MoAI

[ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve performance of numerous zero-shot vision language tasks.
MIT License

question about prompt in training #18

Open cassiaaaaaa opened 5 months ago

cassiaaaaaa commented 5 months ago

Thank you for the awesome work! I am interested in fine-tuning your model, but I have run into problems with the prompts. Are you following the old InternVL2 template, which puts [UNUSED_TOKEN_146] and [UNUSED_TOKEN_145] at the start and end of each question or answer?

     dict(role='HUMAN',
          begin='[UNUSED_TOKEN_146]user\n', end='[UNUSED_TOKEN_145]\n'),
     dict(role='BOT', begin='[UNUSED_TOKEN_146]assistant\n',
          end='[UNUSED_TOKEN_145]\n', generate=True),
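
To make the template above concrete, here is a minimal sketch of how a conversation could be rendered into one string with those role markers. The `begin`/`end` strings are copied from the snippet; the names `META_TEMPLATE`, `render_turn`, and `render_conversation`, and the way the system prompt is prepended without a separator, are assumptions for illustration and are not taken from the MoAI code.

```python
# Role markers copied from the template quoted above.
META_TEMPLATE = {
    "HUMAN": {"begin": "[UNUSED_TOKEN_146]user\n", "end": "[UNUSED_TOKEN_145]\n"},
    "BOT": {"begin": "[UNUSED_TOKEN_146]assistant\n", "end": "[UNUSED_TOKEN_145]\n"},
}


def render_turn(role, text):
    """Wrap one message in the begin/end markers of its role (hypothetical helper)."""
    markers = META_TEMPLATE[role]
    return markers["begin"] + text + markers["end"]


def render_conversation(turns, system=""):
    """Concatenate an optional system prompt and all turns.

    Assumption: the system prompt is prepended as-is, with no extra separator.
    """
    return system + "".join(render_turn(role, text) for role, text in turns)


prompt = render_conversation(
    [("HUMAN", "Is the blender pitcher clear?"), ("BOT", "Yes")]
)
print(repr(prompt))
```

At inference time the last `BOT` turn would normally be left open (only its `begin` marker is emitted) so the model generates the answer; that variant is not shown here.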

ByungKwanLee commented 5 months ago

Yes, correct.

cassiaaaaaa commented 5 months ago

Thanks a lot for answering my questions. I noticed that the prompt is constructed differently in the evaluation code and the inference code. In training, when building the labels, should [UNUSED_TOKEN_145] and [UNUSED_TOKEN_146] be masked, or should they take part in the loss computation?

For example:

AI assistant should give helpful and detailed answers to user after fully understanding an image. [UNUSED_TOKEN_146]user Does the drink appear to have milk as an ingredient? Answer the question using a single word or phrase.[UNUSED_TOKEN_145] [UNUSED_TOKEN_146]assistant No[UNUSED_TOKEN_145] [UNUSED_TOKEN_146]user Is the blender pitcher clear?[UNUSED_TOKEN_145] [UNUSED_TOKEN_146]assistant Yes[UNUSED_TOKEN_145] [UNUSED_TOKEN_146]user What brand is the blender?[UNUSED_TOKEN_145] [UNUSED_TOKEN_146]assistant Philips[UNUSED_TOKEN_145]

Is my format right? And are the characters in bold font the unmasked part of the prompt, i.e. the part that contributes to the loss?
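
For readers following along, the masking question can be sketched as code. The bold markup from the example did not survive in this copy of the thread, so the exact unmasked span is not recoverable here. The sketch below assumes one common convention: only the assistant's answer and its closing [UNUSED_TOKEN_145] are supervised, and everything else is set to -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The function name, the `toy_tokenize` stand-in, and the choice to supervise the trailing newline are all hypothetical.

```python
from typing import Callable, List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

BEGIN = "[UNUSED_TOKEN_146]"
END = "[UNUSED_TOKEN_145]"


def build_inputs_and_labels(
    system: str,
    turns: List[Tuple[str, str]],
    tokenize: Callable[[str], List[int]],
) -> Tuple[List[int], List[int]]:
    """Tokenize segment by segment and mask everything except assistant answers.

    Assumption: the answer text plus its end marker and newline are supervised;
    the system prompt, user turns, and role headers are masked.
    """
    input_ids: List[int] = []
    labels: List[int] = []

    def add(text: str, supervised: bool) -> None:
        ids = tokenize(text)
        input_ids.extend(ids)
        labels.extend(ids if supervised else [IGNORE_INDEX] * len(ids))

    add(system, False)
    for role, text in turns:
        add(f"{BEGIN}{role}\n", False)                # role header: always masked
        add(f"{text}{END}\n", role == "assistant")    # body: supervised for assistant only
    return input_ids, labels


def toy_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer (one id per character) so the sketch runs without a model."""
    return [ord(ch) for ch in text]


ids, labels = build_inputs_and_labels(
    "AI assistant should give helpful and detailed answers. ",
    [("user", "Is the blender pitcher clear?"), ("assistant", "Yes")],
    toy_tokenize,
)
supervised_text = "".join(chr(t) for t in labels if t != IGNORE_INDEX)
print(repr(supervised_text))
```

With a real subword tokenizer, tokenizing segments separately can give different ids than tokenizing the full string at once, so the segment boundaries should be checked against the tokenizer actually used.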

ByungKwanLee commented 5 months ago

Definitely correct

cassiaaaaaa commented 5 months ago

Thank you very much