BAAI-DCAI / Bunny

A family of lightweight multimodal models.
Apache License 2.0
866 stars 65 forks source link

Great job! Does it support fine-tuning Chinese data #7

Closed yazheng0307 closed 6 months ago

Isaachhh commented 6 months ago

Yes, it supports.

The data should be a list, each element in the format like:

{
'id': '0', 
'image': 'coco_2017/000000337760.jpg', 
'conversations': [
    {'from': 'human', 'value': '<image>\n这是一辆现代消防车吗?\n请用单个短语回答问题。'}, 
    {'from': 'gpt', 'value': '不'}, 
    {'from': 'human', 'value': '这张照片是黑白的还是彩色的?'}, 
    {'from': 'gpt', 'value': '黑白的'},  
    {'from': 'human', 'value': '图中显示了哪些车辆?'}, 
    {'from': 'gpt', 'value': '消防车'}
]
}

The performance is related to base LLM and tokenizer.

When evaluating, you may also need to edit the prompt.

yazheng0307 commented 6 months ago

thanks!!!