Great job! Does it support fine-tuning Chinese data

Yes, it supports.

The data should be a list, each element in the format like:

{
'id': '0', 
'image': 'coco_2017/000000337760.jpg', 
'conversations': [
    {'from': 'human', 'value': '<image>\n这是一辆现代消防车吗？\n请用单个短语回答问题。'}, 
    {'from': 'gpt', 'value': '不'}, 
    {'from': 'human', 'value': '这张照片是黑白的还是彩色的？'}, 
    {'from': 'gpt', 'value': '黑白的'},  
    {'from': 'human', 'value': '图中显示了哪些车辆？'}, 
    {'from': 'gpt', 'value': '消防车'}
]
}

The performance is related to base LLM and tokenizer.

When evaluating, you may also need to edit the prompt.

BAAI-DCAI / Bunny

Great job! Does it support fine-tuning Chinese data #7