rokopi-byte closed this issue 7 months ago.
You can build the training dataset from your original scored dataset with the following sample code:
```python
import copy
import math

toy_qa = [{"name": "img_001.png", "gt_score": 4.43}]  # ensure your GT score is in the range [0, 5]
score2level = {5: "excellent", 4: "excellent", 3: "good", 2: "fair", 1: "poor", 0: "bad"}
conv_template = [
    {"from": "human", "value": "How would you rate the quality of this image?\n<|image|>"},
    {"from": "gpt", "value": "The quality of the image is {}."},
]

toy_train_df = []
for toy_di in toy_qa:
    toy_train_di = {}
    toy_train_di["image"] = toy_di["name"]
    # Deep-copy the template so filling in one sample's level does not overwrite
    # the shared "{}" placeholder used by every later sample
    toy_train_di["conversations"] = copy.deepcopy(conv_template)
    # Map the continuous score to a discrete level by flooring it
    level = score2level[math.floor(toy_di["gt_score"])]
    toy_train_di["conversations"][-1]["value"] = toy_train_di["conversations"][-1]["value"].format(level)
    toy_train_df.append(toy_train_di)

print(toy_train_df)
```
The toy output should be:

```
[{'image': 'img_001.png', 'conversations': [{'from': 'human', 'value': 'How would you rate the quality of this image?\n<|image|>'}, {'from': 'gpt', 'value': 'The quality of the image is excellent.'}]}]
```
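A caveat applies when you process more than one image. Suppose `conv_template` is assigned to each sample without copying. Then every sample shares the same list, and formatting the first sample overwrites the `{}` placeholder for all later ones. The minimal demonstration below uses two hypothetical samples. The helper names `fresh_template` and `build` are illustrative only.

```python
import copy
import math

score2level = {5: "excellent", 4: "excellent", 3: "good", 2: "fair", 1: "poor", 0: "bad"}
# Two hypothetical samples with different scores
samples = [{"name": "a.png", "gt_score": 4.43}, {"name": "b.png", "gt_score": 1.2}]


def fresh_template():
    # Return a new template on each call so the two runs below do not interfere
    return [
        {"from": "human", "value": "How would you rate the quality of this image?\n<|image|>"},
        {"from": "gpt", "value": "The quality of the image is {}."},
    ]


def build(samples, template, copy_template):
    out = []
    for s in samples:
        # Without a deep copy, every sample points at the same list and dicts
        conv = copy.deepcopy(template) if copy_template else template
        level = score2level[math.floor(s["gt_score"])]
        # Once "{}" has been filled in, str.format silently ignores its argument
        conv[-1]["value"] = conv[-1]["value"].format(level)
        out.append({"image": s["name"], "conversations": conv})
    return out


shared = build(samples, fresh_template(), copy_template=False)
copied = build(samples, fresh_template(), copy_template=True)
print([d["conversations"][-1]["value"] for d in shared])
print([d["conversations"][-1]["value"] for d in copied])
```

In the shared run, both samples end up labelled "excellent", even though the second score is 1.2. In the copied run, the second sample is correctly labelled "poor".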
Why are there two "excellent" entries in `score2level`?

```python
score2level = {5: "excellent", 4: "excellent", 3: "good", 2: "fair", 1: "poor", 0: "bad"}
```
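One way to read the mapping is by checking how `math.floor` buckets scores in [0, 5]; this reading is not confirmed by the author in the thread. Every score in [k, k + 1) floors to k, so "excellent" covers [4, 5). A score of exactly 5.0 floors to 5, which would raise a `KeyError` if key 5 were missing. The helper `level_of` is hypothetical.

```python
import math

score2level = {5: "excellent", 4: "excellent", 3: "good", 2: "fair", 1: "poor", 0: "bad"}


def level_of(score):
    # math.floor maps [k, k + 1) to k; only a perfect 5.0 reaches key 5
    return score2level[math.floor(score)]


for score in [0.0, 1.7, 2.5, 3.99, 4.43, 5.0]:
    print(score, "->", level_of(score))
```

Under this reading, the six keys describe five levels: bad [0, 1), poor [1, 2), fair [2, 3), good [3, 4), and excellent [4, 5].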
Hi, it's not clear to me how I should prepare a custom dataset for LoRA fine-tuning. I only have the images and was wondering how to annotate them properly. I see the templates, but it's not clear to me how to generate those files, especially this part (example image sd1.5_highcorr_181.jpg in https://github.com/Q-Future/Q-Align/blob/main/playground/data/ft/agi/train_split_1.json):
Is there a script for generating this starting from the MOS of the original dataset?
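The code below sketches one way to go from raw MOS values to a JSON training file. It assumes your MOS values lie in a known range [mos_min, mos_max] and rescales them linearly to [0, 5], matching the "[0,5]" requirement in the sample code. It then applies the same floor-based level mapping. The output follows only the fields shown in the toy output ("image" and "conversations"); whether Q-Align's released JSON files carry extra fields is not confirmed here. The function names `mos_to_level` and `build_records`, the 1 to 100 MOS scale, and the file name `train_custom.json` are all hypothetical.

```python
import copy
import json
import math

SCORE2LEVEL = {5: "excellent", 4: "excellent", 3: "good", 2: "fair", 1: "poor", 0: "bad"}
CONV_TEMPLATE = [
    {"from": "human", "value": "How would you rate the quality of this image?\n<|image|>"},
    {"from": "gpt", "value": "The quality of the image is {}."},
]


def mos_to_level(mos, mos_min, mos_max):
    # Assumed linear rescaling of a MOS in [mos_min, mos_max] to [0, 5]
    score = 5.0 * (mos - mos_min) / (mos_max - mos_min)
    # Clamp to guard against values slightly outside the declared range
    score = min(max(score, 0.0), 5.0)
    return SCORE2LEVEL[math.floor(score)]


def build_records(samples, mos_min, mos_max):
    """samples: iterable of (image_name, mos) pairs."""
    records = []
    for name, mos in samples:
        # Fresh copy per image so the "{}" placeholder is never overwritten
        conversations = copy.deepcopy(CONV_TEMPLATE)
        level = mos_to_level(mos, mos_min, mos_max)
        conversations[-1]["value"] = conversations[-1]["value"].format(level)
        records.append({"image": name, "conversations": conversations})
    return records


if __name__ == "__main__":
    # Hypothetical MOS values on a 1-100 scale
    samples = [("img_001.png", 89.5), ("img_002.png", 42.0), ("img_003.png", 3.1)]
    records = build_records(samples, mos_min=1.0, mos_max=100.0)
    with open("train_custom.json", "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
```

Linear rescaling is only one choice. If your dataset's MOS distribution is skewed, a different binning into the five levels may suit it better.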