THUDM / ImageReward

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Apache License 2.0

test custom model acc : 0.0% #62

Open Shiyao-Huang opened 1 year ago

Shiyao-Huang commented 1 year ago

I need your help, please!

I have spent a week trying to run this project. After training the IR model twice, the curves look like this:

[two images]

The model is a little overfit, but that is OK for continuing with the rest of the pipeline:

[The test script and RM do not support a custom .pt checkpoint]

In the end I changed my code to match the test format, as shown below, but the result is: test1 Test Acc: 0.00%

[I have no idea what is wrong in this part]

import os
import json

import torch
from tqdm import tqdm
from huggingface_hub import hf_hub_download
from ImageReward import ImageReward


def ImageReward_download(url: str, root: str):
    # Fetch a file from the THUDM/ImageReward hub repo into `root` and return its local path.
    # Defined before first use; calling it earlier in the script raises NameError.
    os.makedirs(root, exist_ok=True)
    filename = os.path.basename(url)
    download_target = os.path.join(root, filename)
    hf_hub_download(repo_id="THUDM/ImageReward", filename=filename, local_dir=root)
    return download_target


model_path = '/home/xxx/download/ImageReward/train/checkpoint/blip_uni_cross_mul_bs8192_fix=0.7_lr=0.0001cosine/best_lr=0.0001.pt'
state_dict = torch.load(model_path, map_location='cpu')
# Expand "~" explicitly; otherwise os.makedirs creates a literal "~" directory.
download_root = os.path.expanduser('~/.cache/ImageReward')
device = "cuda:0"
med_config = ImageReward_download("https://huggingface.co/THUDM/ImageReward/blob/main/med_config.json", download_root)
model = ImageReward(device=device, med_config=med_config).to(device)
# strict=False silently skips mismatched keys, so print which keys were missing or unexpected.
msg = model.load_state_dict(state_dict, strict=False)
print("checkpoint loaded:", msg)
model.eval()

model_type = 'test1'

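def acc(score_sample, target_sample):
    # Pairwise accuracy: the fraction of image pairs whose reward order agrees with
    # the reference ranking (lower rank value <-> higher reward).
    # Pairs with tied rewards count as wrong; pairs with tied ranks are skipped.
    tol_cnt = 0.
    true_cnt = 0.
    for idx in range(len(score_sample)):
        item_base = score_sample[idx]["ranking"]
        item = target_sample[idx]["rewards"]
        for i in range(len(item_base)):
            for j in range(i+1, len(item_base)):
                if item_base[i] == item_base[j]:
                    continue
                tol_cnt += 1
                if item_base[i] > item_base[j] and item[i] < item[j]:
                    true_cnt += 1
                elif item_base[i] < item_base[j] and item[i] > item[j]:
                    true_cnt += 1

    # Guard against division by zero when no comparable pair exists.
    return true_cnt / tol_cnt if tol_cnt else 0.0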

score_sample = []
with open('/home/xxxx/download/ImageReward/data/8k_group/test.json', "r") as f:
    score_sample = json.load(f)
target_sample = []
# bar = tqdm(range(len(score_sample)), desc=f'{model_type} ranking')
with torch.no_grad():
    for item in score_sample:
        img_list = [os.path.join('data/test_images', img) for img in item["generations"]]
        ranking, rewards = model.inference_rank(item["prompt"], img_list)

        target_item = {
            "id": item["id"],
            "prompt": item["prompt"],
            "ranking": ranking,
            "rewards": rewards
        }
        target_sample.append(target_item)
        # bar.update(1)

target_path = os.path.join('data/', f"test_{model_type}.json")
with open(target_path, "w") as f:
    json.dump(target_sample, f, indent=4, ensure_ascii=False)

test_acc = acc(score_sample, target_sample)
print(f"{model_type:>16s} Test Acc: {100 * test_acc:.2f}%")
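
A note on the metric above: acc only counts a pair as correct when the rewards differ in the right direction, so a model that returns the same reward for every image scores exactly 0%, whatever the reference ranking is. The following is a minimal standalone sketch of that behaviour; `pairwise_acc` is a hypothetical re-implementation and the data is made up for illustration, not taken from the real test set.

```python
def pairwise_acc(rankings, rewards):
    """Fraction of pairs where the better (lower) rank gets a strictly higher reward."""
    total = correct = 0
    for ranks, scores in zip(rankings, rewards):
        for i in range(len(ranks)):
            for j in range(i + 1, len(ranks)):
                if ranks[i] == ranks[j]:
                    continue  # no preference between these two images
                total += 1
                better, worse = (i, j) if ranks[i] < ranks[j] else (j, i)
                if scores[better] > scores[worse]:
                    correct += 1
    return correct / total if total else 0.0


# Illustrative data only.
human_ranks = [[2, 1, 3, 4]]
constant = [[-0.16, -0.16, -0.16, -0.16]]  # every image gets the same reward
sensible = [[0.5, 0.9, 0.1, -0.3]]         # reward order matches the ranking

print(pairwise_acc(human_ranks, constant))  # 0.0
print(pairwise_acc(human_ranks, sensible))  # 1.0
```

So a result of 0.00% by itself says that the rewards are tied, not that the ordering is inverted.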

And here is the resulting test_test1.json:


    {
        "id": "005658-0040",
        "prompt": "deathly portal to the abyss, ultra detailed, warm interior light, cinematic shot, photorealistic, octane render, high definition, fine details, sinister tones, 8 k, mcbess mood, ",
        "ranking": [
            2,
            1,
            3,
            4
        ],
        "rewards": [
            -0.1617799699306488,
            -0.1617799699306488,
            -0.1617799699306488,
            -0.1617799699306488
        ]
    },
    {
        "id": "005664-0153",
        "prompt": "cola made from cockroaches ",
        "ranking": [
            2,
            1,
            3,
            4
        ],
        "rewards": [
            -0.1617799699306488,
            -0.1617799699306488,
            -0.1617799699306488,
            -0.1617799699306488
        ]
    },
    ...

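
A quick sanity check on this output, as a hedged sketch. The hypothetical helper `constant_reward_ids` flags entries whose rewards are all identical. The second helper undoes the score normalization, assuming the inference class computes (raw - mean) / std with the two constants shown; those values are quoted from memory and should be verified against the installed ImageReward.py. Under that assumption, the repeated value -0.16178 maps back to a raw output of roughly 0, which would suggest that the model itself emits a constant.

```python
# Assumed normalization constants (not confirmed in this thread; verify locally).
ASSUMED_MEAN = 0.16717362830052426
ASSUMED_STD = 1.0333394966054072


def constant_reward_ids(entries, tol=1e-9):
    """Return the ids of entries whose rewards are all (nearly) identical."""
    flagged = []
    for entry in entries:
        rewards = entry["rewards"]
        if rewards and max(rewards) - min(rewards) <= tol:
            flagged.append(entry["id"])
    return flagged


def denormalize(reward, mean=ASSUMED_MEAN, std=ASSUMED_STD):
    """Invert reward = (raw - mean) / std under the assumed constants."""
    return reward * std + mean


# The two entries from the excerpt above, reduced to the fields used here.
entries = [
    {"id": "005658-0040", "rewards": [-0.1617799699306488] * 4},
    {"id": "005664-0153", "rewards": [-0.1617799699306488] * 4},
]

flagged = constant_reward_ids(entries)
raw = denormalize(entries[0]["rewards"][0])
print(f"{len(flagged)} of {len(entries)} entries have constant rewards")
print(f"raw output under the assumed constants: {raw:.2e}")
```

To run it on the full file, load it with json.load and pass the resulting list to `constant_reward_ids`.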
Shiyao-Huang commented 1 year ago

My training log:

Validation - Iteration 563 | Loss 1.08457 | Acc 0.6112
Iteration 564 | Loss 0.24259 | Acc 0.8965
Iteration 565 | Loss 0.22347 | Acc 0.9092
Iteration 566 | Loss 0.18962 | Acc 0.9307
Iteration 567 | Loss 0.20466 | Acc 0.9189
Iteration 568 | Loss 0.19291 | Acc 0.9209
Iteration 569 | Loss 0.22061 | Acc 0.9043
Iteration 570 | Loss 0.18802 | Acc 0.9229
Validation - Iteration 570 | Loss 1.08476 | Acc 0.6114
Iteration 571 | Loss 0.20536 | Acc 0.9189
Iteration 572 | Loss 0.21379 | Acc 0.9199
Iteration 573 | Loss 0.21193 | Acc 0.9053
Iteration 574 | Loss 0.21260 | Acc 0.9092
Iteration 575 | Loss 0.21147 | Acc 0.9072
Iteration 576 | Loss 0.19781 | Acc 0.9160
Iteration 577 | Loss 0.18155 | Acc 0.9219
Validation - Iteration 577 | Loss 1.08471 | Acc 0.6118
Iteration 578 | Loss 0.19942 | Acc 0.9219
Iteration 579 | Loss 0.20632 | Acc 0.9150
Iteration 580 | Loss 0.22403 | Acc 0.9111
Iteration 581 | Loss 0.19597 | Acc 0.9180
Iteration 582 | Loss 0.20380 | Acc 0.9199
Iteration 583 | Loss 0.20953 | Acc 0.9092
Iteration 584 | Loss 0.20941 | Acc 0.9141
Iteration 585 | Loss 0.19874 | Acc 0.9268
Validation - Iteration 585 | Loss 1.08473 | Acc 0.6112
training done
test: 
load checkpoint from checkpoint/blip_uni_cross_mul_bs8192_fix=0.7_lr=0.0001cosine/best_lr=0.0001.pt
missing keys: []
Test Loss 0.63277 | Acc 0.6440
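
The gap between training and validation accuracy in this log can be summarized with a small parser. This is a sketch that assumes the exact line format shown above; `summarize` is a hypothetical helper, and the sample is the first few lines of the log.

```python
import re

# Matches "Iteration N | Loss X | Acc Y", optionally prefixed by "Validation - ".
LINE_RE = re.compile(
    r"^(Validation - )?Iteration (\d+) \| Loss ([\d.]+) \| Acc ([\d.]+)$"
)


def summarize(log_text):
    """Return (mean training accuracy, mean validation accuracy) from a log."""
    train, val = [], []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines such as "training done"
        target = val if match.group(1) else train
        target.append(float(match.group(4)))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(train), mean(val)


sample = """Validation - Iteration 563 | Loss 1.08457 | Acc 0.6112
Iteration 564 | Loss 0.24259 | Acc 0.8965
Iteration 565 | Loss 0.22347 | Acc 0.9092
Validation - Iteration 570 | Loss 1.08476 | Acc 0.6114
training done"""

train_acc, val_acc = summarize(sample)
print(f"train acc {train_acc:.3f} vs validation acc {val_acc:.3f}")
```
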
xujz18 commented 1 year ago

Discussion is very welcome, and I hope I can help. I noticed that every entry in your test_test1.json has exactly the same rewards, so the same image was probably fed in each time. You could switch to https://github.com/THUDM/ImageReward/blob/main/scripts/test.sh and try again with the test flow there.
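
One way to test the same-image hypothesis before rerunning is to hash the files that are passed to the model. This is a minimal sketch using only the standard library; `check_images` is a hypothetical helper, and the demo uses throwaway files rather than the real test images.

```python
import hashlib
import os
import tempfile


def image_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_images(paths):
    """Return (missing paths, number of distinct file contents)."""
    missing = [p for p in paths if not os.path.isfile(p)]
    digests = {image_digest(p) for p in paths if os.path.isfile(p)}
    return missing, len(digests)


# Demo: two distinct files, one byte-for-byte duplicate, one missing path.
with tempfile.TemporaryDirectory() as tmp:
    contents = {"a.png": b"first", "b.png": b"second", "c.png": b"first"}
    for name, data in contents.items():
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    demo_paths = [os.path.join(tmp, n) for n in ("a.png", "b.png", "c.png", "d.png")]
    demo_missing, demo_distinct = check_images(demo_paths)

print(f"missing: {len(demo_missing)}, distinct contents: {demo_distinct}")
```

In the script from the first comment, calling `check_images(img_list)` inside the loop would show whether each prompt's generations resolve to existing, different files.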