Q-Future / Q-Align

③ [ICML2024] [IQA, IAA, VQA] All-in-one foundation model for visual scoring. Can be efficiently fine-tuned to downstream datasets.
https://q-align.github.io

About the precision of the conversion of ava dataset #1

Closed: Qiulin-W closed this issue 10 months ago

Qiulin-W commented 10 months ago

Hi, great work!

I have a question about the precision of the conversion in Table 2 of the paper. The reported PLCC/SRCC after "quantization" are 0.920/0.930, but when I try to reproduce the result I get 0.881/0.845. The code is as follows:

import json
from scipy import stats

# Load the SFT training annotations for AVA.
with open('./playground/data/training_sft/train_ava.json', 'r') as file:
    ava_train_list = json.load(file)

x = []  # original gt_scores
y = []  # quantized 5-level labels

# First pass: find the min and max gt_score over the whole dataset.
min_s, max_s = float("+inf"), float("-inf")
for xx in ava_train_list:
    score = float(xx["gt_score"])
    min_s = min(min_s, score)
    max_s = max(max_s, score)

# Second pass: normalize each score to [0, 1] and quantize into 5 equal bins.
for xx in ava_train_list:
    score = float(xx["gt_score"])
    x.append(score)
    norm_score = (score - min_s) / (max_s - min_s)
    if norm_score <= 0.2:
        y.append(1)
    elif norm_score <= 0.4:
        y.append(2)
    elif norm_score <= 0.6:
        y.append(3)
    elif norm_score <= 0.8:
        y.append(4)
    else:
        y.append(5)

# Correlation between the original scores and the quantized labels.
plcc, srcc = stats.pearsonr(x, y)[0], stats.spearmanr(x, y)[0]
print("plcc: {}, srcc: {}".format(plcc, srcc))

Is there anything wrong with the code? Thanks!

teowu commented 10 months ago

Good question. Have you tried the already-converted texts in this json? Those should give the correct results.
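A rough sketch of what checking against the already-converted texts could look like; the "conversations"/"value" field names and the exact answer wording below are assumptions about how train_ava.json is laid out, so adjust them to the actual file:

import json
from scipy import stats

# Map the five rating words to integer levels.
LEVELS = {"bad": 1, "poor": 2, "fair": 3, "good": 4, "excellent": 5}

with open('./playground/data/training_sft/train_ava.json', 'r') as file:
    ava_train_list = json.load(file)

x, y = [], []
for item in ava_train_list:
    # NOTE: the "conversations"/"value" layout is an assumption about the json structure.
    answer = item["conversations"][-1]["value"].lower()
    level = next((name for name in LEVELS if name in answer), None)
    if level is None:
        continue  # skip items whose answer does not mention one of the five levels
    x.append(float(item["gt_score"]))
    y.append(LEVELS[level])

plcc, srcc = stats.pearsonr(x, y)[0], stats.spearmanr(x, y)[0]
print("plcc: {}, srcc: {}".format(plcc, srcc))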

For AVA, when calculating the min and max we remove the 0.1% outliers of each dataset. As for why they are outliers, you may try plotting the scores and looking at the min and max.
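To sketch what that kind of trimming might look like (the percentile-based procedure below is an assumption for illustration, not the paper's verified implementation):

import json
import numpy as np

with open('./playground/data/training_sft/train_ava.json', 'r') as file:
    ava_train_list = json.load(file)

scores = np.array([float(xx["gt_score"]) for xx in ava_train_list])

# Assumed trimming: use the 0.1th and 99.9th percentiles instead of the raw
# min/max, so a handful of extreme scores do not stretch the normalization range.
min_s, max_s = np.percentile(scores, 0.1), np.percentile(scores, 99.9)

# Clip so scores outside the trimmed range still map into [0, 1] before binning.
norm_scores = np.clip((scores - min_s) / (max_s - min_s), 0.0, 1.0)

With the range computed this way, the normalized scores spread differently over the five bins than with the raw min/max, which would change the PLCC/SRCC of the quantized labels.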

Hope that clarifies.

Best,
Teo