Open ch-andrei opened 2 years ago
I have the same question, especially after testing the whole TID2013 dataset with 'test.py'. The results are unsatisfactory: both the PLCC and SRCC scores are very low. Can anyone explain how to run the evaluation correctly?
Question about the results in the original paper: how is the evaluation on the traditional datasets (LIVE, CSIQ, TID) performed? Do you report average performance over K runs? The paper only mentions that the datasets are split 60-20-20 into train/val/test. Please add a more detailed description.
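To make the question concrete, here is a minimal sketch of one common way such numbers could be produced: repeat a random 60-20-20 split K times, retrain on each split, and report the median PLCC and SRCC on the test portion. This is not taken from the paper or from 'test.py'; it is only an assumption about what the protocol might look like. The names `evaluate_over_runs` and `train_and_predict` are hypothetical, the use of the median rather than the mean is a guess, and the split here is over individual images, whereas a protocol could instead split by reference image to avoid content overlap. The nonlinear fitting step that some works apply before computing PLCC is omitted.

```python
import random
import statistics


def plcc(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


def split_60_20_20(n, rng):
    """Shuffle indices 0..n-1 and cut them into train/val/test parts."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train, n_val = n * 6 // 10, n * 2 // 10
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def evaluate_over_runs(mos, train_and_predict, k=10, seed=0):
    """Median PLCC and SRCC on the test split over k random splits.

    train_and_predict is a hypothetical callback: given (train, val, test)
    index lists, it trains a model and returns predictions for the test
    indices, in the same order.
    """
    rng = random.Random(seed)
    plccs, srccs = [], []
    for _ in range(k):
        train, val, test = split_60_20_20(len(mos), rng)
        preds = train_and_predict(train, val, test)
        targets = [mos[i] for i in test]
        plccs.append(plcc(preds, targets))
        srccs.append(srcc(preds, targets))
    return statistics.median(plccs), statistics.median(srccs)
```

If the reported numbers were obtained this way, scores from a single run of 'test.py' over the whole dataset would not be directly comparable, since that run would not use a held-out test split or aggregate over several splits.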