neouyghur opened this issue 4 years ago
I plotted your PSNR and l2 scores in a figure. It clearly shows your results are not consistent. Could you explain why? @naoto0804, did you get the same PSNR score? Thanks.
@neouyghur, both the PSNR and l2 scores are averages over all test images.
@zhangshuaitao I am comparing my method with yours. I am also following the same protocol, however, my MSE and PSNR curves share the same trend. Besides that, as we know PSNR score is calculated based on the MSE score.
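(Assuming images scaled to [0, 1], the relation is `PSNR = 10 * log10(1 / MSE)`, so the MSE and PSNR curves should track each other.)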
@zhangshuaitao is your l2 score RMSE or MSE? Thanks.
@neouyghur, the l2 score is MSE. We use the compare_mse, compare_ssim, and compare_psnr functions from the skimage.measure module.
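A minimal sketch of those calls (the [0, 1] float range and `multichannel=True` for RGB inputs are my assumptions; this API was later moved to skimage.metrics):

```python
# Minimal sketch using the old skimage.measure API (deprecated in 0.16,
# moved to skimage.metrics in later releases). File names are placeholders.
from skimage.io import imread
from skimage.measure import compare_mse, compare_psnr, compare_ssim
from skimage import img_as_float

pred = img_as_float(imread('result.png'))  # assumed scaled to [0, 1]
gt = img_as_float(imread('gt.png'))

mse = compare_mse(gt, pred)
psnr = compare_psnr(gt, pred, data_range=1.0)
# multichannel=True is needed for RGB inputs; otherwise the 3-channel
# axis is treated as a spatial dimension by the default 7x7 window.
ssim = compare_ssim(gt, pred, data_range=1.0, multichannel=True)
print('mse=%.6f psnr=%.4f ssim=%.4f' % (mse, psnr, ssim))
```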
@zhangshuaitao First of all, thanks again for releasing the code and answering a lot of questions patiently.
However, what you say above seems to be inconsistent with the README.md:

> To evaluate the model performance over a dataset, you can find the evaluation metrics in this website: PythonCode.zip
> ..

Which is correct?
I would really appreciate it if you could provide the whole evaluation pipeline.
It might be hard to follow exactly the same evaluation protocol, since some parameters for each function are unknown (e.g., compare_ssim has some optional params; how did you set them? What is the range of values in the images, 0.0~1.0 or 0~255?).
@naoto0804, sorry for not explaining it clearly. We use AGE, pEPs, and pCEPS from the PythonCode.zip, together with the compare_mse, compare_ssim, and compare_psnr functions from the skimage.measure module. The default parameters for those functions are fine.
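For readers without the zip, here is a sketch of AGE / pEPs / pCEPS as they are commonly defined for the SBMnet background-estimation benchmark; the gray-level threshold of 20 and the 4-connected clustering rule are assumptions on my part, not confirmed by the authors:

```python
# Hedged sketch of AGE / pEPs / pCEPS (SBMnet-style definitions).
# Assumes RGB inputs; threshold of 20 gray levels is an assumption.
import numpy as np
from skimage.color import rgb2gray

def age_peps_pceps(gt, pred, threshold=20):
    # Work on 0..255 gray-level images (rgb2gray returns [0, 1] floats).
    gt_gray = rgb2gray(gt) * 255.0
    pred_gray = rgb2gray(pred) * 255.0
    diff = np.abs(gt_gray - pred_gray)

    age = diff.mean()           # Average Gray-level Error
    eps = diff > threshold      # error-pixel mask
    peps = eps.mean()           # percentage of Error Pixels

    # A clustered error pixel: an error pixel whose four 4-connected
    # neighbours are all error pixels as well.
    padded = np.pad(eps, 1, mode='constant')
    core = padded[1:-1, 1:-1]
    ceps = (core & padded[:-2, 1:-1] & padded[2:, 1:-1]
                 & padded[1:-1, :-2] & padded[1:-1, 2:])
    pceps = ceps.mean()         # percentage of Clustered Error Pixels
    return age, peps, pceps
```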
@zhangshuaitao Thank you so much for making it much more clear.
To make sure I followed your instructions exactly, I've computed all the metrics between the original input and ground-truth images in the test subset of the synthetic dataset. This is because I want to isolate differences in the evaluation phase before reproducing the training phase.
The result is as follows; do you think it's reasonable? If possible, could you compute it with your dataset and evaluation code? (I suspect there are still bugs in my implementation, since these values are much better than the baseline method in Table 1.)
| metric | value |
| --- | --- |
| mse | 0.006965 |
| ssim | 0.933875 |
| psnr | 23.996012 |
| AGE | 5.851178 |
| pEPs | 0.064378 |
| pCEPs | 0.050264 |
@naoto0804 @zhangshuaitao I think your result is reasonable, since only a small part of the scene is text. However, I feel they didn't fully train the baselines; with more training, the baselines should get better results than they reported.
Hi, I am checking your results in Table 1. I find that the PSNR does not correspond to the l2. For example, an l2 of 0.2465 corresponds to 25.60 PSNR, while an l2 of 0.0627 corresponds to 24.83, and the l2 error for the easier scene-text cases is very high. Could you check this, or could you offer your model for testing? Thanks.
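A quick sanity check one could run (assuming l2 is MSE on [0, 1] images; note that PSNR averaged per image generally differs from the PSNR implied by the averaged MSE, which may explain part of the gap):

```python
import math

def psnr_from_mse(mse, max_val=1.0):
    # PSNR in dB implied by a single MSE value, for peak signal max_val.
    return 10.0 * math.log10(max_val ** 2 / mse)

# l2 values quoted from Table 1; if l2 were plain MSE on [0, 1] images,
# the implied PSNRs (~6.1 dB and ~12.0 dB) would be far from the
# reported 25.60 dB and 24.83 dB.
for l2 in (0.2465, 0.0627):
    print('l2=%.4f -> implied PSNR %.2f dB' % (l2, psnr_from_mse(l2)))
```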