Why do the evaluation results differ from those in the paper?

dpengwen commented 5 years ago

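Hello, thank you very much for open-sourcing the code and the trained model! When I evaluated IIIT5K (without a lexicon) using your trained model and demo.py, the word accuracy was only 83.7%. When I considered only the IIIT5K words that are made up entirely of characters and contain at least 3 characters, the word accuracy was 86.13%, which does not match the 86.7% reported in the paper. Was the released model trained on Syn90k+SynText? And when evaluating IIIT5K, do you consider only the words that are made up entirely of characters and contain at least 3 characters?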

FangShancheng commented 5 years ago

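Hello,

Following your report, I reproduced your evaluation with demo.py and compared it against the accuracy produced by the eval_one_pass.sh script. The code in demo.py does lower the accuracy. The main cause is that demo.py resizes images with NEAREST interpolation, whereas training resized them with INTER_LINEAR. Specifically, training used OpenCV's default resize method, while demo.py uses PIL's default resize method. We will change the interpolation method in demo.py to BILINEAR, which gives higher accuracy on IIIT.

In addition, the released model was retrained recently. Its accuracy is basically the same as that of the model reported in the paper, although differences in the number of training iterations and in the learning rate may introduce some deviation. In our evaluation, this model performs slightly better than the numbers reported in the paper on IIIT and SVT, and basically the same as the paper on ICDAR. The pretrained model was trained on Syn90k+SynText, and evaluation covers words made up of alphanumeric characters with three or more characters. Please refer to the paper for the details of training and evaluation. Also, the beam width in demo.py is 1, whereas the paper uses 5, but we found that beam width has little effect on accuracy for our model.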
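A minimal sketch of the evaluation filter described above (alphanumeric words with at least three characters), for anyone computing word accuracy by hand. The function names are hypothetical and not taken from the repository, and the case-insensitive comparison is an assumption; check the paper and eval_one_pass.sh for the exact protocol.

```python
def is_evaluated(word, min_len=3):
    """Keep only words that are fully ASCII-alphanumeric and at least min_len long."""
    return len(word) >= min_len and word.isascii() and word.isalnum()


def word_accuracy(predictions, labels, min_len=3, case_sensitive=False):
    """Fraction of evaluated words whose prediction matches the label exactly."""
    total = 0
    correct = 0
    for pred, label in zip(predictions, labels):
        if not is_evaluated(label, min_len):
            continue  # filtered words count neither as correct nor as wrong
        if not case_sensitive:  # assumption: case is ignored when comparing
            pred, label = pred.lower(), label.lower()
        total += 1
        correct += int(pred == label)
    return correct / total if total else 0.0


# Toy data: "to" is too short and "sale!" contains a symbol, so both are skipped.
labels = ["HOTEL", "to", "sale!", "Cafe24"]
predictions = ["hotel", "to", "sale", "cafe2a"]
print(word_accuracy(predictions, labels))  # 0.5
```

Because the filter changes the denominator, the filtered and unfiltered accuracies (such as the 86.13% and 83.7% quoted in this thread) are not directly comparable.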

FangShancheng commented 5 years ago

Hi @dpengwen,

Following your report, we found that the lower accuracy is caused by a problem in demo.py. Our pretrained model was trained on images preprocessed with the OpenCV resize method (default interpolation=CV_INTER_LINEAR). However, demo.py uses the PIL resize method (default interpolation=NEAREST), which degrades the accuracy on the IIIT dataset. We will change the interpolation method in demo.py to BILINEAR, which gives better accuracy on the IIIT dataset.
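To illustrate why the interpolation mismatch matters, here is a simplified pure-Python sketch that upsamples one row of pixel values with nearest-neighbor and with linear interpolation. It is not the code used by PIL or OpenCV (both handle 2D images and edge cases differently); the helper names are hypothetical. The point is only that the two methods feed the network different pixel values for the same input image.

```python
def resize_row_nearest(row, new_len):
    """Nearest-neighbor: each output pixel copies the closest input pixel."""
    scale = len(row) / new_len
    last = len(row) - 1
    return [row[min(int((i + 0.5) * scale), last)] for i in range(new_len)]


def resize_row_linear(row, new_len):
    """Linear: each output pixel blends its two neighboring input pixels."""
    scale = len(row) / new_len
    last = len(row) - 1
    out = []
    for i in range(new_len):
        # Map the output pixel center back into input coordinates, clamped.
        x = min(max((i + 0.5) * scale - 0.5, 0.0), float(last))
        left = int(x)
        right = min(left + 1, last)
        frac = x - left
        out.append((1 - frac) * row[left] + frac * row[right])
    return out


row = [0, 255]  # a hard black-to-white edge, like a character stroke
print(resize_row_nearest(row, 4))  # [0, 0, 255, 255]
print(resize_row_linear(row, 4))   # [0.0, 63.75, 191.25, 255.0]
```

In Pillow, the filter is chosen through the resample argument of `Image.resize`, for example `Image.BILINEAR`.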

We also provide eval_one_pass.sh and a tfrecord example tool for evaluating datasets, which is a better option than customizing demo.py.

In addition, the pretrained model, which we trained recently, is basically the same as the model reported in our paper. Still, a different number of training iterations and a different learning rate can introduce subtle differences. Please refer to our paper for more training and evaluation details. Thanks.

dpengwen commented 5 years ago

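Thank you very much for the detailed answer. I have just started training and found that the loss fluctuates quite a lot. Could you share how the loss changed during your training, for reference?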

FangShancheng commented 5 years ago

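Some fluctuation early in training is normal, and training behavior may also differ between datasets. If the fluctuation is large enough to prevent convergence, you can adjust the lr and the clip gradient value. I suggest you keep training for a while and then check again.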
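For reference, a minimal pure-Python sketch of gradient clipping by global norm, one common form of gradient clipping. Whether this repository clips by global norm or by value is an assumption here, and the function below is hypothetical; in TensorFlow the corresponding built-in is `tf.clip_by_global_norm`.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients so that their joint L2 norm is at most clip_norm.

    grads is a list of flat lists of floats, one list per parameter tensor.
    Returns the (possibly rescaled) gradients and the original global norm.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if global_norm <= clip_norm:
        return [list(grad) for grad in grads], global_norm
    scale = clip_norm / global_norm
    return [[g * scale for g in grad] for grad in grads], global_norm


grads = [[3.0, 4.0], [12.0]]
clipped, norm = clip_by_global_norm(grads, clip_norm=6.5)
print(norm)     # 13.0
print(clipped)  # [[1.5, 2.0], [6.0]]
```

A smaller clip_norm damps large loss spikes at the cost of slower progress when gradients are legitimately large, so it is usually tuned together with the learning rate.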

dpengwen commented 5 years ago

OK, thanks a lot. I found that training is rather slow. With batch_size=128 on a single TitanX, I get, for example:
INFO:tensorflow:step = 101, loss = 43.823254 (108.595 sec)
INFO:tensorflow:global_step/sec: 0.724205
INFO:tensorflow:step = 201, loss = 41.28892 (138.089 sec)
Is this normal? Why is it not even as fast as an LSTM?

FangShancheng commented 5 years ago

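Training speed differs from machine to machine. Check whether there is an IO bottleneck; with an SSD there is no data-reading overhead. In addition, comparing speed is meaningless when the feature-extraction network's computation (downsampling), the parameter count, and the batch size all differ.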
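One rough way to check for an IO bottleneck is to time the input pipeline on its own and compare it with the training rate from the log quoted earlier in this thread (global_step/sec: 0.724205). The sketch below assumes the pipeline can be exposed as a Python iterator of batches; `fake_batches` is a hypothetical stand-in, not part of the repository.

```python
import time


def batches_per_second(batch_iter, num_batches=50):
    """Pull num_batches from the iterator and return the observed rate."""
    start = time.perf_counter()
    count = 0
    for _ in batch_iter:
        count += 1
        if count >= num_batches:
            break
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def fake_batches(batch_size=128):
    """Hypothetical stand-in for the real input pipeline."""
    while True:
        time.sleep(0.001)  # pretend each batch takes about 1 ms to load
        yield [0] * batch_size


input_rate = batches_per_second(fake_batches(), num_batches=20)
logged_rate = 0.724205  # training steps per second, taken from the log
print(f"input alone: {input_rate:.1f} batches/sec, training: {logged_rate} steps/sec")
```

If the input pipeline alone is not much faster than the training rate, data loading is likely the limiting factor; if it is far faster, the time is being spent in the model computation.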

dpengwen commented 5 years ago

OK, thank you.

CXY573 commented 5 years ago

@dpengwen Hello, how are your training results so far?

FangShancheng / conv-ensemble-str

Why do the evaluation results differ from those in the paper? #2