MhLiao / DB

A PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization".

Accuracy discussion for ICDAR 2015 dataset #186

Open xisi789 opened 4 years ago

xisi789 commented 4 years ago

height: 1152, the resulting accuracy is: [image]

Training command: CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py /storage03/users/xisi/code/ocr/detection/DB/experiments/seg_detector/ic15_resnet50_deform_thre.yaml --num_gpus 4 --resume /storage03/users/xisi/code/ocr/detection/DB/models/pre-trained-model-synthtext-resnet50

YAML config: [image]

MhLiao commented 4 years ago

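@xisi789 Did you run into problems reproducing the other datasets, or only IC15? I would like to use that to work out whether the cause is the code or environment, or other factors such as initialization.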

xisi789 commented 4 years ago

@MhLiao I have not trained on any other dataset before. I will train resnet50 on TD500 today and get back to you later.

xisi789 commented 4 years ago

The TD500 resnet50 accuracy is correct: [image]

Training command: CUDA_VISIBLE_DEVICES=0,1 python train.py /storage03/users/xisi/code/ocr/detection/DB/experiments/seg_detector/td500_resnet50_deform_thre.yaml --num_gpus 2 --resume /storage03/users/xisi/code/ocr/detection/DB/models/pre-trained-model-synthtext-resnet50

YAML config: [image]

What do you think causes such a large accuracy gap on IC15? I changed batchsize and num_workers; or is it caused by the different number of GPUs used for training? Also, my own training set has about 10,000 images. How should I adjust batchsize, epochs, and lr for it? Looking forward to your reply! @MhLiao

MhLiao commented 4 years ago

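@xisi789 num_workers should have no effect. batchsize and the number of GPUs may have an effect (the per-GPU batch size affects batch norm, and the total batch size is tied to lr). If the dataset size differs, epochs is the main thing to change; I suggest comparing the loss convergence curves.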
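
As a rough illustration of the remark that the total batch size is tied to lr, the sketch below applies the common linear scaling heuristic (keep lr proportional to the total batch size). This is a general rule of thumb, not something the thread or the repository prescribes. The function name and the reference values (lr 0.007 at a total batch of 16) are assumptions; replace them with the values from your own yaml config.

```python
def scale_learning_rate(base_lr, base_total_batch, num_gpus, per_gpu_batch):
    """Linear scaling heuristic: keep lr / total_batch_size constant.

    All argument names are hypothetical; they are not options of train.py.
    """
    total_batch = num_gpus * per_gpu_batch
    scaled_lr = base_lr * total_batch / base_total_batch
    return scaled_lr, total_batch


# Assumed reference setting: lr 0.007 with a total batch of 16.
lr, total = scale_learning_rate(0.007, 16, num_gpus=4, per_gpu_batch=2)
print(f"total batch = {total}, scaled lr = {lr:.4f}")
```

Scaling lr this way does not address the batch norm effect of a small per-GPU batch, so it is only a starting point to check against the loss curves.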

xisi789 commented 4 years ago

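@MhLiao Thank you very much for your answer! However, I don't quite understand what comparing the loss curves means. For example, I see the loss on the public datasets drop below 0.1, while on my local dataset it stays around 0.9. In that case, should I increase or decrease epochs? Or should I be looking at something else?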

xisi789 commented 4 years ago

@MhLiao I retrained resnet50 with your default number of GPUs and batchsize, and still cannot reach the reported accuracy on IC15. height: 736, the resulting accuracy is: [image] height: 1152, the resulting accuracy is: [image]

Training command: CUDA_VISIBLE_DEVICES=0,1 python train.py /storage03/users/xisi/code/ocr/detection/DB/experiments/seg_detector/td500_resnet50_deform_thre.yaml --num_gpus 2 --resume /storage03/users/xisi/code/ocr/detection/DB/models/pre-trained-model-synthtext-resnet50

YAML file: [image]

MhLiao commented 4 years ago

@xisi789 Could you send me your trained model?

xisi789 commented 4 years ago

> @xisi789 Could you send me your trained model?

@MhLiao Sure. How should I send it to you?

MhLiao commented 4 years ago

@xisi789 You can share it through a cloud drive; either Google or Baidu is fine.

xisi789 commented 4 years ago

@MhLiao The model file is here. Link: https://pan.baidu.com/s/1IEUjqcGlb25a2rfJM_mG_A Extraction code: ybus

Luowenli1996 commented 4 years ago

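@MhLiao and @xisi789 Hello, I also got roughly the same result with the default number of GPUs and batchsize. height: 1152, the resulting accuracy is: [image]

I hope a solution can be found. Thanks.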

shaohailin commented 4 years ago

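I have run the author's code many times. On the icdar2015 dataset with resnet50 at height 736, the training command is: CUDA_VISIBLE_DEVICES=2,3 python train.py experiments/seg_detector/ic15_resnet50_deform_thre.yaml --resume ModelsGoogle/pre-trained-model-synthtext-resnet50 --num_gpus 2 Test command: [image] The yaml file is identical to the author's, unmodified: [image]

Results over multiple runs: [image]

All of my results are about 2 points lower than your published results. @MhLiao @xisi789 @Luowenli1996 @xuannianz On the other two datasets, my results are almost identical to the published ones: [image]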

MhLiao commented 4 years ago

Hello, @xisi789 @shaohailin @Luowenli1996 I think I have found the problem. You should modify this line to match the latest commit; it was added for the MSRA-TD500 cases, but this handling may hurt the ICDAR 2015 dataset. Here is the performance at height 1152 for the model I re-trained. (The speed is abnormal because the GPU was also running another process.) [image]

Another suggestion is to use the MLT pre-trained model instead of the SynthText pre-trained model, which makes it easier to reach the target accuracy, as suggested in some previous work.

Luowenli1996 commented 4 years ago

Thank you very much for your reply, @MhLiao. I modified that line according to your solution. Here is the performance at height 1152 for the model I re-trained. I changed the number of epochs from 1200 to 800. [image] Do I need any other changes? I will try again.

xisi789 commented 4 years ago

> Hello, @xisi789 @shaohailin @Luowenli1996 I think I have found the problem. You should modify this line to match the latest commit; it was added for the MSRA-TD500 cases, but this handling may hurt the ICDAR 2015 dataset. Here is the performance at height 1152 for the model I re-trained. I changed the number of epochs from 1200 to 800. (But I think this is not sensitive.) (The speed is abnormal because the GPU was also running another process.) [image]
>
> Another suggestion is to use the MLT pre-trained model instead of the SynthText pre-trained model, which makes it easier to reach the target accuracy, as suggested in some previous work.

@MhLiao @Luowenli1996 I also made this change and retrained, but still cannot reach the same accuracy: [image]

shaohailin commented 4 years ago

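Hello! What result do you currently get for the ICDAR2015 resnet50 height 736 model? Can it reach the F-measure of 85.4 from the paper, or does it still hover between 82 and 83? @MhLiao @xisi789 @Luowenli1996 @xuannianz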

MhLiao commented 4 years ago

@shaohailin @xisi789 @Luowenli1996 How about trying the MLT pre-trained model? I just tried 1200 epochs again and the result is similar; it is more stable at the 736 scale.

xisi789 commented 4 years ago

I am already using the MLT pre-trained model. @MhLiao

MhLiao commented 4 years ago

> I am already using the MLT pre-trained model. @MhLiao

Change it back to 1200 epochs and try again; it works on my side.

Luowenli1996 commented 4 years ago

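> > I am already using the MLT pre-trained model. @MhLiao
>
> Change it back to 1200 epochs and try again; it works on my side.

Is it also 1200 epochs on IC17-MLT? Is an ImageNet resnet50 model needed as the pre-trained model?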

shaohailin commented 4 years ago

What threshold does the current test command use at size 736 on ICDAR2015? @MhLiao

MhLiao commented 4 years ago

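> > > I am already using the MLT pre-trained model. @MhLiao
> >
> > Change it back to 1200 epochs and try again; it works on my side.
>
> Is it also 1200 epochs on IC17-MLT? Is an ImageNet resnet50 model needed as the pre-trained model?

IC17-MLT does not need 1200; around 400 should be enough. Yes, the ImageNet resnet50 model is needed as the pre-trained model.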

MhLiao commented 4 years ago

> What threshold does the current test command use at size 736 on ICDAR2015? @MhLiao

Models trained by different people may differ; the range is roughly 0.5 to 0.6.

Luowenli1996 commented 4 years ago

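> > > > I am already using the MLT pre-trained model. @MhLiao
> > >
> > > Change it back to 1200 epochs and try again; it works on my side.
> >
> > Is it also 1200 epochs on IC17-MLT? Is an ImageNet resnet50 model needed as the pre-trained model?
>
> IC17-MLT does not need 1200; around 400 should be enough. Yes, the ImageNet resnet50 model is needed as the pre-trained model.

Thank you very much for your reply!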

MhLiao commented 4 years ago

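> Starting from the MLT pre-trained model, with height 736, threshold 0.55, and 1200 epochs, I still cannot reach the 85.4 in the paper. [image]
>
> @MhLiao @xisi789

@shaohailin This is the model I trained from the MLT pre-trained model: height 736, threshold 0.6, F-measure 85.3, only 0.1 point below 85.4. [image]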

lmw0320 commented 4 years ago

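@MhLiao I would like to ask: when I run the IC17-MLT dataset, I hit a cudnn error. What is going on? Also, I see that the image size set for training differs from the size set for validation and testing, which confuses me. I had always assumed that the training image size should match the validation and test image size. I don't quite understand why these values are set differently.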

MhLiao commented 4 years ago

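> @MhLiao I would like to ask: when I run the IC17-MLT dataset, I hit a cudnn error. What is going on? Also, I see that the image size set for training differs from the size set for validation and testing, which confuses me. I had always assumed that the training image size should match the validation and test image size. I don't quite understand why these values are set differently.

@lmw0320 The cudnn error is probably unrelated to the dataset; you should be able to find a solution for this kind of error by searching. Training uses 640*640 so that batches can be fully parallelized, and because the random crop data augmentation is applied.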
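
To make the train/test size difference concrete, here is a minimal sketch of how a test-time size could be derived from a target height such as 736 or 1152. It assumes the image is resized to the target height with its aspect ratio kept and the width rounded up to a multiple of 32; the function name, the stride, and the rounding choice are assumptions for illustration and are not taken from the repository.

```python
import math


def inference_size(orig_h, orig_w, target_h=736, stride=32):
    """Return (height, width) for testing under the assumptions above.

    The height is set to target_h (rounded up to a multiple of stride),
    the aspect ratio is kept, and the width is rounded up to a multiple
    of stride.
    """
    new_h = int(math.ceil(target_h / stride) * stride)
    new_w = int(math.ceil(orig_w * new_h / orig_h / stride) * stride)
    return new_h, new_w


# ICDAR 2015 images are 720 high and 1280 wide.
print(inference_size(720, 1280, target_h=736))   # (736, 1312)
print(inference_size(720, 1280, target_h=1152))  # (1152, 2048)
```

Training on fixed 640*640 crops keeps every sample in a batch the same shape, while testing can run on one full image at a time, so the two sizes do not have to match.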

ML-Mr-J commented 4 years ago

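> @MhLiao Thank you very much for your answer! However, I don't quite understand what comparing the loss curves means. For example, I see the loss on the public datasets drop below 0.1, while on my local dataset it stays around 0.9. In that case, should I increase or decrease epochs? Or should I be looking at something else?

Same question. On the MTWI2018 dataset the loss is 0.6~0.7 and recall is around 57; I don't know where the problem is.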

Luowenli1996 commented 4 years ago

@MhLiao Hello, I have some questions about the TD500 dataset. I can't reproduce the results in the paper. Do I need to convert the 'difficult' label of the dataset into the '###' form used by ICDAR2015? Is that necessary for both training and testing? Please excuse my English. I look forward to your reply.

Luowenli1996 commented 4 years ago

@MhLiao Hello, I would like to ask why the FPS is not as high as in the paper. I have set adaptive to False in the yaml file, but the FPS does not change. I look forward to your reply.

Luowenli1996 commented 4 years ago

@shaohailin @xisi789 Hello, I have some questions about the TD500 dataset. I can't reproduce the results in the paper. Do I need to convert the 'difficult' label of the dataset into the '###' form used by ICDAR2015? Is that necessary for both training and testing? Please excuse my English. I look forward to your reply.

shaohailin commented 4 years ago

I only ran the resnet50 TD500 experiment. The paper reports P 91.5, R 79.2, F 84.9; my own run gives P 89.0, R 80.8, F 84.7. My result is about the same as the paper.

@Luowenli1996
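
The precision, recall, and F values quoted above are consistent with the usual harmonic-mean definition F = 2PR / (P + R). The short check below assumes that definition is the one behind the reported numbers; the function name is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in percent here)."""
    return 2 * precision * recall / (precision + recall)


paper = f_measure(91.5, 79.2)       # values reported in the paper
reproduced = f_measure(89.0, 80.8)  # values from the re-run above
print(f"paper: {paper:.1f}, reproduced: {reproduced:.1f}")
# paper: 84.9, reproduced: 84.7
```

The re-run trades about 2.5 points of precision for about 1.6 points of recall, which is why the two F values end up only 0.2 apart.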

Luowenli1996 commented 4 years ago

Regarding the TD500 dataset, how did you handle the 'difficult' label?

xisi789 commented 4 years ago

> @shaohailin @xisi789 Hello, I have some questions about the TD500 dataset. I can't reproduce the results in the paper. Do I need to convert the 'difficult' label of the dataset into the '###' form used by ICDAR2015? Is that necessary for both training and testing? Please excuse my English. I look forward to your reply.

I trained the model on TD500 directly, without any preprocessing. @Luowenli1996

Luowenli1996 commented 4 years ago

Hello, if you don't mind, could you privately send me a copy of the TD500 training and test GT? @xisi789

xisi789 commented 4 years ago

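> Hello, if you don't mind, could you privately send me a copy of the TD500 training and test GT? @xisi789

It is TD500 as downloaded directly from the official source; you do not need to change the dataset in any way. My dataset is no different from yours.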

Luowenli1996 commented 4 years ago

@xisi789 Sorry, one more question: the TD500 dataset as downloaded is not in 4-point annotation form, is it? Did you not convert it to 4-point annotations?

xisi789 commented 4 years ago

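> @xisi789 Sorry, one more question: the TD500 dataset as downloaded is not in 4-point annotation form, is it? Did you not convert it to 4-point annotations?

OpenCV has built-in bounding-rectangle functions... I suggest looking for a cause in the code or the environment. A data problem is unlikely.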

yfw1226 commented 3 years ago

@xisi789 Have you found the reason for the low IC15 accuracy? My reproduction on IC15 is also 2 points below the paper.

xisi789 commented 3 years ago

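> @xisi789 Have you found the reason for the low IC15 accuracy? My reproduction on IC15 is also 2 points below the paper.

I have not. I tried many approaches, and the accuracy was never as high as in the paper.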

yfw1226 commented 3 years ago

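> > Hello, if you don't mind, could you privately send me a copy of the TD500 training and test GT? @xisi789
>
> It is TD500 as downloaded directly from the official source; you do not need to change the dataset in any way. My dataset is no different from yours.

I tried training directly on the official TD500 dataset. Debugging in image_dataset.py, I found that the annotations cannot be read correctly: the function there still reads annotations in the x1,y1,x2,y2,......, label format, but the official TD500 annotations are not in that format. How did you handle this?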
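
For readers facing the same format mismatch, one possible conversion is sketched below. It assumes each line of an official TD500 .gt file holds `index difficult x y w h theta` separated by spaces, with (x, y) the top-left corner of the unrotated box and theta a rotation in radians about the box center; the rotation direction is also an assumption. Check all of this against the dataset's own readme before relying on it. The function name is hypothetical and nothing here is taken from the repository.

```python
import math


def td500_line_to_polygon(line):
    """Convert 'index difficult x y w h theta' to 'x1,y1,...,x4,y4,label'.

    The trailing label is the raw 'difficult' flag ('0' or '1').
    """
    parts = line.split()
    difficult = parts[1]
    x, y, w, h, theta = map(float, parts[2:7])
    cx, cy = x + w / 2.0, y + h / 2.0
    # Corners of the unrotated box, clockwise from the top-left.
    corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    coords = []
    for px, py in corners:
        dx, dy = px - cx, py - cy
        # Rotate each corner about the box center.
        coords.append(round(cx + dx * cos_t - dy * sin_t))
        coords.append(round(cy + dx * sin_t + dy * cos_t))
    return ",".join(str(v) for v in coords) + "," + difficult


print(td500_line_to_polygon("0 1 10 20 100 50 0.0"))
# 10,20,110,20,110,70,10,70,1
```

The output follows the x1,y1,x2,y2,......, label layout mentioned above, so a label of '1' can then be handled by the TD-specific condition discussed elsewhere in this thread.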

WayoSunny commented 3 years ago

> > Hello, @xisi789 @shaohailin @Luowenli1996 I think I have found the problem. You should modify this line to match the latest commit; it was added for the MSRA-TD500 cases, but this handling may hurt the ICDAR 2015 dataset. Here is the performance at height 1152 for the model I re-trained. I changed the number of epochs from 1200 to 800. (But I think this is not sensitive.) (The speed is abnormal because the GPU was also running another process.) [image] Another suggestion is to use the MLT pre-trained model instead of the SynthText pre-trained model, which makes it easier to reach the target accuracy, as suggested in some previous work.
>
> @MhLiao @Luowenli1996 I also made this change and retrained, but still cannot reach the same accuracy: [image]

Hello, how exactly should it be changed? Should the line "if 'TD' in self.data_dir[0] and label == '1':" be removed?

johnsonkee commented 3 years ago

> > > Hello, @xisi789 @shaohailin @Luowenli1996 I think I have found the problem. You should modify this line to match the latest commit; it was added for the MSRA-TD500 cases, but this handling may hurt the ICDAR 2015 dataset. Here is the performance at height 1152 for the model I re-trained. I changed the number of epochs from 1200 to 800. (But I think this is not sensitive.) (The speed is abnormal because the GPU was also running another process.) [image] Another suggestion is to use the MLT pre-trained model instead of the SynthText pre-trained model, which makes it easier to reach the target accuracy, as suggested in some previous work.
> >
> > @MhLiao @Luowenli1996 I also made this change and retrained, but still cannot reach the same accuracy: [image]
>
> Hello, how exactly should it be changed? Should the line "if 'TD' in self.data_dir[0] and label == '1':" be removed?

I think the author means that the code should be updated to "if 'TD' in self.data_dir[0] and label == '1':", so we do not need to change anything. Reading the thread so far, it seems the only remaining option is the MLT pre-trained model, to see whether training can get back to the numbers in the paper.
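
To illustrate this reading of the fix, the sketch below parses one annotation line and applies the quoted condition, so that a label of '1' becomes the ignore marker '###' only when the data directory contains 'TD'. Only the condition itself comes from the thread; the function name, the returned dictionary, and the example paths are hypothetical, and this is not the repository's actual loader. It also assumes the label contains no commas.

```python
def parse_annotation_line(line, data_dir):
    """Parse 'x1,y1,...,xn,yn,label' into a polygon and a label."""
    parts = [p.strip("\ufeff").strip() for p in line.strip().split(",")]
    label = parts[-1]
    # Condition quoted in the thread: only for TD500, the 'difficult'
    # flag '1' is mapped to the ignore label.
    if "TD" in data_dir and label == "1":
        label = "###"
    num_coords = (len(parts) - 1) // 2 * 2
    coords = [float(v) for v in parts[:num_coords]]
    polygon = [(coords[i], coords[i + 1]) for i in range(0, num_coords, 2)]
    return {"poly": polygon, "text": label, "ignore": label == "###"}


line = "10,20,110,20,110,70,10,70,1"
print(parse_annotation_line(line, "datasets/TD_TR/TD500")["text"])  # ###
print(parse_annotation_line(line, "datasets/icdar2015")["text"])    # 1
```

Under this reading, an ICDAR 2015 word whose transcription happens to be "1" stays a normal training target, whereas a condition without the 'TD' check would mark it as ignored.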

lurmos commented 3 years ago

So far, has anyone reproduced the paper's experimental results on the ICDAR2015 dataset?

xisi789 commented 3 years ago

@lurmos So far I have not been able to reproduce the results on the ICDAR2015 dataset myself.

xisi789 commented 3 years ago

@johnsonkee Did you manage to reproduce the paper's results?