clovaai / deep-text-recognition-benchmark

Text recognition (optical character recognition) with deep learning methods, ICCV 2019
Apache License 2.0

Accuracy difference between local retraining model and pretrained one #56

Closed 1LOVESJohnny closed 5 years ago

1LOVESJohnny commented 5 years ago

First, thanks for your great work :) ! You've done a good job!

Here's my question. I've retrained the model with the options "--select_data MJ-ST --batch_ratio 0.5-0.5 --Transformation None --FeatureExtraction VGG --SequenceModeling BiLSTM --Prediction CTC", corresponding to the original version of CRNN. The rest of the parameters are left at their defaults, and the model is trained on the MJ and ST datasets.

However, when testing with my locally retrained best_accuracy model, the resulting accuracies are as below:
IC13_857: only 88.45%, while 91.1% in the paper.
IC13_1015: 87.68%, while 89.2% in the paper.
IC15_1811: 66.37%, while 69.4% in the paper.
IC15_2077: 64.07%, while 64.2% in the paper.

It seems like there is still something off in my retraining process. Should I reset the learning rate or extend the training iterations? Do you have any idea how to improve the performance so that it aligns with the published results in the paper?

I've also tried training on the MJ dataset only; that model seems to reach higher accuracy on IC13_857. When I extend training to both MJ and ST, is it necessary to increase the number of iterations to get better accuracy?

Looking forward to your reply ^_^

ku21fan commented 5 years ago

Hello,

I guess 2 things would make a difference.

  1. In our paper, we ran five trials with different random seeds for initialization and averaged their accuracies. Try a different '--manualSeed'. Also, how are the accuracies on the other datasets? There are 10 evaluation datasets; were the remaining 6 fine? In our experiments, the accuracies of some models fluctuated depending on the initialization.

  2. In our paper, we used --select_data / with --batch_ratio 1, which means we did not modulate the data ratio within a batch (except for the varying-dataset-size experiments). In this repository, we set --select_data MJ-ST --batch_ratio 0.5-0.5 as the default because this option performed better in many cases. It is an additional option for modulating the ratio of each dataset within a batch (a rough sketch of the idea follows below).
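
A rough, hedged sketch of this batch-balancing idea (this is not the repository's actual Batch_Balanced_Dataset code; it assumes each dataset yields (image_tensor, label) pairs):

    import torch
    from torch.utils.data import DataLoader

    def make_balanced_loaders(datasets, ratios, total_batch_size):
        # One DataLoader per dataset; each gets its share of the total batch.
        loaders = []
        for dataset, ratio in zip(datasets, ratios):
            sub_batch = max(1, round(total_batch_size * ratio))
            loaders.append(DataLoader(dataset, batch_size=sub_batch, shuffle=True))
        return loaders

    def get_balanced_batch(loader_iters):
        # One training batch = the concatenation of one sub-batch from each dataset.
        images, labels = [], []
        for it in loader_iters:
            img, lab = next(it)
            images.append(img)
            labels += list(lab)
        return torch.cat(images, dim=0), labels

For example, with a total batch size of 192 and ratios [0.5, 0.5] for MJ and ST, each batch would contain 96 MJ samples and 96 ST samples.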

"Should I reset the learning rate or expand my training iteration?" -> learning rate schedule technique works well and expanding training iteration would also work.

Hope it helps. Best

1LOVESJohnny commented 5 years ago

Hi, thanks a lot for your reply ^_^.

I've tested this model on the remaining 6 test sets; the accuracies are listed below:
SVT: 80.68%, while 81.6% in the paper
CUTE: 60.28%, while 65.5% in the paper
IC03_860: 91.28%, while 93.1% in the paper
IC03_867: 90.5%, while 92.6% in the paper
SVTP: 65.58%, while 70.0% in the paper
IIIT: 82.6%, while 82.9% in the paper

It seems like my locally trained model performs lower than the published results on average. Do you have any idea what might be affecting the training process? I'd really appreciate any hints ...

"we have run five trials with different initialization random seeds." -> Could you please tell these five numbers of random seeds? I've tried to retrain the model with --manualSeed 1113, whose performance are improved, listed as below:

IC13_857: only 90.43%, while 91.1% in the paper.
IC13_1015: 89.06%, while 89.2% in the paper.
IC15_1811: 67.20%, while 69.4% in the paper.
IC15_2077: 64.83%, while 64.2% in the paper. (higher)
SVT: 82.38%, while 81.6% in the paper. (higher)
CUTE: 62.72%, while 65.5% in the paper.
IC03_860: 92.56%, while 93.1% in the paper.
IC03_867: 92.04%, while 92.6% in the paper.
SVTP: 68.22%, while 70.0% in the paper.
IIIT: 81.83%, while 82.9% in the paper.

Changing the random seed is effective for improving accuracy, but there is still some gap to the published results. Looking forward to your reply ^^
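
(For reference, a minimal hedged sketch of how a manual seed is typically applied at the top of a PyTorch training script; --manualSeed in this repository presumably does something along these lines, but the exact calls may differ:)

    import random
    import numpy as np
    import torch

    def set_seed(manual_seed: int) -> None:
        # Seed Python, NumPy and PyTorch (CPU + all GPUs) for reproducible initialization.
        random.seed(manual_seed)
        np.random.seed(manual_seed)
        torch.manual_seed(manual_seed)
        torch.cuda.manual_seed_all(manual_seed)

    set_seed(1113)  # e.g. one of the seeds tried above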

ku21fan commented 5 years ago

-> Could you please share those five random seeds? Yes. They are 1111, 1112, 1113, 2222, and 2223.

This is the result of our pretrained model (None-VGG-BiLSTM-CTC):
IIIT5k_3000: 82.733
SVT: 82.380
IC03_860: 92.791
IC03_867: 93.080
IC13_857: 90.782
IC13_1015: 89.261
IC15_1811: 68.415
IC15_2077: 65.869
SVTP: 70.853
CUTE80: 62.718

Your average result seems 1~2% lower than ours, so I guess something may have gone wrong. Could you upload your opt.txt and log_train.txt so I can take a look?

I'm trying to reproduce this model with the default code.

1LOVESJohnny commented 5 years ago

Thank you so much for your reply! I'll try these initializations.

Of course, the opt.txt and log_train.txt are uploaded; manualSeed is set to 1111 in them. opt.txt log_train.txt

1LOVESJohnny commented 5 years ago

Hi, sorry to interrupt. Have you reproduced the model accuracy? Looking forward to your reply ^_____^

ku21fan commented 5 years ago

@1LOVESJohnny Thanks to you, I reproduced both (the 1%-lower one and our original pretrained one), and I found some weird behavior of CTCLoss that makes a difference in accuracy. I will describe it in about 2 hours.

1LOVESJohnny commented 5 years ago

Wow, thank you! Take your time.

ku21fan commented 5 years ago

@1LOVESJohnny First of all, I have found that the behavior of CTCLoss differs between PyTorch 1.1.0 and 1.2.0 (and possibly other versions).

The model trained with PyTorch 1.2.0 and our current code (code [A] below, GPU calculation) has about 1% lower accuracy (total accuracy is about 78.3%).

[A]
# criterion / preds / text / preds_size / length are CUDA tensors
            torch.backends.cudnn.enabled = False
            cost = criterion(preds, text, preds_size, length)
            torch.backends.cudnn.enabled = True
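
(A small variant of code [A], written with try/finally so cuDNN is re-enabled even if the loss computation raises; just a sketch using the same variable names as above:)

    torch.backends.cudnn.enabled = False
    try:
        cost = criterion(preds, text, preds_size, length)
    finally:
        torch.backends.cudnn.enabled = True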

The model trained with PyTorch 1.1.0 and our previous code [B] has accuracy close to that of our pretrained model (total accuracy is above 79.3%).

[B]
# criterion / preds are CUDA tensors
            cost = criterion(preds, text, preds_size, length)

Also, the results of the CTC model in PyTorch 1.1.0 seem to fluctuate more, but reach better accuracy than with PyTorch 1.2.0. Strangely, when I tried to train with PyTorch 1.2.0 and code [B], the training loss went to NaN and the model could not be trained..

I also found that computing CTCLoss on the CPU and on the GPU gives different results, and the CPU version has better accuracy (total accuracy is about 78.7%). i.e.

            cost = criterion(preds.cpu(), text.cpu(), preds_size.cpu(), length.cpu())

would produce a different result.
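
(To see how large the CPU/GPU discrepancy is in a given environment, a self-contained comparison with random tensors can help; the shapes, blank index, and seed below are arbitrary assumptions, not the repository's values:)

    import torch
    import torch.nn as nn

    torch.manual_seed(1111)
    T, N, C = 26, 4, 37                       # time steps, batch size, classes (0 = blank)
    criterion = nn.CTCLoss(blank=0, zero_infinity=True)

    log_probs = torch.randn(T, N, C).log_softmax(2)            # (T, N, C) log-probabilities
    targets = torch.randint(1, C, (N, 10), dtype=torch.long)   # dummy label indices
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.full((N,), 10, dtype=torch.long)

    # Same inputs, once on CPU and once on GPU (assumes a CUDA device is available).
    cost_cpu = criterion(log_probs, targets, input_lengths, target_lengths)
    cost_gpu = criterion(log_probs.cuda(), targets.cuda(),
                         input_lengths.cuda(), target_lengths.cuda())
    print(cost_cpu.item(), cost_gpu.item(), abs(cost_cpu.item() - cost_gpu.item()))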

The experiments in our paper were performed with PyTorch 0.4.1 and warp-ctc, and our pretrained model was trained with PyTorch 1.1.0. So, if you want to reproduce the results of the CTC models now, you should use those environments.
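
(A quick way to confirm which PyTorch / CUDA / cuDNN versions are active in your environment, just as a sanity check:)

    import torch
    print(torch.__version__)               # e.g. '1.1.0'
    print(torch.version.cuda)              # CUDA version PyTorch was built with
    print(torch.backends.cudnn.version())  # cuDNN version, if available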

Currently, I am looking into the cause and for a way to stay compatible with PyTorch 1.2.0 without the accuracy drop.

1LOVESJohnny commented 5 years ago

Hi @ku21fan, thanks so much for your reply, and for all these reproduction experiments!

It seems the gap is caused by the different CTCLoss implementations. I'll try retraining the model with PyTorch 1.1.0.

1LOVESJohnny commented 5 years ago

Hi @ku21fan, I'm surprised to find that my PyTorch is already version 1.1.0, but my CUDA version is different from yours: mine is 10.0.

I'm now retraining the model with PyTorch 1.1.0 and CUDA 9.0.45. I don't know whether that was the problem ...

ku21fan commented 5 years ago

@1LOVESJohnny Sorry for the late reply. Use this code (ctc_b) with PyTorch 1.1.0 (and comment out (ctc_a)). With this code, training should end with a best_accuracy (on the validation data) of about 77%.

The performance of the final model should then be similar to our paper and our pretrained model. (I was struggling to find a way to use PyTorch 1.2.0 without the accuracy drop, but I could not find one.. I hope the drop is fixed in the next release.)

Hope it helps. Best