clovaai / deep-text-recognition-benchmark

Text recognition (optical character recognition) with deep learning methods, ICCV 2019
Apache License 2.0

If Train_loss = 0.00002 but Valid_loss = 5.81674, is model still learning? #164

Open peeush-agarwal opened 4 years ago

peeush-agarwal commented 4 years ago

Hello, I've started training the model on a custom dataset and found that after 10K iterations the training loss has decreased to 0.00002, but the validation loss is still high and accuracy is around 10%. I'm not sure whether the model is still learning in further iterations. Could someone help me here: should I continue training, or do I need to look at other parameters of the training setup? Thanks in advance.

ku21fan commented 4 years ago

Hello,

I guess the training is almost done, so continuing to train would not be a good solution. I recommend checking the difference between your custom dataset and the validation dataset. If they are very different, you should make your own validation dataset for your purpose.
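For reference, here is a minimal sketch of holding out your own validation split before building the LMDBs. It assumes the tab-separated "imagepath<TAB>label" gt format that create_lmdb_dataset.py consumes; the file names labels.txt, gt_train.txt and gt_valid.txt are placeholders.

```python
# Sketch: split one gt file into train/validation subsets before building LMDBs.
# Assumes one "imagepath\tlabel" pair per line; file names are placeholders.
import random

with open("labels.txt", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f if line.strip()]

random.seed(0)
random.shuffle(lines)
n_valid = max(1, len(lines) // 20)  # hold out roughly 5% for validation

with open("gt_valid.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines[:n_valid]) + "\n")
with open("gt_train.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines[n_valid:]) + "\n")

# Then build two separate LMDBs, e.g.:
#   python3 create_lmdb_dataset.py . gt_train.txt data_lmdb/train
#   python3 create_lmdb_dataset.py . gt_valid.txt data_lmdb/valid
```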

Hope it helps, Best

iamrishab commented 4 years ago

Hi @ku21fan, first of all, thank you for this great work!

Similar to @peeush-agarwal's problem, I am facing the same issue, even though I made my own custom train, validation, and test sets. The data distribution is also consistent between the training and validation sets, but I am still getting this: [10000/300000] Train loss: 0.00000, Valid loss: 3.39168, Elapsed_time: 1757.51917

Any suggestion is highly appreciated.

678098 commented 2 years ago

I have the same problem. The train and validation data are generated in the same way, from the same distribution, like this:

trdg --output_dir train -b 2 -na 2 -ft fonts/fe_font.TTF -c 1000000 -rs -let -num -k 5 -rk -bl 1 -rbl --case upper
trdg --output_dir validate -b 2 -na 2 -ft fonts/fe_font.TTF -c 1000 -rs -let -num -k 5 -rk -bl 1 -rbl --case upper
# here whitespace needs to be replaced with tabs in labels.txt (see the sketch after these commands)
python3 create_lmdb_dataset.py train train/labels.txt train_lmdb
python3 create_lmdb_dataset.py validate validate/labels.txt validate_lmdb
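For the whitespace-to-tab step, a sketch along these lines should do. It assumes trdg writes one "imagefile label" pair per line, with the first space separating the path from the label; the output file names are placeholders.

```python
# Sketch: convert trdg's space-separated labels.txt into the tab-separated
# "imagepath\tlabel" format expected by create_lmdb_dataset.py.
# Only the first space is replaced, so labels containing spaces stay intact.
def space_to_tab(in_path, out_path):
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            line = line.rstrip("\n")
            if not line:
                continue
            path, _, label = line.partition(" ")
            fout.write(f"{path}\t{label}\n")

space_to_tab("train/labels.txt", "train/labels_tab.txt")
space_to_tab("validate/labels.txt", "validate/labels_tab.txt")
```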
678098 commented 2 years ago

So I just used the similar training code from EasyOCR and it worked on my data. I don't know what the problem with training here is.

ashkanmradi commented 2 years ago

Hello, I also have the same problem. After 10k iterations, my train loss is almost 0, but the validation loss is about 1.5. My training and validation data come from the same distribution; they are all 10-digit numbers in the same font. Does anyone know how to overcome this problem?
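One thing worth ruling out in cases like this is a mismatch between the validation labels and the training configuration. Below is a small diagnostic sketch (my own, not part of the repo) that reads labels back from a validation LMDB and flags any that fall outside the --character set or exceed --batch_max_length. It assumes the key layout written by create_lmdb_dataset.py (num-samples, label-%09d); the path and character string are just examples.

```python
# Diagnostic sketch: check validation labels against the training character set
# and max label length. Assumes LMDB keys as written by create_lmdb_dataset.py.
import lmdb

character = "0123456789"   # whatever you pass as --character
batch_max_length = 25      # whatever you pass as --batch_max_length

env = lmdb.open("validate_lmdb", readonly=True, lock=False)
with env.begin() as txn:
    n = int(txn.get("num-samples".encode()))
    bad = []
    for i in range(1, n + 1):
        label = txn.get(f"label-{i:09d}".encode()).decode("utf-8")
        if len(label) > batch_max_length or any(c not in character for c in label):
            bad.append((i, label))

print(f"{len(bad)} / {n} labels fall outside the training character set or length limit")
for i, label in bad[:10]:
    print(i, repr(label))
```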