Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
GNU General Public License v3.0

Progress stops at 99.99% of this epoch #15

Open qnkhuat opened 6 years ago

qnkhuat commented 6 years ago
[Screenshot from 2018-03-18, 6:23:40 PM]

It has stayed like this for 10 minutes. Is there a problem with it?

Bartzi commented 6 years ago

No, there is no problem. The program is validating your trained model on the entire validation dataset, which takes a while. Everything is alright.

Bartzi commented 6 years ago

I'm also seeing a NaN in your training log. Did you adjust the learning rate to a lower value?

qnkhuat commented 6 years ago

I didn't change anything. By the way, I encountered the same problem on another remote machine: all of the losses are NaN.

[Screenshot from 2018-03-19, 9:14:13 PM]

Bartzi commented 6 years ago

In this case, you definitely need to adjust the learning rate to 1e-4 or 1e-5.
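Why a lower learning rate can stop NaN losses can be illustrated with a toy problem (hypothetical, not SEE's network or optimizer): plain gradient descent on the quadratic f(x) = 0.5 * c * x**2. Each step multiplies x by (1 - lr * c), so the iterate only shrinks while lr < 2 / c. Above that threshold it grows until the float overflows to inf and then turns into NaN. The names `gradient_descent`, `curvature` and `diverged` are illustrative assumptions.

```python
import math


def gradient_descent(lr: float, curvature: float = 10.0, x0: float = 1.0, steps: int = 2000) -> float:
    """Minimise f(x) = 0.5 * curvature * x**2 with plain gradient descent.

    Toy example only (hypothetical): each update is x <- x * (1 - lr * curvature),
    which converges for lr < 2 / curvature and diverges otherwise.
    """
    x = x0
    for _ in range(steps):
        grad = curvature * x  # f'(x)
        x = x - lr * grad  # overflows to inf, then becomes NaN, if lr is too large
    return x


def diverged(value: float) -> bool:
    """True if the optimisation ended in inf or NaN."""
    return not math.isfinite(value)


for lr in (1.0, 1e-2, 1e-4):
    print(f"lr={lr:g}: diverged={diverged(gradient_descent(lr))}")
```

A real network has many curvatures at once, so a safe learning rate has to be found empirically; the 1e-4 / 1e-5 values suggested here come from the thread, not from this toy.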

qnkhuat commented 6 years ago

It didn't help. I still get NaN.

Bartzi commented 6 years ago

It could be that a division by zero occurs somewhere. If adjusting the learning rate does not help, you could check for that by running Chainer in debug mode.
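A minimal sketch of the two checks suggested here, assuming intermediate results can be pulled out as plain Python floats: guarding a denominator against zero, and locating the first non-finite value. `safe_divide`, `first_non_finite` and the `eps` value are hypothetical choices, not part of SEE or Chainer; Chainer's debug mode performs its own runtime checks.

```python
import math
from typing import Iterable, Optional


def safe_divide(numerator: float, denominator: float, eps: float = 1e-8) -> float:
    """Divide, replacing a denominator with |denominator| < eps by +/-eps.

    A nonzero denominator keeps its sign, so the result does not flip sign.
    """
    if abs(denominator) < eps:
        denominator = eps if denominator >= 0 else -eps
    return numerator / denominator


def first_non_finite(values: Iterable[float]) -> Optional[int]:
    """Return the index of the first NaN or inf in values, or None if all are finite."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            return index
    return None


losses = [0.71, 0.65, float("nan"), 0.60]  # illustrative values, not from the thread
print(first_non_finite(losses))
```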

qnkhuat commented 6 years ago

It yielded: Exception in main training loop: Each label t need to satisfy 0 <= t < x.shape[1] or t == -1; Concretely:

[Screenshot from 2018-03-21, 12:42:19 AM]

Oddly, when I used debug mode on another machine (one without NaN losses), it yields the same error.
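The exception states the condition being checked: every label t must satisfy 0 <= t < x.shape[1] (the number of classes the network outputs) or equal -1. A minimal sketch of running the same check over your encoded labels before training, so offending samples can be found directly; `invalid_labels` is a hypothetical helper, and -1 is assumed to be the ignore label, as in the message.

```python
from typing import List, Sequence


def invalid_labels(labels: Sequence[int], num_classes: int, ignore_label: int = -1) -> List[int]:
    """Return positions of labels violating 0 <= t < num_classes (ignore_label is allowed).

    num_classes plays the role of x.shape[1] in the error message.
    """
    return [
        position
        for position, t in enumerate(labels)
        if t != ignore_label and not 0 <= t < num_classes
    ]


# Hypothetical example: the network predicts 10 classes, but one label is 12.
print(invalid_labels([3, 0, 12, -1, 9], num_classes=10))
```

A non-empty result points to a mismatch between the label encoding and the number of classes the network was built for.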

Bartzi commented 6 years ago

It seems the shapes produced by the network are not what they should be. Are you using your own data?

qnkhuat commented 6 years ago

Yes, I've created my own data. When I trained it on another machine it didn't get NaN, but it was stuck at 99.96% for a day :D

Bartzi commented 6 years ago

Then you should check the number of classes in your dataset. Did you adjust the network to fit your number of classes?

How large is your validation set?
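One way to answer the class-count question is to derive it from the ground truth itself. A minimal sketch, assuming labels are plain strings and that class 0 is reserved for a blank/padding symbol; whether SEE reserves such a class, and how its character map is stored, is not stated in this thread, so check the repository before relying on it. `build_char_map` and `encode` are hypothetical helpers.

```python
from typing import Dict, Iterable, List


def build_char_map(labels: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct character a class index starting at 1 (0 is assumed reserved)."""
    chars = sorted(set("".join(labels)))
    return {char: index for index, char in enumerate(chars, start=1)}


def encode(label: str, char_map: Dict[str, int]) -> List[int]:
    """Turn a label string into class indices; raises KeyError on unknown characters."""
    return [char_map[char] for char in label]


labels = ["1GCHTCFE4C8101563"]  # the example label from this thread
char_map = build_char_map(labels)
num_classes = len(char_map) + 1  # + 1 for the reserved class 0
print(num_classes, encode(labels[0], char_map))
```

On a real dataset, pass every label from both the training and validation gt files, so that no validation character is missing from the map.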

qnkhuat commented 6 years ago

I need to recognize one text string with 17 characters.

Example of my gt:

17 1
$PATH 1GCHTCFE4C8101563

My validation set is 120 MB (3,700 images). Is it too big?
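A minimal sketch for sanity-checking a gt file laid out like the example in this comment, assuming a header line (`17 1`) followed by one `path label` pair per line, separated by whitespace. The meaning of the two header numbers is not interpreted here; `max_chars=17` only reflects the 17-character labels described above. `parse_gt_line` and `overlong_labels` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def parse_gt_line(line: str) -> Tuple[str, str]:
    """Split 'path label' at the last whitespace, so paths containing spaces survive.

    Raises ValueError if the line has no label.
    """
    path, label = line.rsplit(maxsplit=1)
    return path, label


def overlong_labels(lines: Sequence[str], max_chars: int = 17) -> List[int]:
    """Return 1-based line numbers whose label is longer than max_chars.

    The first line is treated as a header and skipped; blank lines are ignored.
    """
    bad = []
    for line_number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue
        _, label = parse_gt_line(line)
        if len(label) > max_chars:
            bad.append(line_number)
    return bad


gt_lines = ["17 1", "$PATH 1GCHTCFE4C8101563"]  # the example from this thread
print(parse_gt_line(gt_lines[1]), overlong_labels(gt_lines))
```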

Bartzi commented 6 years ago

How many different characters do you want to recognize?

3,700 images is not too much for validation. It should actually work, and I'm not sure why it doesn't. You can, however, just comment out the epoch evaluator in the training script, and then this should no longer be a problem.
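Rather than disabling evaluation entirely, validation time can also be bounded by evaluating on a fixed random subset of the validation set. A minimal pure-Python sketch; `validation_subset` and the subset size of 500 are hypothetical, and in a Chainer script `chainer.datasets.split_dataset_random` provides a comparable split.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def validation_subset(dataset: Sequence[T], size: int, seed: int = 0) -> List[T]:
    """Return a reproducible random subset of at most `size` samples, in original order."""
    rng = random.Random(seed)  # fixed seed: the same subset is used at every evaluation
    indices = rng.sample(range(len(dataset)), k=min(size, len(dataset)))
    return [dataset[index] for index in sorted(indices)]


full_validation = [f"image_{i}.png" for i in range(3700)]  # stand-in for the 3,700 images
subset = validation_subset(full_validation, size=500)
print(len(subset), subset[:3])
```

A fixed seed keeps validation scores comparable between epochs, at the cost of a noisier estimate than the full set gives.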

qnkhuat commented 6 years ago

Yeah, but it still gets NaN :(