eragonruan / text-detection-ctpn

text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network
MIT License
3.43k stars 1.33k forks

about "loss" when training the model #94

Open FantDing opened 6 years ago

FantDing commented 6 years ago

Hi! When I train the model, I changed it as follows because of insufficient graphics memory:

(self.feed('data')
         .conv(3, 3, 64, 1, 1, name='conv1_1')
         .max_pool(2, 2, 2, 2, padding='VALID', name='pool1_1')
         .conv(3, 3, 64, 1, 1, name='conv1_2')
         .max_pool(2, 2, 2, 2, padding='VALID', name='pool1')
         .conv(3, 3, 128, 1, 1, name='conv2_1')
         .conv(3, 3, 128, 1, 1, name='conv2_2')
         .max_pool(2, 2, 2, 2, padding='VALID', name='pool2')
         .conv(3, 3, 256, 1, 1, name='conv3_1')
         # .conv(3, 3, 256, 1, 1, name='conv3_2')
         # .conv(3, 3, 256, 1, 1, name='conv3_3')
         .max_pool(2, 2, 2, 2, padding='VALID', name='pool3')
         .conv(3, 3, 512, 1, 1, name='conv4_1')
         # .conv(3, 3, 512, 1, 1, name='conv4_2')
         # .conv(3, 3, 512, 1, 1, name='conv4_3')
         .max_pool(2, 2, 2, 2, padding='VALID', name='pool4')
         .conv(3, 3, 512, 1, 1, name='conv5_1')
         .conv(3, 3, 512, 1, 1, name='conv5_2')
         .conv(3, 3, 512, 1, 1, name='conv5_3'))

However, the loss fluctuated violently.

[screenshots: training loss curves showing large spikes]

How can I solve this? Thank you.
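One thing worth checking (my own observation, not confirmed anywhere in this thread): the modified backbone above contains five stride-2 pooling layers, including the extra one right after conv1_1, so its cumulative feature stride would be 32 instead of the 16 produced by the standard CTPN/VGG16 backbone. If CTPN's anchor generation assumes a 16-pixel stride, that mismatch could plausibly destabilize training. The stride calculation itself is trivial:

```python
# Cumulative feature stride = product of the strides of all pooling layers.
def feature_stride(pool_strides):
    stride = 1
    for s in pool_strides:
        stride *= s
    return stride

print(feature_stride([2, 2, 2, 2]))     # standard CTPN backbone (4 pools) → 16
print(feature_stride([2, 2, 2, 2, 2]))  # modified backbone above (5 pools) → 32
```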

cipri-tom commented 6 years ago

Maybe try decreasing the learning rate? Or using a different optimiser.

FantDing commented 6 years ago

@cipri-tom Thanks, I will have a try

FantDing commented 6 years ago

@cipri-tom What was the final value of your total loss, approximately?

cipri-tom commented 6 years ago

Mine goes down to about 0.08 in the final iterations.

But since I posted that comment I have also trained on a dataset where the loss had spikes like yours. I couldn't identify the cause, but training still completed fine.

eragonruan commented 6 years ago

@FantDing Recheck the data; there may be some annotation errors.
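A quick way to act on this advice is a sanity check over the ground-truth boxes. This sketch is my own helper (not part of this repo) and assumes axis-aligned `(x1, y1, x2, y2)` annotations; it flags boxes that are degenerate or fall outside the image:

```python
def find_bad_boxes(boxes, img_w, img_h):
    """Return indices of annotation boxes (x1, y1, x2, y2) that look invalid."""
    bad = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        degenerate = x1 >= x2 or y1 >= y2                       # zero/negative size
        out_of_bounds = x1 < 0 or y1 < 0 or x2 > img_w or y2 > img_h
        if degenerate or out_of_bounds:
            bad.append(i)
    return bad

boxes = [(10, 10, 50, 30), (40, 20, 40, 60), (5, 5, 700, 30)]
print(find_bad_boxes(boxes, img_w=640, img_h=480))  # → [1, 2]
```

Running a check like this over the whole training set before training can catch the kind of annotation error that produces sudden loss spikes.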

FantDing commented 6 years ago

@eragonruan Thank you! The high loss may have been caused by the model, from which I had removed many layers. Now I have frozen the VGG16 layers' parameters instead. Training takes about 0.5 s per iteration, which is much faster than before. However, the total loss settles at about 0.2, which is not ideal.

yxandam commented 6 years ago

@FantDing I ran into the same issue. Can you tell me how to freeze the VGG16 layers' parameters? Thanks!
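For anyone wondering how the freezing is typically done: in TensorFlow 1.x one common approach is to pass only the non-frozen variables to `optimizer.minimize(loss, var_list=...)`. The variable selection itself is just name filtering, which can be sketched in plain Python (the scope names below are assumptions based on the layer names in this thread, not verified against the repo):

```python
# Variables whose names start with these scopes are treated as frozen VGG16 layers.
FROZEN_SCOPES = ("conv1_", "conv2_", "conv3_", "conv4_", "conv5_")

def trainable_vars(all_var_names):
    """Keep only variables that do NOT belong to a frozen VGG16 scope."""
    return [v for v in all_var_names
            if not v.startswith(FROZEN_SCOPES)]

names = ["conv1_1/weights", "conv5_3/biases", "rpn_conv/weights", "lstm_o/weights"]
print(trainable_vars(names))  # → ['rpn_conv/weights', 'lstm_o/weights']
```

With real TF1 variables, the same filter would be applied to `tf.trainable_variables()` and the surviving list handed to `minimize(loss, var_list=...)`, so gradients are only computed and applied for the non-VGG16 layers.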