LeeWise9 / Image_Captioning

Image captioning code in Keras, runs on GPU.

Confuse about the Epoch in Training #1

Open phyukhaing7 opened 5 years ago

phyukhaing7 commented 5 years ago

When I run this code on my computer, everything runs fine, but I am confused about the epochs. After training, I got 20 model files. When I tested those model files, I got the best performance from 'model_0.h5'. My previous understanding was that the best accuracy is reached over the course of many epochs.

The performance of model_0.h5 is: BLEU-1: 0.555611, BLEU-2: 0.297323, BLEU-3: 0.166134, BLEU-4: 0.086610.
The performance of model_10.h5 is: BLEU-1: 0.546234, BLEU-2: 0.288881, BLEU-3: 0.158739, BLEU-4: 0.079482.
The performance of model_19.h5 is: BLEU-1: 0.520667, BLEU-2: 0.271692, BLEU-3: 0.148475, BLEU-4: 0.076073.

Please help clear up my confusion!

Best Regards,

LeeWise9 commented 5 years ago

The BLEU score is one way to evaluate the performance of the model. The training process can ensure that the loss value decreases steadily, but the BLEU score is not guaranteed to follow it. I think more attention should be paid to the changes in the loss value on the test data set (val_loss) to confirm whether over-fitting has occurred.
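For reference, this is roughly how BLEU-1 through BLEU-4 scores like the ones above are computed: clipped n-gram precision combined by a geometric mean, times a brevity penalty. A minimal pure-Python sketch (no smoothing, single shortest-reference length; the `bleu` function and the token lists are illustrative, not from this repo — in practice you would use `nltk.translate.bleu_score.corpus_bleu`):

```python
from collections import Counter
import math

def bleu(candidate, references, max_n=4):
    """Minimal BLEU sketch: clipped n-gram precision + brevity penalty.

    candidate  -- list of tokens (the generated caption)
    references -- list of token lists (the reference captions)
    max_n      -- bleu(c, r, max_n=1) ~ BLEU-1, max_n=4 ~ BLEU-4
    """
    precisions = []
    for n in range(1, max_n + 1):
        # n-gram counts in the candidate
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        # max count of each n-gram over all references (for clipping)
        max_ref = Counter()
        for ref in references:
            ref_ngrams = Counter(tuple(ref[i:i + n])
                                 for i in range(len(ref) - n + 1))
            for g, c in ref_ngrams.items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        precisions.append(clipped / total if total else 0.0)
    if min(precisions) == 0.0:
        return 0.0  # any zero precision zeroes the geometric mean
    # brevity penalty (shortest reference used here for simplicity)
    ref_len = min(len(r) for r in references)
    bp = 1.0 if len(candidate) >= ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 1.0, and short or divergent captions are penalized, which is why BLEU can drop even while training loss keeps falling.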

If both the val_loss value and the BLEU value increase, that suggests over-fitting, and you need to take some measures to prevent it.

If the val_loss value decreases and the BLEU value increases, this situation cannot simply be called over-fitting. It only proves that the two are not monotonically correlated.

The more likely explanation is that the neural network is too simple and the model is under-fitting: the loss value is still too large. You can try to deal with this problem with larger data sets and more complex networks.
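As a concrete way to act on the val_loss curve: Keras provides `ModelCheckpoint(..., monitor='val_loss', save_best_only=True)` to keep only the best epoch's weights, and `EarlyStopping(monitor='val_loss', patience=...)` to stop once the loss stalls. The patience logic boils down to something like this sketch (the `best_epoch` helper is illustrative, not part of this repo):

```python
def best_epoch(val_losses, patience=3):
    """Pick the epoch with the lowest validation loss, stopping once the
    loss has failed to improve for `patience` consecutive epochs."""
    best, best_i, wait = float('inf'), 0, 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, best_i, wait = loss, i, 0  # improvement: reset patience
        else:
            wait += 1
            if wait >= patience:
                break  # plateau reached: stop training
    return best_i
```

With this in place you would evaluate BLEU only on the checkpointed best-val_loss model instead of all 20 saved files.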

phyukhaing7 commented 5 years ago

Thanks a lot for your reply.

In my experiments, the val_loss value decreased and the BLEU value also decreased. I think I need to learn more.

Best Regards,

LeeWise9 commented 5 years ago

In order to test my ideas and dispel your confusion, I have made some progress. I changed the network structure: after the merge layer, two fully connected layers were added, and the model gained better fitting ability. The validation results became normal:

Model_0: BLEU-1: 0.479384, BLEU-2: 0.217194, BLEU-3: 0.125174, BLEU-4: 0.045315
Model_4: BLEU-1: 0.514285, BLEU-2: 0.252239, BLEU-3: 0.149873, BLEU-4: 0.062614
Model_9: BLEU-1: 0.514285, BLEU-2: 0.252239, BLEU-3: 0.149873, BLEU-4: 0.062614

This at least proves that the performance of the new model improves (or stays equal) as the number of training epochs increases, and that performance eventually plateaus. I hope these results help you. Best Regards :)

phyukhaing7 commented 5 years ago

Thanks a lot, Sir, for your kind reply. Could I get the code for the changed network structure?

Best Regards,

LeeWise9 commented 5 years ago

Here is the part of the function that needs to be changed. I changed the structure several times to test the performance of different models, so I'm not sure you will get exactly the same results with this code.

```python
from keras.layers import Input, Dropout, Dense, Embedding, LSTM, add
from keras.models import Model

# define the captioning model
def define_model(vocab_size, max_length):
    # feature extractor model
    inputs1 = Input(shape=(4096,))
    fe1 = Dropout(0.5)(inputs1)
    fe2 = Dense(256, activation='relu')(fe1)
    fe3 = Dropout(0.5)(fe2)
    # sequence model
    inputs2 = Input(shape=(max_length,))
    se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
    se2 = Dropout(0.5)(se1)
    se3 = LSTM(256)(se2)
    # decoder model
    decoder1 = add([fe3, se3])
    decoder2 = Dense(512, activation='relu')(decoder1)
    decoder2 = Dropout(0.5)(decoder2)
    decoder4 = Dense(256, activation='relu')(decoder2)
    decoder4 = Dropout(0.5)(decoder4)
    outputs = Dense(vocab_size, activation='softmax')(decoder4)
    # tie it together: [image, seq] -> [word]
    model = Model(inputs=[inputs1, inputs2], outputs=outputs)
    # compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    # summarize model
    model.summary()
    # plot_model(model, to_file='model.png', show_shapes=True)
    return model
```

The purpose of this project is to demonstrate describing picture content. It is more like a tutorial and may not achieve particularly good results. A good next step would be to add an attention mechanism.

If you want to pursue higher BLEU scores, you can refer to projects with more stars.

Best Regard :)

phyukhaing7 commented 5 years ago

Thanks a lot, Sir. I am now learning image captioning and would like to contact you by email. May I? If so, my email is phyukhaing7@gmail.com.

Best Regards,

LeeWise9 commented 5 years ago

I'm afraid I can't give you much help because my major is not very relevant to machine learning. It's just a hobby for me to do this.

If you don't mind, I think it's OK to communicate here.

Best Regards.

phyukhaing7 commented 5 years ago

Yes Sir. May I ask whether you are continuing to study image captioning?

Best Regards,

LeeWise9 commented 5 years ago

Occasionally I read some pages and run the code from them, or spend some time reading papers; that's all.

Best Regards.

phyukhaing7 commented 5 years ago

Thanks a lot for your suggestions.

Best Regards,

LeeWise9 commented 5 years ago

You're welcome. I hope you succeed in image captioning.

Best Regards :)

1228589545 commented 4 years ago

Where is this features.pkl?

LeeWise9 commented 4 years ago

@1228589545 features.pkl is generated after running step 1; you need to run that step yourself to obtain the file.

1228589545 commented 4 years ago

thank you.


LeeWise9 commented 4 years ago

You're welcome.

Blue-Eagle-10 commented 4 years ago

Is there any reference for this project? Thank you!