apple2373 / chainer-caption

image caption generation using chainer!
64 stars 27 forks source link

Loss Function is nan while training after Preprocessed the dataset #7

Open sandy2008 opened 5 years ago

sandy2008 commented 5 years ago

Using the command in personal_notes.txt, and preprocessed the data, I try to train the model but only the first epoch has a loss function of 7.xxx, but then everything is just 0.

apple2373 commented 5 years ago

I don't recommend using the command at personal_notes.txt because it's really a personal note :) . The command in Readme should be better.

In general, the Nan loss means learning rate is too high, so you could try decreasing the learning rate.

sandy2008 commented 5 years ago

Thank you so much for your reply! I tried to train it with the Readme.txt file, but captions ./data/MSCOCO/MSCOCO/mscoco_caption_train2014_processed.json file is not downloaded by the shell command file.

apple2373 commented 5 years ago

Ops! sorry, you are the first one to ask mscoco_caption_train2014_processed.json !

I uploaded here: https://www.dropbox.com/s/1a7cgetjyb8h8ho/mscoco_caption_train2014_processed.json?dl=0

sandy2008 commented 5 years ago

Thank you so much for your reply! Btw, is it the right script to generate dic and preprossed data for Japanese?


python train_caption_model.py --savedir ./experiment1jp_yj --epoch 40 --batch 120 --gpu 0 \
--vocab ./data/MSCOCO/yjcaptions26k_clean_processed_dic.json \
--captions ./data/MSCOCO/yjcaptions26k_clean_processed.json \
--preload True
sandy2008 commented 5 years ago

Let me download it and test it again ;)

apple2373 commented 5 years ago

I don't think your command is for generating preprocessed data. The command looks fine to me but it's a training command.

sandy2008 commented 5 years ago

Sorry I copied and pasted the wrong one!!!

python preprocess_MSCOCO_captions.py \
--input ../data/MSCOCO/yjcaptions26k_clean.json \
--output ../data/MSCOCO/yjcaptions26k_clean_processed.json \
--outdic ../data/MSCOCO/yjcaptions26k_clean_processed_dic.json \
--outfreq ../data/MSCOCO/yjcaptions26k_clean_processed_freq.json \
--cut 0 \
--char True \
apple2373 commented 5 years ago

yes this is correct.

sandy2008 commented 5 years ago

And the learning rate is nan if I run this command then train the model ;) Do you have any idea why this would happen? I will try to test it with other things..

apple2373 commented 5 years ago

I don't think learning rate can be nan. Well, when i do the training, the loss will decrease around 3 quickly.

Did you modify any of the code? You could clone the repository again and try from scratch. Also remember to use the same chainer version 1.19.0 exactly.

sandy2008 commented 5 years ago

Cool, i will do it later, and there is now another dataset for MSCOCO in Japanese: https://stair.center/archives/338, I will test it with the old dataset and the new dataset ;)