2017-fall-DL-training-program / ImageCaption


Could we set "finetune_cnn_after" parameter to -1? #13

Closed jessejchuang closed 7 years ago

jessejchuang commented 7 years ago

Hi TA,

I don't know why training is so slow on the Azure VM, and it is costing us a lot of money. I started a 5-epoch run in the morning and still had no result in the evening. The default value of "finetune_cnn_after" in opts.py is 0. Could we set it to -1? That would speed up our homework.

connie980149 commented 7 years ago

I think it's okay to disable fine-tuning. Can you report the time needed for one epoch?
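For readers unfamiliar with this kind of option, the semantics discussed here (0 fine-tunes the CNN from the start, -1 disables fine-tuning) can be sketched as below. This is only an illustration: the helper name `should_finetune_cnn`, the `step` counter, and the "step >= threshold" rule are assumptions, not taken from the course's sample code.

```python
def should_finetune_cnn(step, finetune_cnn_after):
    """Return True if the CNN should be updated at this training step.

    Assumed semantics (hypothetical, for illustration only):
      finetune_cnn_after == -1 -> never fine-tune the CNN
      finetune_cnn_after >= 0  -> fine-tune once `step` reaches that value
    """
    if finetune_cnn_after == -1:
        return False
    return step >= finetune_cnn_after


# Value reported as the default in opts.py: fine-tune from the first step.
print(should_finetune_cnn(0, 0))
# Proposed value: the CNN stays frozen for the whole run.
print(should_finetune_cnn(5000, -1))
```

With a frozen CNN, no gradients are computed for the image encoder, which is where a speed-up would be expected to come from.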

jessejchuang commented 7 years ago

It seems the sample code doesn't report the execution time of an epoch, so I can only estimate it from the time of one iteration. See the logs below. 'Read data' plus 'train' takes around 1 s per iteration with CNN fine-tuning vs. 0.5 s without. So for one epoch (113287 images, hence 11329 iterations), that's about 3h9m vs. 1h34m.

('Read data:', 0.13839316368103027)
iter 5997 (epoch 0), train_loss = 3.107, time/batch = 0.780
('Read data:', 0.14182162284851074)
iter 5998 (epoch 0), train_loss = 2.830, time/batch = 0.788
('Read data:', 0.2410578727722168)
iter 5999 (epoch 0), train_loss = 2.534, time/batch = 0.801
image 184613: a group of people standing on a field with a horse
image 403013: a kitchen with a stove and a refrigerator

('Read data:', 0.012835979461669922)
iter 5997 (epoch 0), train_loss = 3.570, time/batch = 0.453
('Read data:', 0.01192784309387207)
iter 5998 (epoch 0), train_loss = 3.024, time/batch = 0.448
('Read data:', 0.011527299880981445)
iter 5999 (epoch 0), train_loss = 2.869, time/batch = 0.446
image 184613: a group of people riding horses on a horse
image 403013: a kitchen with a kitchen counter top and a sink
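The per-epoch estimate above can be reproduced with a few lines of Python. This is a rough sketch: the batch size of 10 is an assumption inferred from 113287 images giving 11329 iterations, and the helper names `estimate_epoch_seconds` and `fmt_hm` are made up for illustration.

```python
import math


def estimate_epoch_seconds(num_images, batch_size, seconds_per_iter):
    """Estimate one epoch's wall-clock time from the per-iteration time."""
    iterations = math.ceil(num_images / batch_size)
    return iterations, iterations * seconds_per_iter


def fmt_hm(seconds):
    """Format a duration in seconds as e.g. '3h9m', rounded to the minute."""
    minutes = round(seconds / 60)
    return f"{minutes // 60}h{minutes % 60}m"


NUM_IMAGES = 113287
BATCH_SIZE = 10  # assumption: inferred from 113287 images -> 11329 iterations

for label, sec_per_iter in [("slower run", 1.0), ("faster run", 0.5)]:
    iters, total = estimate_epoch_seconds(NUM_IMAGES, BATCH_SIZE, sec_per_iter)
    print(f"{label}: {iters} iterations, about {fmt_hm(total)} per epoch")
```

The per-iteration times of 1 s and 0.5 s are the rounded figures quoted above; plugging in the exact 'Read data' and time/batch values from the logs would give slightly shorter estimates.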

jessejchuang commented 7 years ago

Never mind. The loss doesn't seem to converge without CNN fine-tuning.