Closed Viswa14 closed 8 years ago
Thank you for helping me out @Zhaw. This helps to train the model successfully. But when testing with image_segmentaion.py (after changing the model_prefix and epoch parameters appropriately), the result I obtain is a black image. Can you provide an insight into why that happens? I am not sure why all 0's are returned. Does the test code need any change for other models? This test code produces the correct result for the pre-trained FCN8s model provided by the author.
The values I get for data.shape, label.shape, out.shape and out_image are:

data.shape:  (1L, 3L, 335L, 500L)
label.shape: (1, 167500L)
out.shape:   (1L, 21L, 335L, 500L)
out_image:
[[0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 ...,
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]]
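For context, a minimal sketch (numpy only; the array shapes are taken from the output above, everything else is synthetic) of how a label map is typically derived from a network output by taking the argmax over the 21 class channels. If every score is zero (e.g. because corrupted weights produce constant output), the argmax collapses to class 0 everywhere, which renders as an all-black image:

```python
import numpy as np

# Synthetic stand-in for the network output with the reported shape (1, 21, 335, 500).
out = np.zeros((1, 21, 335, 500), dtype=np.float32)

# The predicted label map is the argmax over the class axis.
out_image = out[0].argmax(axis=0)  # shape (335, 500)

# All scores equal -> argmax picks class 0 everywhere -> a black image.
print(out_image.shape, int(out_image.max()))
```

This matches the symptom in the report: the (1, 21, 335, 500) score tensor is degenerate, so the saved PNG is entirely class 0.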
@tornadomeet @tqchen : Kindly provide suggestions on this.
What's your training accuracy? If your training accuracy is not low, then I have no idea what could cause this problem. If your training accuracy is low and stays the same, it is probably because you set the learning rate too high and some parameters became NaN. This will make your model predict all zeros. I don't think you need to change anything in the test code if you use your own model.
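The NaN hypothesis is easy to check once the checkpoint's parameters are available as arrays. A sketch with a synthetic parameter dict standing in for a loaded checkpoint (the names and shapes below are made up for illustration):

```python
import numpy as np

# Synthetic stand-in for a loaded parameter dict (e.g. from a checkpoint).
arg_params = {
    "conv1_weight": np.ones((4, 3, 3, 3), dtype=np.float32),
    "bigscore_weight": np.full((4, 4), np.nan, dtype=np.float32),  # corrupted
}

# Report every parameter array that contains at least one NaN.
bad = [name for name, arr in arg_params.items() if np.isnan(arr).any()]
print("NaN parameters:", bad)
```

If this list is non-empty after training, the learning rate was too high at some point and the run has to be restarted.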
My training accuracy comes to around 69% and stays the same until 50 epochs. I did not change anything in either the training or the testing code. I used the learning rate of 1e-10 defined by the example code.
I think you should try a higher learning rate. The fcn32s, fcn16s, and fcn8s models need different learning rates, and 1e-10 is for training the fcn8s model. If I remember correctly, the learning rates I used for these three models were 1e-4, 1e-7, and 1e-10.
Okay. I did my trial based on the information provided with the example. Do you suggest changing it according to your values? The learning rates provided along with the example are as follows:

model   | lr (fixed) | epoch
fcn-32s | 1e-10      | 31
fcn-16s | 1e-12      | 27
fcn-8s  | 1e-14      | 19
All I can suggest is to raise your learning rate; try different values and see which works. If your training accuracy stays the same for a long time, your learning rate is too low. I'm not sure my values will work for you, because the original ones didn't either. I think the proper learning rate is related to the input image's size, and this may be why you need a different learning rate even when training the same model.
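The "try different values and see which works" advice can be scripted as a coarse sweep over a log-spaced grid of learning rates. A generic sketch; `train_one_epoch` below is a hypothetical placeholder (in practice it would launch fcn_xs.py with the given lr and report training accuracy):

```python
import numpy as np

def train_one_epoch(lr):
    # Hypothetical placeholder returning training accuracy for one epoch.
    # This fake curve peaks near lr = 1e-7 purely for demonstration.
    return 1.0 / (1.0 + abs(np.log10(lr) + 7))

# Sweep a log-spaced grid and keep the lr with the best training accuracy.
candidates = [10.0 ** e for e in range(-12, -3)]
results = {lr: train_one_epoch(lr) for lr in candidates}
best_lr = max(results, key=results.get)
print("best lr:", best_lr)
```

One short epoch per candidate is usually enough to see whether the loss moves at all (too low) or blows up to NaN (too high) before committing to a full run.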
@Viswa14 due to a recent update to mxnet's softmax operator, you should use a smaller lr, as @zhaw suggested.
@tornadomeet @zhaw: So do you suggest a lower learning rate or a higher one? @zhaw had suggested that I increase the learning rate.
And thank you! Sure, I will try that and provide an update. It would be great if the documentation could be updated too, as it would help people trying out this example in the future.
Sorry, I think I misunderstood "My training accuracy comes around 69% stays same until 50 Epochs". I thought you meant that your training accuracy increased after 50 epochs. If your training accuracy stayed the same all the time, that was because your learning rate was too high and some params became NaN. If that was the case, you should lower your learning rate.
Yes, I made a mistake a moment ago; just use a larger lr.
@Viswa14 I obtain a black image too using the fcn32s model, just like you. Should I lower my learning rate or increase it? (When I trained the fcn32s model I used learning rate = 1e-10.)
File "fcn_xs.py", line 57, in main
    epoch_end_callback = mx.callback.do_checkpoint(fcnxs_model_prefix))
File "solver.py", line 72, in fit
    aux_states=self.aux_params)
File "symbol.py", line 718, in bind
    args_handle, args = self._get_ndarray_inputs('args', args, listed_arguments, False)
File "symbol.py", line 585, in _get_ndarray_inputs
    raise ValueError('Must specify all the arguments in %s' % arg_key)
ValueError: Must specify all the arguments in args
I come across this error when I try to train the fcn32s model using VGG_FC_ILSVRC_16_layers as the prefix. I believe the pre-trained VGG16 model does not have 'bigscore_bias'. Can anyone help in this regard?
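A possible workaround (a sketch, not the repo's actual code): when binding a new symbol such as fcn32s against pretrained VGG16 weights, keep only the pretrained entries whose names the new symbol actually uses, and initialize the missing ones (such as 'bigscore_bias') yourself, so that bind is never asked for an argument that was never supplied. Shown here with plain dicts of numpy arrays standing in for the MXNet param dicts; all names and shapes are illustrative:

```python
import numpy as np

# Pretend these are the argument names the new fcn32s symbol expects.
needed_args = ["conv1_weight", "conv1_bias", "bigscore_weight", "bigscore_bias"]

# Pretend this is what the pretrained VGG16 checkpoint provides.
vgg_params = {
    "conv1_weight": np.ones((4, 3, 3, 3), dtype=np.float32),
    "conv1_bias": np.zeros(4, dtype=np.float32),
    "fc8_weight": np.ones((10, 4), dtype=np.float32),  # not used by fcn32s
}

# Illustrative shapes for the layers VGG16 does not provide.
missing_shapes = {"bigscore_weight": (21, 21, 64, 64), "bigscore_bias": (21,)}

# Copy the params the new symbol uses; zero-initialize anything missing.
fcn_params = {}
for name in needed_args:
    if name in vgg_params:
        fcn_params[name] = vgg_params[name]
    else:
        fcn_params[name] = np.zeros(missing_shapes[name], dtype=np.float32)

print(sorted(fcn_params))  # every needed argument is now present
```

After this, the resulting dict covers every required argument, which is exactly the condition the "Must specify all the arguments in args" error is complaining about.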