TuSimple / TuSimple-DUC

Understanding Convolution for Semantic Segmentation
https://arxiv.org/abs/1702.08502
Apache License 2.0

Any config instruction for training on VOC2012 dataset? #14

Closed zhijiew closed 6 years ago

zhijiew commented 6 years ago

Hi! I want to retrain this model on the VOC2012 dataset. Is there a config file for that, or any advice? Thanks a lot!

GrassSunFlower commented 6 years ago

You can get it done with very small modifications to the default config file: set label_num to the number of classes in VOC, and choose a crop_shape that fits the image size of the VOC dataset. You can also find more details in Understanding Convolution for Semantic Segmentation.

zhijiew commented 6 years ago

Thank you for your reply, but I still cannot figure out how to train this model on VOC2012. Could you give an example config file and describe the data processing method? Besides the config file, do I need to modify the code to train on VOC?

GrassSunFlower commented 6 years ago

Just as I mentioned, label_num and crop_shape are basically the most important things you need to change to adapt to VOC. As I remember, we also changed cell_width to 4 to get a coarser result, because of the limited resolution of VOC data. I'm afraid I can't provide an example config, since we ran our VOC experiments about a year ago and didn't do any maintenance work after achieving SOTA. Besides, this repo is designed specifically for the Cityscapes dataset, so you definitely need to change the network definition and the data pre-processing part for VOC. I can give you some advice on which parts to modify for VOC if you want, but currently we don't have any plans to add support for VOC.
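
The three settings named above (label_num, crop_shape, cell_width) can be overridden programmatically. The following is only a sketch: the section names, the Cityscapes placeholder values, the comma-separated crop_shape format, and the helper override_keys are all assumptions made for illustration and are not taken from the repository's train_cityscapes.cfg. The value 21 assumes the usual VOC convention of 20 object classes plus background, and the crop size is just an example below VOC's maximum image side of 500 pixels.

```python
import configparser

# Hypothetical excerpt: section names and the original values are placeholders,
# not copied from the repository's train_cityscapes.cfg.
EXAMPLE_CFG = """
[network]
label_num = 19

[data]
crop_shape = 800,800

[misc]
cell_width = 2
"""


def override_keys(cfg_text, overrides):
    """Replace every key listed in `overrides`, in whichever section it appears."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    for section in parser.sections():
        for key, value in overrides.items():
            if parser.has_option(section, key):
                parser.set(section, key, str(value))
    return parser


voc_overrides = {
    "label_num": 21,          # assumption: 20 VOC object classes + background
    "crop_shape": "480,480",  # illustrative size; VOC images are at most 500 px per side
    "cell_width": 4,          # value the maintainer recalls using for VOC
}

voc_cfg = override_keys(EXAMPLE_CFG, voc_overrides)
for section in voc_cfg.sections():
    print(section, dict(voc_cfg[section]))
```

As the reply notes, changing the config alone is not enough: the network definition and data pre-processing are written for Cityscapes and would need changes as well.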

Timo-hab commented 6 years ago

Hi! I want to finetune the pretrained cityscapes weights on a different dataset with 20 classes. For this reason, I changed the "label_num" in train_cityscapes.cfg to 20, but I get the following error:

Traceback (most recent call last):
  File "train_model.py", line 15, in <module>
    train_end2end()
  File "train_model.py", line 12, in train_end2end
    model.fit()
  File "/home/kronach/timo/TuSimple-DUC/train/solver.py", line 229, in fit
    num_epoch=self.num_epochs,
  File "/home/kronach/timo/TuSimple-DUC/mxnet/python/mxnet/module/base_module.py", line 496, in fit
    self.update_metric(eval_metric, data_batch.label)
  File "/home/kronach/timo/TuSimple-DUC/mxnet/python/mxnet/module/module.py", line 735, in update_metric
    self._exec_group.update_metric(eval_metric, labels)
  File "/home/kronach/timo/TuSimple-DUC/mxnet/python/mxnet/module/executor_group.py", line 582, in update_metric
    eval_metric.update_dict(labels_, preds)
  File "/home/kronach/timo/TuSimple-DUC/mxnet/python/mxnet/metric.py", line 108, in update_dict
    self.update(label, pred)
  File "/home/kronach/timo/TuSimple-DUC/tusimple_duc/core/metrics.py", line 29, in update
    metric.update(labels, preds)
  File "/home/kronach/timo/TuSimple-DUC/tusimple_duc/core/metrics.py", line 140, in update
    soft_label[b][c][label[b] == c] = 1.0
IndexError: index 19 is out of bounds for axis 0 with size 19
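
The IndexError above is what NumPy raises when a loop over 20 classes indexes an array that has only 19 channels. The sketch below reproduces the indexing pattern from the failing line. It assumes that the one-hot buffer takes its channel count from the network output (still 19 with the pretrained symbol) while the loop bound comes from label_num (now 20); the function one_hot and its arguments are hypothetical and are not the code in metrics.py.

```python
import numpy as np


def one_hot(label, num_channels, label_num):
    """Build a one-hot array with the same indexing pattern as the failing line.

    label: integer array of shape (batch, pixels).
    num_channels: channel count of the buffer (assumed to follow the network output).
    label_num: number of classes iterated over (assumed to come from the config).
    """
    batch, pixels = label.shape
    soft_label = np.zeros((batch, num_channels, pixels), dtype=np.float32)
    for b in range(batch):
        for c in range(label_num):
            soft_label[b][c][label[b] == c] = 1.0
    return soft_label


label = np.array([[0, 5, 18, 3]])

# Matching sizes: works.
ok = one_hot(label, num_channels=19, label_num=19)

# 19 output channels but 20 classes in the config: fails at c == 19.
try:
    one_hot(label, num_channels=19, label_num=20)
    message = None
except IndexError as err:
    message = str(err)

print(message)
```

Note that the failure does not depend on any pixel actually carrying label 19; indexing channel 19 of a 19-channel buffer is enough.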

When I train from scratch, training starts without errors. So I think the problem is the CityScapes-symbol.json file? The Reshape operator at the very end has "shape": "(0,19,-1)". So I changed the 19 to 20, but then I get:

Traceback (most recent call last):
  File "train_model.py", line 15, in <module>
    train_end2end()
  File "train_model.py", line 12, in train_end2end
    model.fit()
  File "/home/kronach/timo/TuSimple-DUC/train/solver.py", line 229, in fit
    num_epoch=self.num_epochs,
  File "/home/kronach/timo/TuSimple-DUC/mxnet/python/mxnet/module/base_module.py", line 460, in fit
    for_training=True, force_rebind=force_rebind)
  File "/home/kronach/timo/TuSimple-DUC/mxnet/python/mxnet/module/module.py", line 417, in bind
    state_names=self._state_names)
  File "/home/kronach/timo/TuSimple-DUC/mxnet/python/mxnet/module/executor_group.py", line 231, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/home/kronach/timo/TuSimple-DUC/mxnet/python/mxnet/module/executor_group.py", line 327, in bind_exec
    shared_group))
  File "/home/kronach/timo/TuSimple-DUC/mxnet/python/mxnet/module/executor_group.py", line 603, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/home/kronach/timo/TuSimple-DUC/mxnet/python/mxnet/symbol.py", line 1479, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
seg_loss_label: (2, 90000)
data: (2, 3, 600, 600)
Error in operator seg_loss: Expecting (2,85500) or (2,85500). But got (2,90000)
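
The two sizes in this message are consistent with a fixed number of elements entering the Reshape: 19 × 90000 = 1,710,000 = 20 × 85500. Editing only the Reshape shape redistributes the same elements over 20 channels, so each channel ends up with 85500 positions instead of the 90000 the label expects. The NumPy sketch below checks that arithmetic using only the numbers from the error message. NumPy has no equivalent of the 0 placeholder in MXNet's Reshape (which copies the input dimension), so the batch dimension is written explicitly.

```python
import numpy as np

labels_per_image = 90000            # from seg_loss_label: (2, 90000)
old_classes, new_classes = 19, 20

# Elements per image feeding the Reshape, implied by the 19-class graph.
elements_per_image = old_classes * labels_per_image

# Dummy tensor standing in for the output of the layers before the Reshape.
features = np.zeros((1, elements_per_image), dtype=np.uint8)

# Equivalent of changing "(0,19,-1)" to "(0,20,-1)" without touching the layers before it.
positions_after_edit = features.reshape(1, new_classes, -1).shape[2]

# Elements the preceding layers would have to produce for 20 classes.
required_elements = new_classes * labels_per_image

print(positions_after_edit)   # positions per channel after the edit
print(required_elements)      # elements needed per image for 20 classes
```

So the layers that feed the Reshape have to produce 20/19 times as many values, which means their output channel count has to change along with the Reshape shape.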

Could you please give me a hint about what I still need to change? How can I get the CityScapes-symbol.json file with 20 classes instead of 19? I am really looking forward to your answer.

Thanks, Timo

Timo-hab commented 6 years ago

GrassSunFlower answered me the following via email:

That's because the parameter file you use is still for 19 labels.
If you want to simply change the last layer, you need to change the source code, load the
parameters of every layer except the last one, and initialize the last layer.

Thanks for the hint!

I had to change the number of filters of the four fc1 layers so that the Reshape layer can handle "shape": "(0,20,-1)". Furthermore, I followed the instructions at https://mxnet.incubator.apache.org/how_to/finetune.html and removed the parameters of the fc1 layers to get rid of the shape mismatch error. Since the fc1 layers are now trained from scratch, I increased the learning rate for these layers ("attr": {"__lr_mult__": "10"}).
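
The parameter-removal step can be sketched as a plain dictionary filter. The example uses NumPy arrays as stand-ins for the loaded checkpoint so that it runs without MXNet; the parameter names, shapes, and the helper drop_layer_params are hypothetical, and only the fc1 prefix comes from the description above.

```python
import numpy as np


def drop_layer_params(arg_params, layer_prefix="fc1"):
    """Return a copy of `arg_params` without entries whose name starts with `layer_prefix`."""
    return {
        name: value
        for name, value in arg_params.items()
        if not name.startswith(layer_prefix)
    }


# Stand-in for a loaded 19-class checkpoint; names and shapes are hypothetical.
pretrained = {
    "conv1_weight": np.zeros((8, 3, 3, 3)),
    "conv1_bias": np.zeros(8),
    "fc1_c0_weight": np.zeros((19, 8, 3, 3)),
    "fc1_c0_bias": np.zeros(19),
}

kept = drop_layer_params(pretrained)
print(sorted(kept))
```

With MXNet's Module API, the filtered dictionary would then be supplied as arg_params together with allow_missing=True, so that the removed fc1 parameters are created by the initializer instead of being loaded with the old 19-class shapes.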

In Caffe, you only need to change the number of classes in the network definition; no change to the parameter file is necessary. Much simpler.