endernewton / tf-faster-rcnn

Tensorflow Faster RCNN for Object Detection
https://arxiv.org/pdf/1702.02138.pdf
MIT License

Trying to train on custom dataset error loading model #365

Closed Shiro-LK closed 6 years ago

Shiro-LK commented 6 years ago

Hello,

First of all, thank you for sharing the code; it is very helpful. I am trying to train on a custom dataset with only one class: raccoon. I converted the labels to Pascal VOC format, so I have two folders, one for training and one for validation. Each of them contains an Annotations folder with the .xml files and a JPEGImages folder with the .jpg images. I am a little stuck because I do not understand what I have to change in pascal_voc.py besides the class variable. What are the steps for training on my own dataset? I modified pascal_voc.py, factory.py and train_faster_rcnn.sh. The problem is that I get an error when loading the model, and I do not know whether it is linked to my modifications. I think it is because my model outputs two classes (background and raccoon) while the pretrained model has 81 (COCO dataset), but I cannot find where the number of classes is set in pascal_voc.py. Does anyone know how to solve this problem?
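To make the suspected mismatch concrete, here is a minimal sketch of how the class count could translate into output-layer shapes. It assumes the classification layer produces one score per class and the box-regression layer four coordinates per class. The `CLASSES` tuple, the `head_bias_shapes` helper and the `bbox_pred` layer name are hypothetical illustrations, not code from the repository; only `cls_score/biases` appears in the traceback.

```python
# Hypothetical class tuple for a one-class dataset; index 0 is the background.
CLASSES = ("__background__", "raccoon")


def head_bias_shapes(num_classes):
    """Bias shapes of the two output layers for a given class count.

    Assumes one score per class and four box coordinates per class.
    """
    return {
        "cls_score/biases": [num_classes],
        "bbox_pred/biases": [4 * num_classes],
    }


custom = head_bias_shapes(len(CLASSES))  # model built for the new dataset
coco = head_bias_shapes(81)              # 80 COCO classes + background

# Print the shapes side by side to see which layers cannot be restored as-is.
for name in custom:
    print(name, "model:", custom[name], "checkpoint:", coco[name])
```

Under these assumptions, the `[2]` versus `[81]` pair for `cls_score/biases` matches the `lhs shape` and `rhs shape` reported in the error.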

Traceback (most recent call last):
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [2] rhs shape= [81]
     [[Node: save_1/Assign_307 = Assign[T=DT_FLOAT, _class=["loc:@resnet_v1_50/cls_score/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](resnet_v1_50/cls_score/biases, save_1/RestoreV2/_111)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./tools/trainval_net.py", line 139, in <module>
    max_iters=args.max_iters)
  File "/home/user1/Documents/DeepLearning/tf-faster-rcnn/tools/../lib/model/train_val.py", line 377, in train_net
    sw.train_model(sess, max_iters)
  File "/home/user1/Documents/DeepLearning/tf-faster-rcnn/tools/../lib/model/train_val.py", line 255, in train_model
    rate, last_snapshot_iter, stepsizes, np_paths, ss_paths = self.initialize(sess)
  File "/home/user1/Documents/DeepLearning/tf-faster-rcnn/tools/../lib/model/train_val.py", line 191, in initialize
    restorer.restore(sess, self.pretrained_model)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1802, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [2] rhs shape= [81]
     [[Node: save_1/Assign_307 = Assign[T=DT_FLOAT, _class=["loc:@resnet_v1_50/cls_score/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](resnet_v1_50/cls_score/biases, save_1/RestoreV2/_111)]]

Caused by op 'save_1/Assign_307', defined at:
  File "./tools/trainval_net.py", line 139, in <module>
    max_iters=args.max_iters)
  File "/home/user1/Documents/DeepLearning/tf-faster-rcnn/tools/../lib/model/train_val.py", line 377, in train_net
    sw.train_model(sess, max_iters)
  File "/home/user1/Documents/DeepLearning/tf-faster-rcnn/tools/../lib/model/train_val.py", line 255, in train_model
    rate, last_snapshot_iter, stepsizes, np_paths, ss_paths = self.initialize(sess)
  File "/home/user1/Documents/DeepLearning/tf-faster-rcnn/tools/../lib/model/train_val.py", line 190, in initialize
    restorer = tf.train.Saver(variables_to_restore)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1338, in __init__
    self.build()
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1347, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1384, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
    restore_sequentially, reshape)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 494, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 185, in restore
    self.op.get_shape().is_fully_defined())
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 283, in assign
    validate_shape=validate_shape)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 60, in assign
    use_locking=use_locking, name=name)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/user1/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [2] rhs shape= [81]
     [[Node: save_1/Assign_307 = Assign[T=DT_FLOAT, _class=["loc:@resnet_v1_50/cls_score/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](resnet_v1_50/cls_score/biases, save_1/RestoreV2/_111)]]

Command exited with non-zero status 1
14.30user 1.02system 0:19.96elapsed 76%CPU (0avgtext+0avgdata 1180664maxresident)k
0inputs+6280outputs (0major+276399minor)pagefaults 0swaps

Shiro-LK commented 6 years ago

The problem comes from the weight-restoring code. It tried to restore the last layers, which have different dimensions from those in my model. So I changed the code slightly so that it does not load the last layers (the regression and classification layers).
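The modified code was not posted, so the following is only a sketch of one way to do this: keep a variable for restoring only if the checkpoint holds a tensor with the same name and the same shape. The `select_restorable` helper and all shapes other than the `[2]` and `[81]` biases are illustrative assumptions. In TensorFlow 1.x, the checkpoint shapes could presumably be read with `tf.train.NewCheckpointReader(path).get_variable_to_shape_map()`, and the surviving variables passed to `tf.train.Saver(variables_to_restore)` as in `train_val.py`.

```python
def select_restorable(model_shapes, ckpt_shapes):
    """Split model variable names into (restore, skipped).

    A variable is restorable only if the checkpoint contains a tensor
    with the same name and exactly the same shape.
    """
    restore, skipped = [], []
    for name, shape in model_shapes.items():
        if name in ckpt_shapes and list(ckpt_shapes[name]) == list(shape):
            restore.append(name)
        else:
            skipped.append(name)
    return restore, skipped


# Illustrative shapes: a 2-class model versus an 81-class checkpoint.
model_shapes = {
    "resnet_v1_50/conv1/weights": [7, 7, 3, 64],
    "resnet_v1_50/cls_score/weights": [2048, 2],
    "resnet_v1_50/cls_score/biases": [2],
    "resnet_v1_50/bbox_pred/weights": [2048, 8],
    "resnet_v1_50/bbox_pred/biases": [8],
}
ckpt_shapes = {
    "resnet_v1_50/conv1/weights": [7, 7, 3, 64],
    "resnet_v1_50/cls_score/weights": [2048, 81],
    "resnet_v1_50/cls_score/biases": [81],
    "resnet_v1_50/bbox_pred/weights": [2048, 324],
    "resnet_v1_50/bbox_pred/biases": [324],
}

restore, skipped = select_restorable(model_shapes, ckpt_shapes)
print("restore:", restore)
print("skipped (left at their initial values):", skipped)
```

The skipped layers then keep their random initialization and are learned from scratch on the new dataset, while the backbone still starts from the pretrained weights.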

Hajarat commented 4 years ago

@Shiro-LK, would you care to share your experience with this? I'm trying something similar with mammogram images: I have tumour ROIs for my images. How would I adjust the code to train on something like this?