broadinstitute / keras-rcnn

Keras package for region-based convolutional neural networks (RCNNs)

RPN: InvalidArgumentError: Reduction axis 0 is empty in shape [0,2] #169

Closed (tomihisaw closed this issue 6 years ago)

tomihisaw commented 6 years ago

Hi, I am able to use the R-CNN network without issue, but I run into problems when trying to train the RPN network.

I am using the example code from the comments in _rpn.py with Python 3.6, Keras 2.1.3, and TensorFlow 1.5.0:

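import keras
import keras_rcnn.datasets.malaria
import keras_rcnn.models
import keras_rcnn.preprocessing

training, validation, test = keras_rcnn.datasets.malaria.load_data()
classes = {"rbc": 1, "not": 2}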
generator = keras_rcnn.preprocessing.ObjectDetectionGenerator()
generator = generator.flow(training, classes, (448, 448), 1.0)

image = keras.layers.Input((448, 448, 3))
model_rpn = keras_rcnn.models.RPN(image, classes=len(classes) + 1)
optimizer = keras.optimizers.Adam(0.0001)
model_rpn.compile(optimizer)

model_rpn.fit_generator(epochs=10, generator=generator, steps_per_epoch=1000)

The error occurs in _anchor_target.py here: gt_argmax_overlaps_inds = keras.backend.argmax(reference, axis=0)

The issue seems to be with axis=0: if I change it to 1, the error at least goes away. Beyond that, I am not sure what is going on.
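As a rough illustration of why changing the axis hides the error without fixing it, here is a minimal NumPy sketch. It is an analogue of the TensorFlow ArgMax op, not the keras-rcnn code itself, and reading the [0,2] shape as "0 anchors by 2 ground-truth boxes" is an assumption.

```python
import numpy as np

# Hypothetical overlap matrix: 0 anchors (rows) against 2 ground-truth boxes (columns).
overlaps = np.zeros((0, 2))

# Reducing over axis 0 asks for the best anchor per ground-truth box,
# but there are no anchors to choose from, so NumPy raises.
try:
    np.argmax(overlaps, axis=0)
    axis0_failed = False
except ValueError as error:
    axis0_failed = True
    print("axis=0:", error)

# Reducing over axis 1 asks for the best box per anchor. With zero anchors
# the result is simply an empty array, so no error is raised.
per_anchor = np.argmax(overlaps, axis=1)
print("axis=1:", per_anchor.shape)
```

Under that reading, axis=1 only avoids the exception. The input is still empty, so the real question is why no rows reach the argmax in the first place.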

I also get some warnings when building the model (not sure whether they are related):

_rpn.py:207: UserWarning: Output "concatenate_26" missing from loss dictionary. We assume this was done on purpose, and we will not be expecting any data to be passed to "concatenate_26" during training.
  super(RPN, self).compile(optimizer, None)
_rpn.py:207: UserWarning: Output "proposal_target_6" missing from loss dictionary. We assume this was done on purpose, and we will not be expecting any data to be passed to "proposal_target_6" during training.
  super(RPN, self).compile(optimizer, None)
_rpn.py:207: UserWarning: Output "rpn_6" missing from loss dictionary. We assume this was done on purpose, and we will not be expecting any data to be passed to "rpn_6" during training.
  super(RPN, self).compile(optimizer, None)

Below is the full error I see during training.

Thanks for any help or insight.

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/.virtualenvs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1349     try:
-> 1350       return fn(*args)
   1351     except errors.OpError as e:

~/.virtualenvs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1328                                    feed_dict, fetch_list, target_list,
-> 1329                                    status, run_metadata)
   1330 

~/.virtualenvs/py36/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    472             compat.as_text(c_api.TF_Message(self.status.status)),
--> 473             c_api.TF_GetCode(self.status.status))
    474     # Delete the underlying status object from memory otherwise it stays alive

InvalidArgumentError: Reduction axis 0 is empty in shape [0,2]
     [[Node: anchor_target_5/ArgMax = ArgMax[T=DT_FLOAT, Tidx=DT_INT32, output_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:GPU:0"](anchor_target_5/truediv_1, rpn_1/ExpandDims/dim)]]
     [[Node: anchor_target_5/cond/scatter_add_tensor/assert_equal/Assert/AssertGuard/Assert/Switch/_3151 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_9808_anchor_target_5/cond/scatter_add_tensor/assert_equal/Assert/AssertGuard/Assert/Switch", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-5-59aa7ebc6c25> in <module>()
----> 1 model_rpn.fit_generator(epochs=10, generator=generator, steps_per_epoch=1000)

~/.virtualenvs/py36/lib/python3.6/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name +
     90                               '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

~/.virtualenvs/py36/lib/python3.6/site-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   2175                     outs = self.train_on_batch(x, y,
   2176                                                sample_weight=sample_weight,
-> 2177                                                class_weight=class_weight)
   2178 
   2179                     if not isinstance(outs, list):

~/.virtualenvs/py36/lib/python3.6/site-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1847             ins = x + y + sample_weights
   1848         self._make_train_function()
-> 1849         outputs = self.train_function(ins)
   1850         if len(outputs) == 1:
   1851             return outputs[0]

~/.virtualenvs/py36/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
   2473         session = get_session()
   2474         updated = session.run(fetches=fetches, feed_dict=feed_dict,
-> 2475                               **self.session_kwargs)
   2476         return updated[:len(self.outputs)]
   2477 

~/.virtualenvs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    893     try:
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:
    897         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/.virtualenvs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1126     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1127       results = self._do_run(handle, final_targets, final_fetches,
-> 1128                              feed_dict_tensor, options, run_metadata)
   1129     else:
   1130       results = []

~/.virtualenvs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1342     if handle is None:
   1343       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1344                            options, run_metadata)
   1345     else:
   1346       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

~/.virtualenvs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1361         except KeyError:
   1362           pass
-> 1363       raise type(e)(node_def, op, message)
   1364 
   1365   def _extend_graph(self):

InvalidArgumentError: Reduction axis 0 is empty in shape [0,2]
     [[Node: anchor_target_5/ArgMax = ArgMax[T=DT_FLOAT, Tidx=DT_INT32, output_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:GPU:0"](anchor_target_5/truediv_1, rpn_1/ExpandDims/dim)]]
     [[Node: anchor_target_5/cond/scatter_add_tensor/assert_equal/Assert/AssertGuard/Assert/Switch/_3151 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_9808_anchor_target_5/cond/scatter_add_tensor/assert_equal/Assert/AssertGuard/Assert/Switch", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'anchor_target_5/ArgMax', defined at:
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 478, in start
    self.io_loop.start()
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2850, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-45bb00c2465e>", line 3, in <module>
    model_rpn = keras_rcnn.models.RPN(image, classes=len(classes) + 1)
  File "/home/tom/src/keras-rcnn/keras_rcnn/models/_rpn.py", line 176, in __init__
    target_anchors, target_scores, target_bounding_boxes = keras_rcnn.layers.AnchorTarget(base_size=(feature_map - 1), scales=[1])([scores, bounding_boxes, metadata])
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/keras/engine/topology.py", line 617, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/tom/src/keras-rcnn/keras_rcnn/layers/object_detection/_anchor_target.py", line 85, in call
    self.clobber_positives)
  File "/home/tom/src/keras-rcnn/keras_rcnn/layers/object_detection/_anchor_target.py", line 164, in label
    y_pred, y_true, inds_inside)
  File "/home/tom/src/keras-rcnn/keras_rcnn/layers/object_detection/_anchor_target.py", line 213, in overlapping
    gt_argmax_overlaps_inds = keras.backend.argmax(reference, axis=0)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 1407, in argmax
    return tf.argmax(x, axis)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 316, in new_func
    return func(*args, **kwargs)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 208, in argmax
    return gen_math_ops.arg_max(input, axis, name=name, output_type=output_type)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 472, in arg_max
    name=name)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
    op_def=op_def)
  File "/home/tom/.virtualenvs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Reduction axis 0 is empty in shape [0,2]
     [[Node: anchor_target_5/ArgMax = ArgMax[T=DT_FLOAT, Tidx=DT_INT32, output_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:GPU:0"](anchor_target_5/truediv_1, rpn_1/ExpandDims/dim)]]
     [[Node: anchor_target_5/cond/scatter_add_tensor/assert_equal/Assert/AssertGuard/Assert/Switch/_3151 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_9808_anchor_target_5/cond/scatter_add_tensor/assert_equal/Assert/AssertGuard/Assert/Switch", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

tomihisaw commented 6 years ago

I figured out the problem. It has to do with the feature_maps sizes the model uses by default. The largest default size is larger than the image size used in the example code (448 x 448). If you drop the largest size and pass the remaining sizes explicitly through the feature_maps argument, the example works:

model = keras_rcnn.models.RPN(image, classes=len(classes) + 1, feature_maps=[32, 64, 128, 256])
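To make the explanation above concrete, the following sketch counts how many square anchors of a given size fit entirely inside a 448 x 448 image. The helper count_anchors_inside, the stride of 16, the centred-grid layout, and the 512 size are all hypothetical and chosen for illustration. This is not keras-rcnn's actual anchor generation.

```python
import numpy as np


def count_anchors_inside(image_size, anchor_size, stride=16):
    """Count square anchors of side `anchor_size` lying fully inside a square image.

    Anchors are centred on a regular grid with the given stride. The stride
    and the square-anchor layout are assumptions for illustration only.
    """
    centres = np.arange(stride / 2, image_size, stride)
    cx, cy = np.meshgrid(centres, centres)
    half = anchor_size / 2
    inside = (
        (cx - half >= 0) & (cy - half >= 0)
        & (cx + half <= image_size) & (cy + half <= image_size)
    )
    return int(inside.sum())


# 512 stands in for a hypothetical size that exceeds the 448-pixel image.
for size in (32, 64, 128, 256, 512):
    print(size, count_anchors_inside(448, size))
```

An anchor larger than the image can never lie fully inside it, so the count drops to zero. If the library keeps only anchors inside the image, that would give the argmax zero rows to work with, which is consistent with the [0,2] shape in the error.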