frankkramer-lab / MIScnn

A framework for Medical Image Segmentation with Convolutional Neural Networks and Deep Learning
GNU General Public License v3.0

Bug in unet.plain architecture #51

Closed: jumutc closed this issue 3 years ago

jumutc commented 3 years ago

Hello,

My pipeline looks as follows (miscnn==1.1.2):

from miscnn.neural_network.architecture.unet.plain import Architecture
...
pp = Preprocessor(data_io, batch_size=5, subfunctions=sf,
                  prepare_subfunctions=True, prepare_batches=False,
                  data_aug=aug, analysis="fullimage")
model = Neural_Network(preprocessor=pp, loss=tversky_crossentropy,
                       metrics=[tversky_loss, dice_soft, dice_crossentropy],
                       # note: "learninig_rate" (sic) is the parameter name as spelled in miscnn==1.1.2
                       batch_queue_size=10, workers=1, learninig_rate=2e-4,
                       architecture=Architecture())

and after starting the training I get:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-11-1e552686cec3> in <module>
     14 cb_tb = TensorBoard(log_dir="tensorboard", histogram_freq=0, write_graph=True, write_images=True)
     15 
---> 16 model.train(sample_list[2:], epochs=100, iterations=500, callbacks=[cb_lr, cb_es, cb_tb])

~\anaconda3\envs\bacteria_cfu\lib\site-packages\miscnn\neural_network\model.py in train(self, sample_list, epochs, iterations, callbacks)
    131                        callbacks=callbacks,
    132                        workers=self.workers,
--> 133                        max_queue_size=self.batch_queue_size)
    134         # Clean up temporary files if necessary
    135         if self.preprocessor.prepare_batches or self.preprocessor.prepare_subfunctions:

~\anaconda3\envs\bacteria_cfu\lib\site-packages\tensorflow\python\keras\engine\training.py in _method_wrapper(self, *args, **kwargs)
    106   def _method_wrapper(self, *args, **kwargs):
    107     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
--> 108       return method(self, *args, **kwargs)
    109 
    110     # Running inside `run_distribute_coordinator` already.

~\anaconda3\envs\bacteria_cfu\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1096                 batch_size=batch_size):
   1097               callbacks.on_train_batch_begin(step)
-> 1098               tmp_logs = train_function(iterator)
   1099               if data_handler.should_sync:
   1100                 context.async_wait()

~\anaconda3\envs\bacteria_cfu\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
    778       else:
    779         compiler = "nonXla"
--> 780         result = self._call(*args, **kwds)
    781 
    782       new_tracing_count = self._get_tracing_count()

~\anaconda3\envs\bacteria_cfu\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
    838         # Lifting succeeded, so variables are initialized and we can run the
    839         # stateless function.
--> 840         return self._stateless_fn(*args, **kwds)
    841     else:
    842       canon_args, canon_kwds = \

~\anaconda3\envs\bacteria_cfu\lib\site-packages\tensorflow\python\eager\function.py in __call__(self, *args, **kwargs)
   2827     with self._lock:
   2828       graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 2829     return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
   2830 
   2831   @property

~\anaconda3\envs\bacteria_cfu\lib\site-packages\tensorflow\python\eager\function.py in _filtered_call(self, args, kwargs, cancellation_manager)
   1846                            resource_variable_ops.BaseResourceVariable))],
   1847         captured_inputs=self.captured_inputs,
-> 1848         cancellation_manager=cancellation_manager)
   1849 
   1850   def _call_flat(self, args, captured_inputs, cancellation_manager=None):

~\anaconda3\envs\bacteria_cfu\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1922       # No tape is watching; skip to running the function.
   1923       return self._build_call_outputs(self._inference_function.call(
-> 1924           ctx, args, cancellation_manager=cancellation_manager))
   1925     forward_backward = self._select_forward_and_backward_functions(
   1926         args,

~\anaconda3\envs\bacteria_cfu\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
    548               inputs=args,
    549               attrs=attrs,
--> 550               ctx=ctx)
    551         else:
    552           outputs = execute.execute_with_cancellation(

~\anaconda3\envs\bacteria_cfu\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

InvalidArgumentError:  ConcatOp : Dimensions of inputs should match: shape[0] = [5,320,36,36] vs. shape[1] = [5,320,37,37]
     [[node functional_3/concatenate_5/concat (defined at C:\Users\Students\anaconda3\envs\bacteria_cfu\lib\site-packages\miscnn\neural_network\model.py:133) ]] [Op:__inference_train_function_16191]

Function call stack:
train_function

muellerdo commented 3 years ago

Hey @jumutc,

it's not a bug in the U-Net architecture. This error results from an incompatible image/patch size.

The U-Net architecture is built as a structure with multiple levels: https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/u-net-architecture.png

In theory, the analysed image part is reduced via a convolutional layer at each level. By default, the reduction halves each dimension. This means that your image/patch shape has to be divisible by 2 N times, where N is the number of levels (depth). MIScnn utilizes by default the standard U-Net of Ronneberger et al. with a depth of 4. In contrast, the plain U-Net is a reimplementation of the U-Net variant of Isensee et al., which performed slightly better than the standard one. In your case, 596 halves to 298 and then to 149; once an odd size appears, pooling floors it (149 -> 74 -> 37 -> 18), and the upsampling path can no longer reproduce the encoder sizes, which is exactly the 37 vs. 36 mismatch in the ConcatOp error.
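
To make this concrete, here is a small illustrative check (plain Python, not part of the MIScnn API) for whether a shape stays even through a given number of halvings:

def fits_unet(shape, depth=4):
    # Return True if every axis stays even through `depth` halvings.
    for _ in range(depth):
        if any(dim % 2 for dim in shape):
            return False
        shape = tuple(dim // 2 for dim in shape)
    return True

print(fits_unet((596, 596)))  # False: 596 -> 298 -> 149, and 149 is odd
print(fits_unet((160, 160)))  # True:  160 -> 80 -> 40 -> 20 -> 10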

Long story short: You have to adjust your image/patch sizes.

Solutions:

- Resize your images/patches to a shape that is divisible accordingly, for example via a Resize subfunction (see the sketch below).
- Switch to a patchwise analysis with a compatible patch shape (e.g. 160x160).
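
As a rough sketch of the first option (assuming MIScnn's Resize subfunction; please check the exact import path and new_shape argument against your installed version):

from miscnn.processing.subfunctions import Resize

# Hypothetical fix: prepend a Resize subfunction so every image reaches
# the network with a shape that survives the repeated halving.
# 576 = 2**6 * 9, so it stays even through six halvings.
sf_resize = Resize(new_shape=(576, 576))
pp = Preprocessor(data_io, batch_size=5, subfunctions=[sf_resize] + sf,
                  prepare_subfunctions=True, prepare_batches=False,
                  data_aug=aug, analysis="fullimage")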

Hope that I was able to help you.

Cheers, Dominik

jumutc commented 3 years ago

@muellerdo thanks for answering, my default patch size is 596x596, which worked with other architectures. As far as I understand from the paper, the size should be 160x160 or 2x multiples of it, but there is no clear indication of this in the wiki. I would suggest adding a one-liner to the documentation about each architecture's defaults (layers + patch sizes).
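
For reference, the nearest sizes around 596 that are divisible often enough can be computed directly (depth=4 here is an assumption matching the standard U-Net mentioned above; increase it for deeper variants):

def nearest_valid(size, depth=4):
    # Nearest multiples of 2**depth at or around `size`.
    step = 2 ** depth
    lower = (size // step) * step
    return lower, lower + step

print(nearest_valid(596))  # (592, 608): nearest multiples of 16 around 596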

muellerdo commented 3 years ago

Totally agreed. You are right, currently the wiki only references the papers. I have added a one-line note with the recommended patch shape.

The paper on which the plain U-Net is based used a patch shape of 80×160×160: http://arxiv.org/abs/1908.02182
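
Plugging that 3D patch shape into the divisibility check from above (assuming the halving-per-level rule applies to every axis):

print(fits_unet((80, 160, 160)))  # True: 80/16 = 5 and 160/16 = 10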

Thanks for the feedback!