incompatible array types are mixed in the forward input (Convolution2DFunction).

wdon021 commented 3 years ago

Hi Christian,

I encountered this error near the end of an epoch of training (99.75%). I am using Google Colab for the training.

TypeError                                 Traceback (most recent call last)
in ()
    323 # )
    324
--> 325 trainer.run()

12 frames

/usr/local/lib/python3.6/dist-packages/chainer/training/trainer.py in run(self, show_loop_exception_msg)
    374 f.write('Traceback (most recent call last):\n')
    375 traceback.print_tb(sys.exc_info()[2])
--> 376 six.reraise(*exc_info)
    377 finally:
    378 for _, entry in extensions:

/usr/local/lib/python3.6/dist-packages/six.py in reraise(tp, value, tb)
    701 if value.__traceback__ is not tb:
    702 raise value.with_traceback(tb)
--> 703 raise value
    704 finally:
    705 value = None

/usr/local/lib/python3.6/dist-packages/chainer/training/trainer.py in run(self, show_loop_exception_msg)
    344 for name, entry in extensions:
    345 if entry.trigger(self):
--> 346 entry.extension(self)
    347 except Exception as e:
    348 if show_loop_exception_msg:

/usr/local/lib/python3.6/dist-packages/chainer/training/extensions/evaluator.py in __call__(self, trainer)
    178 with reporter:
    179 with configuration.using_config('train', False):
--> 180 result = self.evaluate()
    181
    182 reporter_module.report(result)

/usr/local/lib/python3.6/dist-packages/chainer/training/extensions/evaluator.py in evaluate(self)
    239 with function.no_backprop_mode():
    240 if isinstance(in_arrays, tuple):
--> 241 eval_func(*in_arrays)
    242 elif isinstance(in_arrays, dict):
    243 eval_func(**in_arrays)

/content/drive/My Drive/Colab Notebooks/COMP421/Project/multi_accuracy_classifier.ipynb in __call__(self, *args)

/content/drive/My Drive/Colab Notebooks/COMP421/Project/fsns.ipynb in __call__(self, images, label)

/content/drive/My Drive/Colab Notebooks/COMP421/Project/fsns.ipynb in __call__(self, images)

/usr/local/lib/python3.6/dist-packages/chainer/link.py in __call__(self, *args, **kwargs)
    285 # forward is implemented in the child classes
    286 forward = self.forward # type: ignore
--> 287 out = forward(*args, **kwargs)
    288
    289 # Call forward_postprocess hook

/usr/local/lib/python3.6/dist-packages/chainer/links/connection/convolution_2d.py in forward(self, x)
    249 return convolution_2d.convolution_2d(
    250 x, self.W, self.b, self.stride, self.pad, dilate=self.dilate,
--> 251 groups=self.groups, cudnn_fast=self.cudnn_fast)
    252
    253

/usr/local/lib/python3.6/dist-packages/chainer/functions/connection/convolution_2d.py in convolution_2d(x, W, b, stride, pad, cover_all, **kwargs)
    656 else:
    657 args = x, W, b
--> 658 y, = fnode.apply(args)
    659 return y

/usr/local/lib/python3.6/dist-packages/chainer/function_node.py in apply(self, inputs)
    267 is_chainerx, in_data = _extract_apply_in_data(inputs)
    268
--> 269 utils._check_arrays_forward_compatible(in_data, self.label)
    270
    271 if is_chainerx:

/usr/local/lib/python3.6/dist-packages/chainer/utils/__init__.py in _check_arrays_forward_compatible(arrays, label)
     91 'Actual: {}'.format(
     92 ' ({})'.format(label) if label is not None else '',
---> 93 ', '.join(str(type(a)) for a in arrays)))
     94
     95

TypeError: incompatible array types are mixed in the forward input (Convolution2DFunction). Actual: , ,

Some online resources suggested adding to_gpu() at the end of every Convolution2DFunction, but that didn't work either. Can you please help me with it? Thank you.

Bartzi commented 3 years ago

Hi,

Which version of Chainer are you using? It looks like your version is too new! If you use the version specified in requirements.txt, you should not run into the problem of some arrays still residing on the CPU while others are already on the GPU. If you cannot use the old version, you'll have to adapt the code to the way newer versions of Chainer expect array placement to be specified.

If you do so, I would be happy to review a PR :wink:
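For reference, the version question above can be answered from inside the notebook. The following is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper name installed_versions is hypothetical, and the CuPy distribution name may differ depending on which wheel was installed.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # These are PyPI distribution names. CuPy wheels may be published under
    # CUDA-specific names, so adjust the list to match what was installed.
    for name, version in installed_versions(["chainer", "cupy", "numpy"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing the printed versions against requirements.txt shows quickly whether the environment matches what the code was written for.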

wdon021 commented 3 years ago

Thank you @Bartzi for your reply, really appreciated.

I ran into another error once I changed the Chainer version with !pip install chainer==3.2.0.

RuntimeError                              Traceback (most recent call last)
in ()
    200 # use the MultiProcessParallelUpdater in order to harness the full power of data parallel computation
    201 # updater = MultiprocessParallelUpdater(train_iterators, optimizer, devices=gpus)
--> 202 updater = StandardUpdater(train_iterators, optimizer, device=0)
    203 log_dir = os.path.join(log_dir, "{}_{}".format(datetime.datetime.now().isoformat(), log_name))
    204 log_dir = log_dir

3 frames

/usr/local/lib/python3.6/dist-packages/chainer/training/updater.py in __init__(self, iterator, optimizer, converter, device, loss_func)
    144 if device is not None and device >= 0:
    145 for optimizer in six.itervalues(self._optimizers):
--> 146 optimizer.target.to_gpu(device)
    147
    148 self.converter = converter

/usr/local/lib/python3.6/dist-packages/chainer/link.py in to_gpu(self, device)
    727
    728 def to_gpu(self, device=None):
--> 729 with cuda._get_device(device):
    730 super(Chain, self).to_gpu()
    731 d = self.__dict__

/usr/local/lib/python3.6/dist-packages/chainer/cuda.py in _get_device(*args)
    217 for arg in args:
    218 if type(arg) in _integer_types:
--> 219 check_cuda_available()
    220 return Device(arg)
    221 if isinstance(arg, ndarray):

/usr/local/lib/python3.6/dist-packages/chainer/cuda.py in check_cuda_available()
     78 '(see https://github.com/chainer/chainer#installation).')
     79 msg += str(_resolution_error)
---> 80 raise RuntimeError(msg)
     81 if (not cudnn_enabled and
     82 not _cudnn_disabled_by_user and

RuntimeError: CUDA environment is not correctly set up (see https://github.com/chainer/chainer#installation).cannot import name 'sqrt_fixed'

I tried reinstalling CUDA 8.0:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

But the error remained. Thank you again for helping me out here.

Bartzi commented 3 years ago

Oh, I forgot to add:

You should also install cupy for it to work. I suggest you install it with !pip install cupy==3.2.0. Chainer itself only provides the computing logic; everything that has to do with CUDA is handled by cupy, which is a drop-in replacement for numpy. Please also check the Chainer documentation about using GPUs.

wdon021 commented 3 years ago

Hi @Bartzi, thank you for the reply. I have tried !pip install cupy==2.2.0 as well. I am currently going through every input and printing its type to see if I can locate the problem. Thanks for your time.
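The type-printing approach described above can be made a little more systematic. The following is a minimal sketch with hypothetical helper names (array_type_name, find_mixed_types); it uses plain Python containers as stand-ins so that it runs without Chainer or CuPy installed.

```python
def array_type_name(obj):
    """Return the module-qualified type name of obj, e.g. 'builtins.list'."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


def find_mixed_types(named_arrays):
    """Group input names by type name; more than one key means a mix."""
    groups = {}
    for name, value in named_arrays.items():
        groups.setdefault(array_type_name(value), []).append(name)
    return groups


if __name__ == "__main__":
    # Stand-in inputs: a list and a tuple play the roles of two array types.
    inputs = {"x": [1.0, 2.0], "W": (0.5, 0.5), "b": [0.0]}
    for type_name, names in find_mixed_types(inputs).items():
        print(f"{type_name}: {', '.join(names)}")
```

In the real model, one would presumably pass the underlying arrays of the layer input and of the convolution's W and b parameters instead of the stand-ins. If the result has more than one key, some inputs live in a different array library than the others, which is the condition the TypeError complains about.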

wdon021 commented 3 years ago

From what I observe, it happens at the beginning of the second epoch, at the start of the localization network, in the first convolution layer.

Looks like the <class 'numpy.ndarray'> is carried over from somewhere or from epoch 1.

wdon021 commented 3 years ago

Fixed by adding '.to_device(0)' to the model (which is the Classifier). My assumption is that the parameters (weights) were somehow not on the GPU at the end of the epoch, and were then fed into the next epoch's convolution.

epoch_evaluator = (
    chainer.training.extensions.Evaluator(
        epoch_validation_iterator,
        model.to_device(0),
        device=updater.device,
        converter=concat_and_pad_examples,
    ),
    (1, 'epoch')
)

Now I have a second problem: the model keeps running after the first epoch, but I no longer see the following progress information. Do you know what went wrong?

     total [##................................................]  4.50%
this epoch [#############################################.....] 90.07%
       301 iter, 0 epoch / 20 epochs
   0.42648 iters/sec. Estimated time to finish: 4:09:25.048276.

Thank you. (I am slowly picking up Chainer; I mostly used Keras before. I am also gradually adapting this code to the newer versions of Chainer, CuPy, and CUDA, since I didn't downgrade any of the packages.)

wdon021 commented 3 years ago

The progress bar has to be enabled explicitly by adding progress_bar=True to the epoch_evaluator.

But now the validation process keeps iterating over and over. Do you know how to turn this off?

validation [########################################################################################################################################################################################################################################################################] 528.85% 506 / 1754 iterations -10.788 iters/sec. Estimated time to finish: -1 day, 23:54:27.815703. starts classifier ====================== Localization got to the start point=================== Localization got to first conv=================== Localization got to the end point=================== fsns resnet got to the end point=================== into the loss================================= into the accuracy================================= classifier got to the end point=================== validation [########################################################################################################################################################################################################################################################################] 529.53% 518 / 1754 iterations -10.779 iters/sec. Estimated time to finish: -1 day, 23:54:26.993988. starts classifier ====================== Localization got to the start point=================== Localization got to first conv=================== Localization got to the end point=================== fsns resnet got to the end point=================== into the loss================================= into the accuracy================================= classifier got to the end point=================== validation [#########################################################################################################################################################################################################################################################################] 530.22% 530 / 1754 iterations

thanks,

wdon021 commented 3 years ago

I see. If I change (1, 'epoch') to (10, 'epoch'), validation will no longer run every epoch. Is my understanding correct?

Bartzi commented 3 years ago

Hmm, the validation iterator does not stop. This is most probably because repeat is not set to False (see for example this line).

It should work if you set repeat=False when building the validation iterator.
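To illustrate why repeat matters, here is a toy model of the behaviour, not Chainer's actual implementation. ToyIterator and count_batches are hypothetical names; the sketch only mimics an iterator that either wraps around forever or stops after one pass over the data.

```python
class ToyIterator:
    """Toy stand-in for a batch iterator with a `repeat` flag (hypothetical)."""

    def __init__(self, dataset, batch_size, repeat=True):
        self.dataset = list(dataset)
        self.batch_size = batch_size
        self.repeat = repeat
        self.position = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= len(self.dataset):
            if not self.repeat:
                raise StopIteration  # one pass done: the consumer's loop ends
            self.position = 0  # wrap around and start another pass
        batch = self.dataset[self.position:self.position + self.batch_size]
        self.position += self.batch_size
        return batch


def count_batches(iterator, limit=1000):
    """Consume batches like an evaluation loop would, giving up at `limit`."""
    count = 0
    for _ in iterator:
        count += 1
        if count >= limit:
            break
    return count


if __name__ == "__main__":
    print(count_batches(ToyIterator(range(10), 4, repeat=False)))  # one pass only
    print(count_batches(ToyIterator(range(10), 4, repeat=True)))   # stops only at the limit
```

With Chainer itself, the equivalent change would presumably be to construct the validation iterator with repeat=False, for example chainer.iterators.SerialIterator(dataset, batch_size, repeat=False, shuffle=False), so that the evaluator's loop ends after a single pass instead of running past 100%.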

wdon021 commented 3 years ago

@Bartzi I see, thank you for your time, really appreciated!

Bartzi / see

incompatible array types are mixed in the forward input (Convolution2DFunction). #101