biomag-lab / hypocotyl-UNet


Error when starting to train #9

Closed darvida closed 4 years ago

darvida commented 4 years ago

Hello, I get the following error when I'm trying to start the training:

(unet-) C:\hypocotyl-UNet-master\src>python train.py --train_dataset=C:\Users\David\Desktop\training_222\converted --trained_model_path=C:\Users\David\Desktop\model\model
Traceback (most recent call last):
  File "train.py", line 67, in <module>
    verbose=False, save_freq=args.model_save_freq)
  File "C:\hypocotyl-UNet-master\src\unet\utils.py", line 294, in train_model
    training_loss = self.loss(y_out, y_batch)
  File "C:\Users\David\Anaconda3\envs\unet-\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\hypocotyl-UNet-master\src\unet\utils.py", line 147, in forward
    y_gt = torch.zeros_like(y_pred).scatter_(1, y_gt[:, None, :], 1)
RuntimeError: mismatch in length of strides and shape

Do you know why this might be so?
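For context, the failing line builds a one-hot target with scatter_. Below is a minimal sketch of that pattern, with assumed tensor shapes rather than the repository's exact ones, showing how a mask label outside the expected class range breaks the call:

```python
import torch

# Sketch of the one-hot encoding pattern used around unet/utils.py line 147.
# Shapes are illustrative assumptions, not the repository's actual shapes.
num_classes = 3                              # background, hypocotyl, non-hypocotyl
y_pred = torch.zeros(1, num_classes, 4, 4)   # fake network output (N, C, H, W)

good_mask = torch.randint(0, num_classes, (1, 4, 4))  # labels in [0, 2]
one_hot = torch.zeros_like(y_pred).scatter_(1, good_mask[:, None, :, :], 1)
print(one_hot.shape)                         # torch.Size([1, 3, 4, 4])

bad_mask = good_mask.clone()
bad_mask[0, 0, 0] = 5                        # a label outside [0, num_classes)
# scatter_ rejects this: a RuntimeError on CPU, and a device-side
# "index out of bounds" assert when the tensors live on CUDA.
torch.zeros_like(y_pred).scatter_(1, bad_mask[:, None, :, :], 1)
```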

cosmic-cortex commented 4 years ago

Hi! My guess would be that you have used a mask with more than 3 classes. (The usual classes are background, hypocotyl, and non-hypocotyl.)
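One quick way to verify that is to list the distinct pixel values in each mask file. Everything in the sketch below (folder layout, file extension, glob pattern) is an assumption and should be adjusted to your dataset:

```python
import glob

import numpy as np
from PIL import Image

# Hypothetical location of the mask bitmaps; adjust to your dataset layout.
mask_pattern = r"C:\Users\David\Desktop\training_222\converted\**\*.png"

for path in glob.glob(mask_pattern, recursive=True):
    values = np.unique(np.array(Image.open(path)))
    # For a 3-class setup, the labels should be a subset of {0, 1, 2}.
    if not set(values.tolist()).issubset({0, 1, 2}):
        print(path, "has unexpected label values:", values)
```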

Did you manage to solve it? Let me know if this is still a problem! I might not answer immediately, but I'll try to help.

darvida commented 4 years ago

I checked it, and there was something wrong with the bitmap. Now when I start the training, I get the following error instead:

C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:208: block: [881,0,0], thread: [12,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
Traceback (most recent call last):
  File "train.py", line 70, in <module>
    verbose=False, save_freq=args.model_save_freq)
  File "C:\hypocotyl-UNet-master\src\unet\utils.py", line 298, in train_model
    epoch_running_loss += training_loss.item()
RuntimeError: CUDA error: device-side assert triggered

Is it still the masks that are the problem?
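Note that the RuntimeError surfacing at .item() is only where the asynchronous CUDA failure is first noticed; the assert itself comes from the scatter_ call, which again points at out-of-range mask labels. A hedged way to make the traceback show the real call site (this snippet would need to go at the very top of train.py, which is an assumption about where you can edit):

```python
import os

# Must be set before CUDA is initialised so kernels launch synchronously and
# the traceback points at the failing scatter_ instead of the later .item().
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported only after the environment variable is set
```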

darvida commented 4 years ago

I got the training to start with the "converted" images but not with the "patched_images", but after one epoch I get the following error:

(unet-) C:\hypocotyl-UNet-master\src>python train.py --train_dataset=C:\Users\David\Desktop\bitmap-projekt\train_batchx2\converted --val_dataset=C:\Users\David\Desktop\bitmap-projekt\val_batchx2\converted --trained_model_path=C:\Users\David\Desktop\model\model --device=cuda:0

(Epoch no. 0) loss: 0.379686
Validation loss: 0.345278
Validation loss improved from inf to 0.345278, model saved to C:\hypocotyl-UNet-master\src..\checkpoints\UNet-hypocotyl
C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:208: block: [886,0,0], thread: [61,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
Traceback (most recent call last):
  File "train.py", line 70, in <module>
    verbose=False, save_freq=args.model_save_freq)
  File "C:\hypocotyl-UNet-master\src\unet\utils.py", line 298, in train_model
    epoch_running_loss += training_loss.item()
RuntimeError: CUDA error: device-side assert triggered
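Since the first epoch completes and the assert only fires on a later batch, it looks like only some mask files carry out-of-range labels. A hypothetical guard (not part of the repository) that could be called just before the scatter_ in unet/utils.py to fail with a readable message naming the offending labels:

```python
import torch


def check_labels(y_gt: torch.Tensor, num_classes: int) -> None:
    """Raise a readable error if a mask contains labels outside [0, num_classes)."""
    bad = (y_gt < 0) | (y_gt >= num_classes)
    if bad.any():
        raise ValueError(
            f"mask contains labels outside [0, {num_classes}): "
            f"{torch.unique(y_gt[bad]).tolist()}"
        )
```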