Custom Data Validation - Githubissues

shahabe commented 4 years ago

I am training on my custom data. The labels are in COCO JSON format. As you can see from the following error, it trains for a number of iterations and evaluates them, but after a while it crashes, complaining about a tensor size.

Begin training!
[  0]       0 || B: 9.796 | C: 25.278 | M: 6349.665 | S: 14.227 | T: 6398.965 || ETA: 0:00:00 || timer: 5.223
[  2]      10 || B: 12.326 | C: 17.621 | M: 5107.724 | S: 14.384 | T: 5152.054 || ETA: 1:42:27 || timer: 0.568
[  4]      20 || B: 11.336 | C: 15.308 | M: 5463.791 | S: 14.048 | T: 5504.484 || ETA: 1:41:23 || timer: 0.547
[  6]      30 || B: 10.562 | C: 13.626 | M: 4948.197 | S: 13.337 | T: 4985.723 || ETA: 1:40:46 || timer: 0.534
[  8]      40 || B: 10.305 | C: 12.574 | M: 5112.529 | S: 12.393 | T: 5147.801 || ETA: 1:40:09 || timer: 0.507
[ 10]      50 || B: 9.738 | C: 11.845 | M: 4968.056 | S: 10.800 | T: 5000.440 || ETA: 1:40:08 || timer: 0.527

Computing validation mAP (this may take a while)...

Calculating mAP...

       |  all  |  .50  |  .55  |  .60  |  .65  |  .70  |  .75  |  .80  |  .85  |  .90  |  .95  |
-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
   box |  0.01 |  0.01 |  0.01 |  0.01 |  0.01 |  0.01 |  0.00 |  0.00 |  0.00 |  0.00 |  0.00 |
  mask |  0.00 |  0.00 |  0.00 |  0.00 |  0.00 |  0.00 |  0.00 |  0.00 |  0.00 |  0.00 |  0.00 |
-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+

[ 12]      60 || B: 9.434 | C: 11.448 | M: 5052.604 | S: 9.178 | T: 5082.664 || ETA: 1:55:45 || timer: 0.512
[ 14]      70 || B: 9.276 | C: 11.130 | M: 5299.721 | S: 7.977 | T: 5328.105 || ETA: 1:53:20 || timer: 0.517
[ 16]      80 || B: 9.086 | C: 10.782 | M: 5407.029 | S: 7.096 | T: 5433.993 || ETA: 1:51:28 || timer: 0.515
Unhandled exception:  Traceback (most recent call last):
  File "/home/shahab/Projects/AI_dev_scripts/run_train.py", line 37, in <module>
    trained_model.train()
  File "/home/shahab/Projects/AI_dev_scripts/train/pytorch/yolact/train.py", line 168, in train
    losses = criterion(out, wrapper, wrapper.make_mask())
  File "/home/shahab/anaconda3/envs/unleashlive_cli/lib/python3.7/site-packages/torch-1.3.1-py3.7-linux-x86_64.egg/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/shahab/anaconda3/envs/unleashlive_cli/lib/python3.7/site-packages/torch-1.3.1-py3.7-linux-x86_64.egg/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/shahab/anaconda3/envs/unleashlive_cli/lib/python3.7/site-packages/torch-1.3.1-py3.7-linux-x86_64.egg/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/shahab/Projects/AI_dev_scripts/train/pytorch/yolact/layers/modules/multibox_loss.py", line 161, in forward
    losses['M'] = self.direct_mask_loss(pos_idx, idx_t, loc_data, mask_data, priors, masks)
  File "/home/shahab/Projects/AI_dev_scripts/train/pytorch/yolact/layers/modules/multibox_loss.py", line 398, in direct_mask_loss
    new_mask = F.adaptive_avg_pool2d(tmp_mask.unsqueeze(0), cfg.mask_size)
  File "/home/shahab/anaconda3/envs/unleashlive_cli/lib/python3.7/site-packages/torch-1.3.1-py3.7-linux-x86_64.egg/torch/nn/functional.py", line 768, in adaptive_avg_pool2d
    return torch._C._nn.adaptive_avg_pool2d(input, _output_size)
RuntimeError: adaptive_avg_pooling2d(): expected input to have non-empty spatial dimensions, but input has sizes [1, 0, 200] with dimension 1 being empty

Process finished with exit code 0

Where do you think the problem is? Thank you in advance.

abhigoku10 commented 4 years ago

@shahabe can you please check your annotations? The mask loss is very high compared to the box loss.

shahabe commented 4 years ago

@abhigoku10 Thank you for your reply. I checked the annotations and they are all right. Actually, this error happens randomly at different iterations.

abhigoku10 commented 4 years ago

@shahabe for me, the mask loss values exploded because of wrong annotations in my training and validation sets. I suspect the same here because your box loss values look correct.

shahabe commented 4 years ago

Would you please elaborate how your masks and annotations were wrong? When I check the annotations on the images visually, I don't see any problem.

dbolya commented 4 years ago

Yeah, that mask loss looks very suspect. Are your masks in polygon or RLE form? Both are fine, but could it be that your polygon is being interpreted as an RLE or vice versa?

When you visualize the masks, are you visualizing them externally or within YOLACT itself?
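
For reference, the polygon-versus-RLE question can also be checked directly in the annotation file. The following is a minimal sketch, assuming the file follows the standard COCO layout, where a polygon `segmentation` is a list of flat coordinate lists and an RLE `segmentation` is a dict with `size` and `counts`. The function names and the `demo` data are hypothetical and are not part of YOLACT.

```python
from collections import Counter


def segmentation_kind(seg):
    """Classify one COCO `segmentation` value by its JSON type."""
    if isinstance(seg, list):
        return "polygon"
    if isinstance(seg, dict) and "counts" in seg:
        # Uncompressed RLE stores `counts` as a list of ints,
        # compressed RLE stores it as a string.
        if isinstance(seg["counts"], list):
            return "rle_uncompressed"
        return "rle_compressed"
    return "unknown"


def count_segmentation_kinds(coco):
    """Count how many annotations use each segmentation encoding."""
    return Counter(
        segmentation_kind(ann.get("segmentation"))
        for ann in coco.get("annotations", [])
    )


# Hypothetical in-memory stand-in for a loaded annotation file
# (in practice: coco = json.load(open(path_to_annotations))).
demo = {
    "annotations": [
        {"id": 1, "segmentation": [[0, 0, 10, 0, 10, 10, 0, 10]]},
        {"id": 2, "segmentation": {"size": [4, 4], "counts": [0, 16]}},
        {"id": 3},
    ]
}
kinds = count_segmentation_kinds(demo)
print(dict(kinds))
```

If a dataset that is supposed to be all polygons reports any other kind, those annotations are worth inspecting first.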

shahabe commented 4 years ago

I use polygon masks and visualize them externally. Everything looks fine when I view them. I even tried giving only one image for training, just to test the training process, and I got the same error.

How should I visualize the masks in yolact? Thank you.

abhigoku10 commented 4 years ago

@shahabe sometimes the masks look correct when you visualize them, but internally, because of a format issue, a mask polygon region bleeds into another class. I suggest looking into how the labels are formed and how they are converted to COCO format.
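
A label check of this kind can be scripted. The sketch below is only an illustration under stated assumptions: it assumes the standard COCO layout (`bbox` as `[x, y, width, height]`, polygon `segmentation` as flat `[x0, y0, x1, y1, ...]` lists), and it assumes that the reported size `[1, 0, 200]` comes from a mask crop with zero height, which a zero-height or very thin box could plausibly produce. That cause is not confirmed in this thread. The helper names, the `min_side` threshold, and the `demo` data are hypothetical.

```python
def polygon_area(flat):
    """Shoelace area of a flat [x0, y0, x1, y1, ...] polygon."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    total = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to the first
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(total) / 2.0


def find_suspect_annotations(coco, min_side=1.0):
    """Return (annotation id, reasons) for annotations that look degenerate."""
    suspects = []
    for ann in coco.get("annotations", []):
        reasons = []
        bbox = ann.get("bbox") or []
        if len(bbox) != 4:
            reasons.append("bbox is not [x, y, w, h]")
        elif bbox[2] < min_side or bbox[3] < min_side:
            reasons.append("bbox narrower than min_side")
        seg = ann.get("segmentation")
        if isinstance(seg, list):  # polygon form only; RLE is skipped here
            if not seg:
                reasons.append("empty polygon list")
            for poly in seg:
                if len(poly) < 6 or len(poly) % 2:
                    reasons.append("polygon has fewer than 3 points or odd length")
                elif polygon_area(poly) == 0:
                    reasons.append("polygon has zero area")
        if reasons:
            suspects.append((ann.get("id"), reasons))
    return suspects


# Hypothetical annotations: id 1 is well formed, id 2 has a zero-height box
# and a polygon whose three points are collinear.
demo = {
    "annotations": [
        {"id": 1, "bbox": [0, 0, 10, 10],
         "segmentation": [[0, 0, 10, 0, 10, 10, 0, 10]]},
        {"id": 2, "bbox": [10, 20, 50, 0],
         "segmentation": [[10, 20, 60, 20, 30, 20]]},
    ]
}
suspects = find_suspect_annotations(demo)
for ann_id, reasons in suspects:
    print(ann_id, reasons)
```

Annotations flagged this way can look fine in an external viewer, since a sliver-thin region is easy to miss by eye.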

dbolya commented 4 years ago

@shahabe On this line: https://github.com/dbolya/yolact/blob/db81124874817895db69f2dc443f5c24e0e3f491/data/coco.py#L176 you should just be able to add:

import matplotlib.pyplot as plt
plt.imshow(img) # The colors will be weird, don't worry about that
plt.show()
for i in range(masks.shape[0]):
    plt.imshow(masks[i])
    plt.show()

Then run training with --num_workers=0.

dbolya / yolact

Custom Data Validation #287