JoshVarty / SorghumHeadDetection

Working on: https://competitions.codalab.org/competitions/23177
MIT License

Implement Test Time Augmentation #3

Open JoshVarty opened 5 years ago

JoshVarty commented 5 years ago

We need to scan the image from top to bottom and output all bounding boxes within it. We can probably use test time augmentation and a similar approach as we used in the previous competition.
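For reference, a minimal sketch of flip-based test time augmentation for detection, assuming (H, W, C) images and [x1, y1, x2, y2] boxes; `hflip_boxes` and `tta_predict` are hypothetical helpers, and NMS would run on the pooled boxes afterwards:

```python
import numpy as np

def hflip_boxes(boxes, width):
    """Map [x1, y1, x2, y2] boxes predicted on a horizontally
    flipped image back into original-image coordinates."""
    out = boxes.copy()
    out[:, 0] = width - boxes[:, 2]
    out[:, 2] = width - boxes[:, 0]
    return out

def tta_predict(model, image):
    """Run the detector on the image and on its horizontal flip,
    then pool all boxes for a single NMS pass afterwards."""
    boxes = model(image)
    flipped = model(image[:, ::-1])           # flip the width axis of (H, W, C)
    flipped = hflip_boxes(flipped, image.shape[1])
    return np.vstack([boxes, flipped])
```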

JoshVarty commented 5 years ago

We could investigate the show_results_side_by_side() method, since we need to take the outputs and run nms() on the predictions.

JoshVarty commented 5 years ago

Nikhil pointed out that we should be able to generate predictions in a simpler way. Since we're not doing classification and don't have any fully connected layers, we should be able to pass arbitrary sized tensors through our network and we'll just receive a correspondingly sized output which we'll convert to bounding boxes and run through NMS.
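As a sanity check on that idea, a toy head with the same layer shapes as ours (not the actual RetinaNet) shows a fully convolutional stack accepting arbitrary spatial sizes:

```python
import torch
import torch.nn as nn

# A purely convolutional head imposes no fixed input size: with no
# fully connected layers, the output spatial dims simply track the
# input's, so any (H, W) passes through.
head = nn.Sequential(
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 18, 3, padding=1),
)

for size in [(256, 256), (320, 192)]:
    x = torch.randn(1, 32, *size)
    print(tuple(head(x).shape))  # (1, 18, H, W) for each input (H, W)
```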

Unfortunately I'm getting an error when I use the RetinaNet from the ObjectDetection repository.

The exception:

Exception has occurred: RuntimeError
expected a non-empty list of Tensors
  File "/home/josh/git/SorghumHeadDetection/RetinaNet/RetinaNet.py", line 52, in _apply_transpose
    [func(p).permute(0, 2, 3, 1).contiguous().view(p.size(0), -1, n_classes) for p in p_states], 1)
  File "/home/josh/git/SorghumHeadDetection/RetinaNet/RetinaNet.py", line 81, in forward
    return [self._apply_transpose(self.classifier, p_states, self.n_classes),
  File "/home/josh/git/SorghumHeadDetection/main.py", line 34, in show_test_results
    z = learn.model.eval()(x)
  File "/home/josh/git/SorghumHeadDetection/main.py", line 102, in <module>
    show_test_results(tlearn, oimg=img, classes=tlearn.data.classes, start=0, detect_thresh=0.70, nms_thresh=0.35)
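"expected a non-empty list of Tensors" is what torch.cat raises when handed an empty list, which suggests p_states is empty at that point (e.g. every pyramid level was filtered out for this input size). A quick repro:

```python
import torch

# torch.cat cannot concatenate an empty list of tensors; this
# reproduces the RuntimeError seen in _apply_transpose.
try:
    torch.cat([], dim=1)
except RuntimeError as e:
    print(e)
```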
JoshVarty commented 5 years ago

We have three inputs to _apply_transpose():

func

Sequential(
  (0): Sequential(
    (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
  )
  (1): Sequential(
    (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
  )
  (2): Conv2d(32, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)

p_states

When it works (256x256 input)

n_classes

JoshVarty commented 5 years ago

After removing the filtering of p_states, the network no longer returns an empty p_states list.

On some images we can successfully generate bounding boxes, on others we get an error similar to:

Exception has occurred: RuntimeError
The size of tensor a (69) must match the size of tensor b (70) at non-singleton dimension 2
  File "/home/josh/git/SorghumHeadDetection/RetinaNet/RetinaNet.py", line 15, in forward
    return self.conv_lat(self.hook.stored) + F.interpolate(x, scale_factor=2)
  File "/home/josh/git/SorghumHeadDetection/RetinaNet/RetinaNet.py", line 76, in forward
    p_states = [merge(p_states[0])] + p_states
  File "/home/josh/git/SorghumHeadDetection/main.py", line 34, in show_test_results
    z = learn.model.eval()(x)
  File "/home/josh/git/SorghumHeadDetection/main.py", line 102, in <module>
    show_test_results(tlearn, oimg=img, classes=tlearn.data.classes, start=0, detect_thresh=0.70, nms_thresh=0.35)

The error appears to be an off-by-one that occurs when the height of the feature map on one side of a lateral upsample merge does not match the height of the feature map on the other side.
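The arithmetic behind that mismatch can be checked directly, assuming 3×3, stride-2, padding-1 downsampling convs (as in a typical ResNet stage):

```python
def conv_out(n, k=3, s=2, p=1):
    """Spatial size after a k x k, stride-s, padding-p convolution."""
    return (n + 2 * p - k) // s + 1

# An odd 69-pixel map downsamples to 35, but upsampling 35 by a factor
# of 2 gives 70 -- one more than the 69-pixel lateral map it must be
# added to. An even 70-pixel map round-trips cleanly: 70 -> 35 -> 70.
print(conv_out(69), conv_out(69) * 2)   # 35 70  (mismatch with 69)
print(conv_out(70), conv_out(70) * 2)   # 35 70  (matches 70)
```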

Image on which it works:

train_labelled_images/C36-R6-G276-DSC00775.jpeg
torch.Size([3, 1243, 316])

Image on which it doesn't work:

train_labelled_images/C2-R34-G62-DSC01702.jpeg
torch.Size([3, 1103, 287])

The next step is probably to compare the shapes in the encoder against the shapes in the decoder. We might even be able to do this with self._model_sizes(encoder, size=imsize)
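
A rough stand-in for that helper (model_sizes is a hypothetical name; toy three-stage encoder, forward hooks recording each stage's output size):

```python
import torch
import torch.nn as nn

def model_sizes(model, size):
    """Record each child module's output spatial size for a dummy
    input, roughly what self._model_sizes is expected to report."""
    sizes, hooks = [], []
    def record(module, inputs, output):
        sizes.append(tuple(output.shape[-2:]))
    for child in model.children():
        hooks.append(child.register_forward_hook(record))
    with torch.no_grad():
        model(torch.zeros(1, 3, *size))
    for h in hooks:
        h.remove()
    return sizes

# Toy encoder standing in for the real one: three stride-2 stages.
enc = nn.Sequential(*[nn.Conv2d(c, 8, 3, stride=2, padding=1)
                      for c in (3, 8, 8)])
print(model_sizes(enc, (287, 287)))  # [(144, 144), (72, 72), (36, 36)]
```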

For imsize=(256,256) (works) we get:

For imsize=(1243, 316) (works) we get:

For imsize=(1103, 287) (broken) we get: