JoshVarty / SorghumHeadDetection

Working on: https://competitions.codalab.org/competitions/23177
MIT License

Implement Test Time Augmentation #3

Open JoshVarty opened 5 years ago

JoshVarty commented 5 years ago

We need to scan the image from top to bottom and output all bounding boxes within it. We can probably use test time augmentation and a similar approach as we used in the previous competition.
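For reference, a minimal sketch of flip-based test time augmentation for detection, assuming (H, W, C) images and [x1, y1, x2, y2] boxes; `hflip_boxes` and `tta_predict` are hypothetical helpers, and NMS would run on the pooled boxes afterwards:

```python
import numpy as np

def hflip_boxes(boxes, width):
    """Map [x1, y1, x2, y2] boxes predicted on a horizontally
    flipped image back into original-image coordinates."""
    out = boxes.copy()
    out[:, 0] = width - boxes[:, 2]
    out[:, 2] = width - boxes[:, 0]
    return out

def tta_predict(model, image):
    """Run the detector on the image and on its horizontal flip,
    then pool all boxes for a single NMS pass afterwards."""
    boxes = model(image)
    flipped = model(image[:, ::-1])           # flip the width axis of (H, W, C)
    flipped = hflip_boxes(flipped, image.shape[1])
    return np.vstack([boxes, flipped])
```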

JoshVarty commented 5 years ago

We could investigate the show_results_side_by_side() method, since we need to take the outputs and run nms() on the predictions.

JoshVarty commented 5 years ago

Nikhil pointed out that we should be able to generate predictions in a simpler way. Since we're not doing classification and don't have any fully connected layers, we should be able to pass arbitrary sized tensors through our network and we'll just receive a correspondingly sized output which we'll convert to bounding boxes and run through NMS.
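As a sanity check on that idea, a toy head with the same layer shapes as ours (not the actual RetinaNet) shows a fully convolutional stack accepting arbitrary spatial sizes:

```python
import torch
import torch.nn as nn

# A purely convolutional head imposes no fixed input size: with no
# fully connected layers, the output spatial dims simply track the
# input's, so any (H, W) passes through.
head = nn.Sequential(
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 18, 3, padding=1),
)

for size in [(256, 256), (320, 192)]:
    x = torch.randn(1, 32, *size)
    print(tuple(head(x).shape))  # (1, 18, H, W) for each input (H, W)
```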

Unfortunately I'm getting an error when I use the RetinaNet from the ObjectDetection repository.

The exception:

Exception has occurred: RuntimeError
expected a non-empty list of Tensors
  File "/home/josh/git/SorghumHeadDetection/RetinaNet/RetinaNet.py", line 52, in _apply_transpose
    [func(p).permute(0, 2, 3, 1).contiguous().view(p.size(0), -1, n_classes) for p in p_states], 1)
  File "/home/josh/git/SorghumHeadDetection/RetinaNet/RetinaNet.py", line 81, in forward
    return [self._apply_transpose(self.classifier, p_states, self.n_classes),
  File "/home/josh/git/SorghumHeadDetection/main.py", line 34, in show_test_results
    z = learn.model.eval()(x)
  File "/home/josh/git/SorghumHeadDetection/main.py", line 102, in <module>
    show_test_results(tlearn, oimg=img, classes=tlearn.data.classes, start=0, detect_thresh=0.70, nms_thresh=0.35)
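"expected a non-empty list of Tensors" is what torch.cat raises when handed an empty list, which suggests p_states is empty at that point (e.g. every pyramid level was filtered out for this input size). A quick repro:

```python
import torch

# torch.cat cannot concatenate an empty list of tensors; this
# reproduces the RuntimeError seen in _apply_transpose.
try:
    torch.cat([], dim=1)
except RuntimeError as e:
    print(e)
```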
JoshVarty commented 5 years ago

We have three inputs to _apply_transpose():

func

Sequential(
  (0): Sequential(
    (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
  )
  (1): Sequential(
    (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
  )
  (2): Conv2d(32, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)

p_states

When it works (256x256 input)

n_classes

JoshVarty commented 5 years ago

After removing the filtering of p_states, the network no longer returns an empty p_states list.

On some images we can successfully generate bounding boxes, on others we get an error similar to:

Exception has occurred: RuntimeError
The size of tensor a (69) must match the size of tensor b (70) at non-singleton dimension 2
  File "/home/josh/git/SorghumHeadDetection/RetinaNet/RetinaNet.py", line 15, in forward
    return self.conv_lat(self.hook.stored) + F.interpolate(x, scale_factor=2)
  File "/home/josh/git/SorghumHeadDetection/RetinaNet/RetinaNet.py", line 76, in forward
    p_states = [merge(p_states[0])] + p_states
  File "/home/josh/git/SorghumHeadDetection/main.py", line 34, in show_test_results
    z = learn.model.eval()(x)
  File "/home/josh/git/SorghumHeadDetection/main.py", line 102, in <module>
    show_test_results(tlearn, oimg=img, classes=tlearn.data.classes, start=0, detect_thresh=0.70, nms_thresh=0.35)

The error appears to be an off-by-one that occurs when the height of the feature map on one side of a lateral upsample merge does not match the height of the feature map on the other side.
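The arithmetic behind that mismatch can be checked directly, assuming 3×3, stride-2, padding-1 downsampling convs (as in a typical ResNet stage):

```python
def conv_out(n, k=3, s=2, p=1):
    """Spatial size after a k x k, stride-s, padding-p convolution."""
    return (n + 2 * p - k) // s + 1

# An odd 69-pixel map downsamples to 35, but upsampling 35 by a factor
# of 2 gives 70 -- one more than the 69-pixel lateral map it must be
# added to. An even 70-pixel map round-trips cleanly: 70 -> 35 -> 70.
print(conv_out(69), conv_out(69) * 2)   # 35 70  (mismatch with 69)
print(conv_out(70), conv_out(70) * 2)   # 35 70  (matches 70)
```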

Image on which it works:

train_labelled_images/C36-R6-G276-DSC00775.jpeg
torch.Size([3, 1243, 316])

Image on which it doesn't work:

train_labelled_images/C2-R34-G62-DSC01702.jpeg
torch.Size([3, 1103, 287])

The next step is probably to compare the shapes in the encoder against the shapes in the decoder. We might even be able to do this with self._model_sizes(encoder, size=imsize)
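
A rough stand-in for that helper (model_sizes is a hypothetical name; toy three-stage encoder, forward hooks recording each stage's output size):

```python
import torch
import torch.nn as nn

def model_sizes(model, size):
    """Record each child module's output spatial size for a dummy
    input, roughly what self._model_sizes is expected to report."""
    sizes, hooks = [], []
    def record(module, inputs, output):
        sizes.append(tuple(output.shape[-2:]))
    for child in model.children():
        hooks.append(child.register_forward_hook(record))
    with torch.no_grad():
        model(torch.zeros(1, 3, *size))
    for h in hooks:
        h.remove()
    return sizes

# Toy encoder standing in for the real one: three stride-2 stages.
enc = nn.Sequential(*[nn.Conv2d(c, 8, 3, stride=2, padding=1)
                      for c in (3, 8, 8)])
print(model_sizes(enc, (287, 287)))  # [(144, 144), (72, 72), (36, 36)]
```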

For imsize=(256,256) (works) we get:

For imsize=(1243, 316) (works) we get:

For imsize=(1103, 287) (broken) we get: