JoshVarty / SorghumHeadDetection

Working on: https://competitions.codalab.org/competitions/23177
MIT License

Things I don't understand #6

Open JoshVarty opened 5 years ago

JoshVarty commented 5 years ago

There's a lot I don't understand about the current implementation so this will act as a notepad while I try to figure out bits and pieces.

  1. How does create_anchors work? How do they come up with sizes, scales, and ratios?
  2. How can I make smaller anchors?
  3. Which pyramid levels are we taking outputs at?
  4. How can I change n_anchors?
  5. How is sizes created/calculated?
  6. What is this flatten stuff for?
  7. How does activ_to_bbox() work?
  8. How does nms() work? What is this start parameter in show_results()?
  9. What are smoothers?
JoshVarty commented 5 years ago

create_anchors

Defined as:

def create_anchors(sizes, ratios, scales, flatten=True):
    "Create anchor of `sizes`, `ratios` and `scales`."
    aspects = [[[s*math.sqrt(r), s*math.sqrt(1/r)] for s in scales] for r in ratios]
    aspects = torch.tensor(aspects).view(-1,2)
    anchors = []
    for h,w in sizes:
        #4 here to have the anchors overlap.
        sized_aspects = 4 * (aspects * torch.tensor([2/h,2/w])).unsqueeze(0)
        base_grid = create_grid((h,w)).unsqueeze(1)
        n,a = base_grid.size(0),aspects.size(0)
        ancs = torch.cat([base_grid.expand(n,a,2), sized_aspects.expand(n,a,2)], 2)
        anchors.append(ancs.view(h,w,a,4))
    return torch.cat([anc.view(-1,4) for anc in anchors],0) if flatten else anchors
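create_grid isn't shown above. As far as I can tell it just lays out the cell centers of an h x w feature map in the same [-1, 1] coordinate space the anchors use. A minimal sketch (my reconstruction, not necessarily the library's exact code):

```python
import torch

def create_grid(size):
    "Sketch: centers of an h x w grid of cells, in [-1, 1] coordinates."
    h, w = size
    grid = torch.zeros(h, w, 2)
    # Cell centers along each axis, e.g. w=2 -> [-0.5, 0.5]
    ys = torch.linspace(-1 + 1/h, 1 - 1/h, h) if h > 1 else torch.tensor([0.])
    xs = torch.linspace(-1 + 1/w, 1 - 1/w, w) if w > 1 else torch.tensor([0.])
    grid[..., 0] = ys[:, None]   # row (y) coordinate first
    grid[..., 1] = xs[None, :]
    return grid.view(-1, 2)

# A 2x2 map gives four centers at (+/-0.5, +/-0.5)
print(create_grid((2, 2)))
```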

Called with:

ratios = [1/2,1,2]
scales = [1,2**(-1/3), 2**(-2/3)] 
#Paper used [1,2**(1/3), 2**(2/3)] but a bigger size (600) too, so the largest feature map gave anchors that cover less of the image.
sizes = [(2**i,2**i) for i in range(5)]
sizes.reverse() #Predictions come in the order of the smallest feature map to the biggest
anchors = create_anchors(sizes, ratios, scales)

or:

self.anchors = create_anchors(sizes, self.ratios, self.scales).to(device)

16*16*9 + 8*8*9 + 4*4*9 + 2*2*9 + 1*1*9 = 3069

anchors.shape = [3069,4]
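The count checks out: each h x w level contributes h * w * 9 anchors, where 9 = len(ratios) * len(scales). A quick sanity check:

```python
# Feature-map sizes from the snippet above, largest first
sizes = [(16, 16), (8, 8), (4, 4), (2, 2), (1, 1)]
n_per_cell = 3 * 3  # 3 ratios x 3 scales
total = sum(h * w * n_per_cell for h, w in sizes)
print(total)  # -> 3069, matching anchors.shape[0]
```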

Rewriting the aspects list comprehension as explicit loops:

aspects = []
for r in ratios:
  for s in scales:
    current = s * math.sqrt(r), s * math.sqrt(1/r)
    aspects.append(current)

Gives us aspects of:

[(0.7071067811865476, 1.4142135623730951), 
(0.5612310241546866, 1.1224620483093732), 
(0.4454493590701697, 0.8908987181403394), 
(1.0, 1.0), 
(0.7937005259840998, 0.7937005259840998), 
(0.6299605249474366, 0.6299605249474366), 
(1.4142135623730951, 0.7071067811865476), 
(1.1224620483093732, 0.5612310241546866),
 (0.8908987181403394, 0.4454493590701697)]

So this appears to be where num_anchors=9 comes from?
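Seems right: the comprehension iterates ratios (outer) x scales (inner), giving 3 x 3 = 9 (w, h) pairs per grid cell. One property worth noting: since w * h = s*sqrt(r) * s*sqrt(1/r) = s**2, the ratio only changes the anchor's shape while the scale alone sets its area:

```python
import math

ratios = [1/2, 1, 2]
scales = [1, 2**(-1/3), 2**(-2/3)]
# Flat equivalent of the nested comprehension in create_anchors
aspects = [(s * math.sqrt(r), s * math.sqrt(1/r)) for r in ratios for s in scales]

assert len(aspects) == 9  # 3 ratios x 3 scales -> num_anchors = 9
for w, h in aspects:
    # Area w*h collapses to s**2, independent of the ratio r
    assert any(abs(w * h - s**2) < 1e-9 for s in scales)
```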

JoshVarty commented 5 years ago

How can I make smaller anchors?

Our anchors are too big. See also #5

[image]

I think there are a couple of ways to do this. For starters we can play with scale, but I think that only gets us so far.

I'm thinking that the layer of the pyramid from which we take our predictions governs the size of the output anchors. I also think that we're not using the lower layers of the pyramid which means we aren't good at handling small objects.

We should try hooking up to lower levels of the pyramid (larger feature-map sizes) to get more fine-grained predictions.
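To see why lower pyramid levels (bigger feature maps) give smaller anchors: in create_anchors the anchor dimensions are 4 * aspect * (2/h), so on an h x h grid an anchor with s=1, r=1 spans 8/h of the [-1, 1] range, i.e. 4/h of the image. Doubling the feature-map size halves the anchor. This is my reading of the code, worth double-checking:

```python
import math

def anchor_fraction(h, s=1.0, r=1.0):
    """Fraction of the image one anchor covers on an h x h feature map.
    From create_anchors: size = 4 * s*sqrt(r) * (2/h), in coords spanning 2 units."""
    return 4 * s * math.sqrt(r) * (2 / h) / 2

for h in [16, 32, 64]:
    print(h, anchor_fraction(h))  # 16 -> 0.25, 32 -> 0.125, 64 -> 0.0625
```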

JoshVarty commented 5 years ago

What pyramid levels are we using?

The sizes appear to be:

[[16, 16], [32, 32], [4, 4], [2, 2], [1, 1]], which seems to sit very high up the stack. We should look more closely at how the connections are built.