JoshVarty opened 5 years ago
create_anchors
Defined as:
```python
def create_anchors(sizes, ratios, scales, flatten=True):
    "Create anchor of `sizes`, `ratios` and `scales`."
    aspects = [[[s*math.sqrt(r), s*math.sqrt(1/r)] for s in scales] for r in ratios]
    aspects = torch.tensor(aspects).view(-1,2)
    anchors = []
    for h,w in sizes:
        #4 here to have the anchors overlap.
        sized_aspects = 4 * (aspects * torch.tensor([2/h,2/w])).unsqueeze(0)
        base_grid = create_grid((h,w)).unsqueeze(1)
        n,a = base_grid.size(0),aspects.size(0)
        ancs = torch.cat([base_grid.expand(n,a,2), sized_aspects.expand(n,a,2)], 2)
        anchors.append(ancs.view(h,w,a,4))
    return torch.cat([anc.view(-1,4) for anc in anchors],0) if flatten else anchors
```
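`create_grid` isn't defined in the snippet above; it's a fastai helper. A minimal sketch of what it appears to do, under the assumption that it returns the (y, x) cell centers of an `h` by `w` grid mapped into [-1, 1] coordinates and flattened to shape `(h*w, 2)`:

```python
import torch

def create_grid(size):
    "Sketch (assumption): (y, x) centers of an h-by-w grid in [-1, 1], shape (h*w, 2)."
    h, w = size
    grid = torch.zeros(h, w, 2)
    # For h cells spanning [-1, 1], the centers sit at -1+1/h, ..., 1-1/h.
    rows = torch.linspace(-1 + 1/h, 1 - 1/h, h) if h > 1 else torch.zeros(1)
    cols = torch.linspace(-1 + 1/w, 1 - 1/w, w) if w > 1 else torch.zeros(1)
    grid[:, :, 0] = rows[:, None]  # broadcast y centers down each column
    grid[:, :, 1] = cols[None, :]  # broadcast x centers across each row
    return grid.view(-1, 2)
```

With something like this in place, `create_anchors` runs end-to-end: each `base_grid` row is an anchor center and `sized_aspects` supplies the width/height.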
Called with:
```python
ratios = [1/2, 1, 2]
scales = [1, 2**(-1/3), 2**(-2/3)]
# Paper used [1, 2**(1/3), 2**(2/3)] but a bigger size (600) too, so the
# largest feature map gave anchors that cover less of the image.
sizes = [(2**i, 2**i) for i in range(5)]
sizes.reverse()  # Predictions come in the order of the smallest feature map to the biggest
anchors = create_anchors(sizes, ratios, scales)
```
or:
```python
self.anchors = create_anchors(sizes, self.ratios, self.scales).to(device)
```
`ratios` seems to control the size to some degree, but I think it's supposed to be tied to `sizes`. `scales` seems to control how much we stretch/squish the anchors.

With `sizes = [(16, 16), (8, 8), (4, 4), (2, 2), (1, 1)]`:

```
16*16*9 + 8*8*9 + 4*4*9 + 2*2*9 + 1*1*9 = 3069
anchors.shape = [3069, 4]
```
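That count is just `len(ratios) * len(scales)` anchor shapes per grid cell, summed over the feature maps; a quick stdlib check:

```python
ratios = [1/2, 1, 2]
scales = [1, 2**(-1/3), 2**(-2/3)]
sizes = [(2**i, 2**i) for i in range(5)]

n_aspects = len(ratios) * len(scales)                 # 9 shapes per cell
n_anchors = sum(h * w * n_aspects for h, w in sizes)  # cells * shapes per map
print(n_anchors)  # 3069
```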
Rewriting the loop for `aspects` as:

```python
aspects = []
for r in ratios:
    for s in scales:
        current = s * math.sqrt(r), s * math.sqrt(1/r)
        aspects.append(current)
```
Gives us aspects of:
```
[(0.7071067811865476, 1.4142135623730951),
 (0.5612310241546866, 1.1224620483093732),
 (0.4454493590701697, 0.8908987181403394),
 (1.0, 1.0),
 (0.7937005259840998, 0.7937005259840998),
 (0.6299605249474366, 0.6299605249474366),
 (1.4142135623730951, 0.7071067811865476),
 (1.1224620483093732, 0.5612310241546866),
 (0.8908987181403394, 0.4454493590701697)]
```
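One property worth noting: each pair multiplies out to `s**2` (since `sqrt(r) * sqrt(1/r) == 1`), so for a given scale the anchor area is constant across ratios and only the shape changes:

```python
import math

ratios = [1/2, 1, 2]
scales = [1, 2**(-1/3), 2**(-2/3)]

aspects = [(s * math.sqrt(r), s * math.sqrt(1/r)) for r in ratios for s in scales]

# Area is preserved per scale: (s*sqrt(r)) * (s*sqrt(1/r)) == s**2.
for (a, b), s in zip(aspects, scales * len(ratios)):
    assert abs(a * b - s * s) < 1e-9

print(len(aspects))  # 9 -> the per-cell anchor count
```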
So this appears to be where `num_anchors=9` comes from?
Our anchors are too big. See also #5.
I think there might be a couple of ways to fix this. For starters we can play with `scales`, but I think that only gets us so far.

I'm thinking that the layer of the pyramid from which we take our predictions governs the size of the output anchors. I also think that we're not using the lower layers of the pyramid, which means we aren't good at handling small objects. We should try hooking up to lower levels of the pyramid (larger sizes) to get more fine-grained predictions.
The sizes appear to be:
`[[16, 16], [32, 32], [4, 4], [2, 2], [1, 1]]`
which seems super high up the stack. We should look closer at how the connections are built.
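One concrete way to see how the grid size drives anchor size: in `create_anchors` the anchor extent is `4 * aspect * (2/h)` in [-1, 1] coordinates, so each halving of the grid doubles the anchor. For the square aspect `(1.0, 1.0)`:

```python
# Anchor side as a fraction of the image, per grid size (the image spans
# 2 units in [-1, 1] coords; side = 4 * aspect * (2/h) with aspect = 1.0).
fractions = []
for h in [16, 8, 4, 2, 1]:
    side = 4 * 1.0 * (2 / h)
    fractions.append(side / 2)
print(fractions)  # [0.25, 0.5, 1.0, 2.0, 4.0]
```

If this reading is right, only the 16x16 and 8x8 grids produce anchors that fit inside the image, which would line up with the anchors looking too big.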
There's a lot I don't understand about the current implementation so this will act as a notepad while I try to figure out bits and pieces.
- How does `create_anchors` work?
- How do they come up with `sizes`, `scales` and `ratios`?
- What is `n_anchors`?
- How are `sizes` created/calculated?
- What is the `flatten` stuff for?
- How does `activ_to_bbox()` work?
- How does `nms()` work?
- What is this `start` parameter in `show_results()`?
- What are `smoothers`?