PaulLerner closed this issue 2 years ago.
Hi!
In our controlled setup, we used the same 5-dimensional location vector for all the models so that the comparisons are apples-to-apples.
The five features are the normalised top/left/bottom/right coordinates and the area. You can see how they are computed here.
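For illustration only, here is a minimal sketch of such a 5-dimensional location vector. It is not the VOLTA code. It assumes each box is given as `(x1, y1, x2, y2)` in pixels, with `(x1, y1)` the top-left corner, and that the area is normalised by the image area. `box_features_5d` is a hypothetical helper name.

```python
from typing import List, Sequence


def box_features_5d(box: Sequence[float], image_w: float, image_h: float) -> List[float]:
    """Hypothetical helper: 5-d location features [x1, y1, x2, y2, area].

    Assumes `box` is (x1, y1, x2, y2) in pixels, where (x1, y1) is the
    top-left corner and (x2, y2) the bottom-right corner.
    """
    x1, y1, x2, y2 = box
    # Box area as a fraction of the whole image area.
    area = (x2 - x1) * (y2 - y1) / (image_w * image_h)
    # Corner coordinates normalised to [0, 1] by the image width/height.
    return [x1 / image_w, y1 / image_h, x2 / image_w, y2 / image_h, area]


# A 100x50 box in a 400x200 image.
print(box_features_5d((100, 50, 200, 100), 400, 200))
# → [0.25, 0.25, 0.5, 0.5, 0.0625]
```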
Very clear, thanks!
Hi,
I noticed that you simply project the object location here: https://github.com/e-bug/volta/blob/main/volta/embeddings.py#L495, and that you set the object location dimension to 5 there: https://github.com/e-bug/volta/blob/main/config/ctrl_uniter_base.json#L16
How exactly do you represent the location of an object? Chen et al. say they use a 7-dimensional vector: [x_1, y_1, x_2, y_2, w, h, w ∗ h] (normalized top/left/bottom/right coordinates, width, height, and area), and they hard-code it: https://github.com/ChenRocks/UNITER/blob/master/model/model.py#L254
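To make the difference concrete, here is a hedged sketch of the UNITER-style 7-dimensional vector, assuming boxes are `(x1, y1, x2, y2)` in pixels and every entry is normalised by the image size. `box_features_7d` and `five_to_seven` are hypothetical helpers, not code from either repository. Under these assumptions, the width, height and area follow directly from the four normalised corners, so a 5-d vector `[x1, y1, x2, y2, area]` already determines the 7-d one (up to floating-point rounding). Whichever vector is used, the input size of the linear projection must match its length (5 vs. 7).

```python
from typing import List, Sequence


def box_features_7d(box: Sequence[float], image_w: float, image_h: float) -> List[float]:
    """Hypothetical helper: UNITER-style [x1, y1, x2, y2, w, h, w*h].

    Assumes `box` is (x1, y1, x2, y2) in pixels; all outputs are
    normalised by the image width/height.
    """
    x1, y1, x2, y2 = box
    nx1, ny1 = x1 / image_w, y1 / image_h
    nx2, ny2 = x2 / image_w, y2 / image_h
    w, h = nx2 - nx1, ny2 - ny1  # normalised width and height
    return [nx1, ny1, nx2, ny2, w, h, w * h]


def five_to_seven(feat5: Sequence[float]) -> List[float]:
    """Hypothetical helper: derive the 7-d vector from [x1, y1, x2, y2, area].

    Assumes the area was normalised by the image area, so that it equals
    the product of the normalised width and height.
    """
    x1, y1, x2, y2, area = feat5
    return [x1, y1, x2, y2, x2 - x1, y2 - y1, area]


# A 100x50 box in a 400x200 image.
print(box_features_7d((100, 50, 200, 100), 400, 200))
# → [0.25, 0.25, 0.5, 0.5, 0.25, 0.25, 0.0625]
```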
Best,
Paul