Hello,
it might be a silly question, but after a while I still could not figure out what is wrong with my reading of the code.
(QUESTION 1)
In model.model.py, a comment states that the batch goes from (b, C, H, W) ---> (2b, C, H, W) after concatenating images and sketches.
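For concreteness, this is how I read that concatenation step (my own minimal sketch, not the repo's code):

```python
import torch

# Illustrative shapes only: a small batch of photos and their paired sketches.
b, C, H, W = 4, 3, 224, 224
im = torch.randn(b, C, H, W)    # photo batch
sk = torch.randn(b, C, H, W)    # sketch batch

# Stacking along the batch dimension doubles it: (b, C, H, W) ---> (2b, C, H, W).
x = torch.cat([im, sk], dim=0)
print(x.shape)                  # torch.Size([8, 3, 224, 224])
```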
Later on, the batch increases to 4b after self-attention (see the attached image).
However, a quick unit test reveals that the self-attention module does not modify the batch:
Outputs:
torch.Size([3, 197, 768]) [196, ..., 196] [None, ..., None]
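For reference, the check was roughly the following (a minimal sketch that uses torch.nn.MultiheadAttention as a stand-in for the repo's self-attention module, with ViT-B/16-like shapes: batch 3, 196 patch tokens + 1 CLS token, dim 768):

```python
import torch
import torch.nn as nn

# Stand-in for the repo's self-attention module (an assumption, not the actual class).
attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)

x = torch.randn(3, 197, 768)    # (batch, tokens, dim)
out, _ = attn(x, x, x)          # plain self-attention: query = key = value = x
assert out.shape == x.shape     # the batch dimension is untouched (still 3, not 4b)
print(out.shape)                # torch.Size([3, 197, 768])
```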
I suspect that I do not fully understand how the positive / negative pairs are being passed to the model, and the sparse comments in the code can be a bit cryptic.
(QUESTION 2)
Therefore, my second question is: given a pair (sk, im), how are positives and negatives defined? It is not entirely clear to me even after inspecting the triplet loss function.
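My current reading is something like the following (a hypothetical sketch of how I assume triplets are built, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

# Assumption: anchor = sketch embedding, positive = its paired photo embedding,
# negative = a photo embedding from a different pair in the same batch.
sk_emb = torch.randn(8, 512)             # sketch embeddings (anchor)
im_emb = torch.randn(8, 512)             # paired photo embeddings (positive)
neg_emb = im_emb.roll(shifts=1, dims=0)  # shift by one pair to get in-batch negatives

loss = F.triplet_margin_loss(sk_emb, im_emb, neg_emb, margin=0.2)
```

Is that roughly how the positives and negatives are formed, or are negatives sampled differently?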
(QUESTION 3)
I assume the line in question is aggregating local information from adjacent tokens.
Is this discussed in the paper? I cannot find it in the Relational Network section, which only mentions the MLP-ReLU concatenation.
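To make the question concrete, by "aggregating local information from adjacent tokens" I mean something along these lines (purely illustrative, not the repo's code):

```python
import torch
import torch.nn as nn

# A 1D convolution over the token axis mixes each token with its immediate
# neighbours, i.e. the kind of local aggregation I am asking about.
tokens = torch.randn(3, 197, 768)               # (batch, tokens, dim)
local_mix = nn.Conv1d(768, 768, kernel_size=3, padding=1)

# Conv1d expects (batch, channels, length), hence the transposes.
out = local_mix(tokens.transpose(1, 2)).transpose(1, 2)
print(out.shape)                                # torch.Size([3, 197, 768])
```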
Thanks for your attention, and keep up the good work!