First of all: very interesting work!
I looked through the code, but I can't find where the implementation computes p_i as expressed in formula (3) of the paper. Specifically, I expect to see a dot product between the normalized prototypes and the hidden features.
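To be concrete, here is a minimal NumPy sketch of the computation I have in mind. It is my own reading of formula (3), not code from the repository. The function names, the shapes `(N, d)` for the hidden features and `(C, d)` for the prototypes, and the absence of any temperature or scale factor are all assumptions on my part.

```python
import numpy as np

def prototype_logits(features, prototypes, eps=1e-12):
    """Dot product between L2-normalized prototypes and hidden features.

    Assumed shapes (my guess, not from the repo):
      features:   (N, d) hidden features from the base network
      prototypes: (C, d) one prototype vector per class
    Returns (N, C) logits with entry [n, i] = <h_n, w_i / ||w_i||>.
    """
    norms = np.linalg.norm(prototypes, axis=1, keepdims=True)
    normalized = prototypes / np.maximum(norms, eps)  # avoid division by zero
    return features @ normalized.T

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; softmax(logits) would give the p_i."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy usage with random data and hypothetical sizes.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))   # 4 samples, 8-dim features
w = rng.normal(size=(3, 8))   # 3 class prototypes
y = np.array([0, 2, 1, 0])
logits = prototype_logits(h, w)
loss = cross_entropy(logits, y)
```

Because the prototypes are normalized, rescaling them leaves the logits unchanged; that is the behavior I would expect to find somewhere before the loss.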
I'd expect it to happen in this part, before the cross-entropy loss:
Instead, the logits seem to be computed directly by the base network:
I'd very much appreciate a clarification.