Hi,
As mentioned in the paper, FEAT and FEAT* use different regularizers.
Since the specific test instance is incorporated into the transformer adaptation in FEAT*, it provides a kind of supervision to regularize the attention. For example, the labels of the transformer input take the form (a, b, c, d, e, c) when we have a 5-way 1-shot support set and the test instance comes from class c. In this case, we want to regularize each input to attend to the instances of its own class. In other words, we want the attention of the 3rd instance to concentrate on itself and the test instance.
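For concreteness, here is a minimal sketch (not the repository code) of how such a soft attention target could be built by matching labels; the tensor names are illustrative:

```python
import torch

# 5-way 1-shot support set plus one query from class c,
# in the input order from the example above: (a, b, c, d, e, c).
labels = torch.tensor([0, 1, 2, 3, 4, 2])

# same_class[i, j] = 1 if instances i and j share a label
same_class = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()

# Normalize each row into a distribution: each instance's attention
# target spreads uniformly over the instances of its own class.
att_label = same_class / same_class.sum(dim=1, keepdim=True)

print(att_label)
# Rows 2 and 5 (the two class-c instances) are [0, 0, .5, 0, 0, .5];
# every other row is a one-hot on itself, hence the 1 entries.
```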
@Han-Jia, thanks, but I think there is always some 1 in the attention label matrix. Can those 1s be substituted by 0.5? They also denote embeddings that belong to the same class, just as the 0.5 entries do.
Hello, the att_label_basis is constructed by setting part of the identity matrix to zero, so it contains 1s. You can also build this matrix by matching the labels of the test instance against those of the support set.
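If it helps, below is a hedged sketch of how a KL-divergence term between the transformer's attention weights and such a target matrix could be computed; attention_regularizer and its arguments are hypothetical names for illustration, not the actual FEAT API:

```python
import torch
import torch.nn.functional as F

def attention_regularizer(attention_logits, att_label, eps=1e-8):
    """KL(att_label || attention) averaged over input instances.

    attention_logits: (n, n) raw attention scores from the transformer.
    att_label: (n, n) soft target distributions, rows summing to 1.
    """
    log_att = F.log_softmax(attention_logits, dim=1)
    # KL(p || q) = sum_j p_j * (log p_j - log q_j); eps guards log(0),
    # and zero-probability targets contribute nothing to the sum.
    kl = (att_label * (torch.log(att_label + eps) - log_att)).sum(dim=1)
    return kl.mean()
```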
Thank you so much. Appreciate your help.
Hello @Han-Jia, thanks for your implementation, but when you calculate the KL divergence, why do you formalize the attention labels that way? Why does this implementation reflect the property of the contrastive loss proposed in the paper?
Could you explain it a little more, or is there a reference I can look at?
Thanks,