PeterBishop0 closed this issue 3 years ago.
In the baseline model, the `Attention` class computes `x = v * q`. However, the BUTD paper concatenates the feature vector v with the question embedding q. Maybe I misunderstood something, but I would appreciate an explanation. Thanks!
"Hadamard Product for Low-rank Bilinear Pooling": did you implement this one instead?
Only the attention part implements it, following that paper's section "Low-rank Bilinear Pooling in Attention Mechanism".
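The difference between the two attention scoring schemes can be sketched in plain Python for one question vector and a list of region features. This is an illustrative sketch, not the repository's code. The function names, the toy weights, and the use of plain `tanh` are assumptions. In particular, the BUTD paper uses a gated tanh layer, which plain `tanh` only approximates here.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over attention logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def concat_attention(regions: List[Vector], q: Vector,
                     w_joint: Matrix, w_out: Vector) -> Vector:
    """BUTD-style scoring (assumed form): logit_i = w_out . tanh(W_joint [v_i; q]).

    `v + q` on Python lists concatenates them, so w_joint needs
    len(v) + len(q) columns. Plain tanh stands in for the gated tanh.
    """
    logits = []
    for v in regions:
        hidden = [math.tanh(h) for h in matvec(w_joint, v + q)]
        logits.append(dot(w_out, hidden))
    return softmax(logits)


def hadamard_attention(regions: List[Vector], q: Vector,
                       u_v: Matrix, u_q: Matrix, w_out: Vector) -> Vector:
    """Low-rank bilinear scoring: logit_i = w_out . (tanh(U_v v_i) * tanh(U_q q)).

    Both inputs are projected into the same rank-d space and combined
    with an element-wise (Hadamard) product instead of concatenation.
    """
    q_proj = [math.tanh(h) for h in matvec(u_q, q)]
    logits = []
    for v in regions:
        v_proj = [math.tanh(h) for h in matvec(u_v, v)]
        joint = [a * b for a, b in zip(v_proj, q_proj)]
        logits.append(dot(w_out, joint))
    return softmax(logits)
```

With concatenation, `W_joint [v_i; q]` splits into `W_v v_i + W_q q`, so the question only adds a shift before the nonlinearity. With the Hadamard product, each projected question dimension rescales the matching projected image dimension. That multiplicative interaction is what low-rank bilinear pooling is meant to capture.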