Hi -- I have never reproduced Due et al.'s results, so I can't comment on how to implement their models. Moreover, my ability to study, understand, review, and debug other people's code is rather limited. In any case, it appears you're trying to do something new, so you may want to approach the exercise with an experimental mindset. I would recommend reading Andrej Karpathy's "how-to" recipe for training neural networks, which has lots of tips and tricks: http://karpathy.github.io/2019/04/25/recipe/
Dear Prof. Heinsen,
Good day.
First of all, I'm so sorry to open a new issue again.
Since June of last year, I have been trying to apply a capsule network to the Neural Machine Translation (NMT) task on top of a multi-head attention network. I was motivated by a previous study (AAAI 2019) in which the authors applied the old version of CapsNet to dynamically route the values (outputs) of the multi-head attention layer. Unfortunately, I'm facing some logical difficulties and I'm not getting the expected results.
In my project, when I applied the capsule network in both the encoder and the decoder, specifically on the multi-head attention layer, I got very bad results. Moreover, applying CapsNet in the encoder only doesn't yield any improvement.
I hope you can give me some tips on how to apply CapsNet to the multi-head attention layer correctly.
Sample of code: (posted as a screenshot; only fragments survived extraction — it showed the routing step over activations `a` and poses `µ`). A reconstruction sketch follows below.
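Since the screenshot didn't survive, here is a minimal sketch of the kind of layer I mean: dynamic routing-by-agreement (in the spirit of Sabour et al., 2017, the "old version of CapsNet") over the per-head outputs of multi-head attention, replacing the usual concat-and-project aggregation. All names here (`HeadRouting`, `squash`, shapes, hyperparameters) are illustrative placeholders, not Prof. Heinsen's API or my exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Squashing nonlinearity from Sabour et al. (2017).
    sq_norm = (s * s).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

class HeadRouting(nn.Module):
    """Treat each attention head's output as an input capsule and route
    it to n_out output capsules; their concatenation replaces the usual
    concat-then-linear aggregation of multi-head attention."""

    def __init__(self, n_heads, d_head, n_out, d_out, n_iter=3):
        super().__init__()
        self.n_iter = n_iter
        # One learned "vote" transform per (input head, output capsule) pair.
        self.W = nn.Parameter(0.01 * torch.randn(n_heads, n_out, d_head, d_out))

    def forward(self, heads):
        # heads: [batch, seq, n_heads, d_head] -- per-head attention outputs.
        votes = torch.einsum('bsnd,node->bsnoe', heads, self.W)  # vote vectors
        b = torch.zeros(votes.shape[:-1], device=heads.device)   # routing logits
        for _ in range(self.n_iter):
            c = F.softmax(b, dim=-1)                  # couplings over output capsules
            s = (c.unsqueeze(-1) * votes).sum(dim=2)  # weighted sum over heads
            v = squash(s)                             # [batch, seq, n_out, d_out]
            b = b + torch.einsum('bsnoe,bsoe->bsno', votes, v)  # agreement update
        return v.flatten(-2)  # [batch, seq, n_out * d_out]
```

With `n_out * d_out` equal to the model dimension, this drops in where the concatenated heads would normally be projected back to `d_model`, which is roughly what the AAAI 2019 paper does.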