Open manuka2 opened 2 years ago
Hi there! I'm working on the same problem these days and it just drives me nuts. Previously I wasn't very familiar with how the Transformer works. Yesterday I read the supplementary materials and revisited the problem, but the error is still 1.0 ... There hasn't been any posted solution to reference, so I wonder if you could help me, maybe?
Btw, I'm not a student taking the class right now lol. So there's no need to worry about things like Honor Code. I'm just watching the 2017 videos and doing the 2022 assignments.
Hi! I have the same problem! Have you solved it?
In the `forward()` method of the `MultiHeadAttention` class in `assignment3/cs231n/transformer_layers.py`, you can only reproduce the provided `expected_self_attn_output` if you compute: attention weights → dropout → (attention weights after dropout) × value matrix. However, the assignment instructions explicitly tell you to follow a different order, namely: attention weights → attention weights × value matrix → dropout. If you follow the order the instructions actually give, your `self_attn_output` will differ from the provided `expected_self_attn_output`. So the check provided in `Transformer_Captioning.ipynb` is wrong.
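To make the two orderings concrete, here is a minimal, self-contained PyTorch sketch (not the assignment's actual code; the single-head shapes and the function names are just illustrative). Order A applies dropout to the softmax attention weights and then multiplies by the value matrix; Order B multiplies first and applies dropout to the result. Whenever dropout is active (training mode, p > 0), the two orderings produce different outputs, which is why the order matters for matching `expected_self_attn_output`.

```python
import torch
import torch.nn.functional as F

def attention_dropout_before_matmul(query, key, value, dropout, mask=None):
    # Order A: softmax -> dropout on the attention weights -> multiply by V.
    d_k = query.shape[-1]
    scores = torch.matmul(query, key.transpose(-2, -1)) / (d_k ** 0.5)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    attn = dropout(attn)                  # dropout applied to the weights
    return torch.matmul(attn, value)      # then weights x value

def attention_dropout_after_matmul(query, key, value, dropout, mask=None):
    # Order B: softmax -> multiply by V -> dropout (the order the
    # instructions seem to describe).
    d_k = query.shape[-1]
    scores = torch.matmul(query, key.transpose(-2, -1)) / (d_k ** 0.5)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    out = torch.matmul(attn, value)
    return dropout(out)                   # dropout applied to the output instead

if __name__ == "__main__":
    q = torch.randn(2, 4, 8)              # toy (batch, seq_len, d_k) shapes
    k = torch.randn(2, 4, 8)
    v = torch.randn(2, 4, 8)
    drop = torch.nn.Dropout(p=0.3)        # active because default mode is train
    torch.manual_seed(0); a = attention_dropout_before_matmul(q, k, v, drop)
    torch.manual_seed(0); b = attention_dropout_after_matmul(q, k, v, drop)
    print("max difference between the two orderings:", (a - b).abs().max().item())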