cs231n / cs231n.github.io

Public facing notes page
MIT License

2021 assignment 3 Q2 self attention section: the expected_self_attn_output provided is wrong #278

Open manuka2 opened 2 years ago

manuka2 commented 2 years ago

In the forward() method of the MultiHeadAttention class in assignment3/cs231n/transformer_layers.py:
The provided expected_self_attn_output can only be reproduced by computing the attention weights, applying dropout to them, and then multiplying the dropped-out weights by the value matrix. However, the assignment instructions explicitly tell you to follow a different order: compute the attention weights, multiply them by the value matrix, and only then apply dropout. Anyone who follows the order actually instructed will get a self_attn_output that differs from the provided expected_self_attn_output, so the check in Transformer_Captioning.ipynb is wrong.
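To make the two orderings concrete, here is a minimal sketch in plain PyTorch (not the assignment's starter code; the tensor shapes, the dropout probability, and the variable names are all made up for illustration). It contrasts dropout-on-weights-then-multiply against multiply-then-dropout, which is the discrepancy this issue describes:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical shapes: batch, heads, sequence length, per-head dim.
N, H, T, D = 2, 4, 5, 8
query = torch.randn(N, H, T, D)
key = torch.randn(N, H, T, D)
value = torch.randn(N, H, T, D)
dropout = torch.nn.Dropout(p=0.3)

# Scaled dot-product attention weights.
scores = query @ key.transpose(-2, -1) / (D ** 0.5)
weights = F.softmax(scores, dim=-1)

# Ordering that (per this issue) reproduces expected_self_attn_output:
# dropout on the attention weights, then multiply by the value matrix.
out_dropout_first = dropout(weights) @ value

# Ordering the notebook text actually instructs:
# multiply by the value matrix first, then apply dropout.
out_dropout_last = dropout(weights @ value)

# In training mode the two generally differ, because the random dropout
# mask is applied to different tensors. In eval mode dropout is the
# identity, so both orderings collapse to weights @ value.
dropout.eval()
assert torch.allclose(dropout(weights) @ value, dropout(weights @ value))
```

This is why the reported relative error only shows up when the test compares against outputs generated with the other ordering: the math is identical whenever dropout is inactive.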

taylover2016 commented 2 years ago

Hi there! I've been working on the same problem these days and it's driving me nuts. I wasn't very familiar with how the Transformer works, so yesterday I read the supplementary materials and revisited the problem, but the relative error is still 1.0... There's no reference solution available, so I wonder if you could help me?

taylover2016 commented 2 years ago

Btw, I'm not a student taking the class right now lol. So there's no need to worry about things like Honor Code. I'm just watching the 2017 videos and doing the 2022 assignments.

tyjcbzd commented 1 year ago

Hi! I have the same problem! Have you solved it?