Atcold / NYU-DLSP20

NYU Deep Learning Spring 2020
https://atcold.github.io/NYU-DLSP20/

cross attention issue, topic_12.3. Attention and the Transformer #838


omniaalwazzan commented 1 year ago

Thanks a lot for this valuable repository!

I have a question regarding the cross-attention presented in topic 12.3, Attention and the Transformer.

Is this line working properly: `d_xq, d_xk, d_xv = d_input`?

In my case, it doesn't work.

Thanks, Omnia

Atcold commented 1 year ago

What does "it doesn't work" mean? Is it raising an error?

omniaalwazzan commented 1 year ago

Thanks a lot for replying, @Atcold. I am sorry; I was quite busy with some deadlines for my PhD.

Sorry for the confusion; you are right, I should have mentioned the error. It occurs on this line when I pass `d_input` to use cross-attention:

`d_xq, d_xk, d_xv = d_input`

`TypeError: cannot unpack non-iterable int object`

As far as I understand, when we want to assign a single value/integer to multiple variables, we need to unpack it explicitly, since an `int` object has no `__iter__` method. So, to fix the line causing the error, maybe we could do this instead: `d_xq, d_xk, d_xv = d_input, d_input, d_input`.
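For illustration, a minimal sketch of both behaviours (the dimension 512 is a made-up value, not taken from the notebook):

```python
d_input = 512                  # an int is not iterable
# d_xq, d_xk, d_xv = d_input  # TypeError: cannot unpack non-iterable int object

# Workaround described above: reuse the same dimension three times
d_xq, d_xk, d_xv = d_input, d_input, d_input

# Alternative: pass a 3-tuple, so the original line unpacks cleanly
d_input = (512, 512, 512)
d_xq, d_xk, d_xv = d_input
```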

However, with this workaround I get the same result as self-attention. Is this possible? I think the result should be slightly different, according to the definition of cross-attention. Please feel free to correct me at any point; I have far less experience than you :)
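As an aside, a small check with `torch.nn.MultiheadAttention` (not the notebook's code; PyTorch ≥ 1.9 for `batch_first`) shows why identical results are expected here: cross-attention only diverges from self-attention once the key/value sequence actually differs from the query sequence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)

x   = torch.randn(1, 4, 8)  # query sequence (batch, length, embed_dim)
ctx = torch.randn(1, 6, 8)  # a separate context sequence

self_out,  _ = attn(x, x, x)      # self-attention: Q, K, V all from x
cross_out, _ = attn(x, ctx, ctx)  # cross-attention: K, V from ctx

# Identical inputs give identical outputs, so matching results are
# expected whenever the "cross" branch receives the same tensor as x;
# outputs differ only once K/V come from another sequence.
print(torch.allclose(self_out, attn(x, x, x)[0]))  # True
print(torch.allclose(self_out, cross_out))         # False (in general)
```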

Atcold commented 1 year ago

`d_input` is a tuple of 3 elements specifying the dimension `d` for the queries, keys, and values. This functionality is not used in 15-transformer.ipynb, so I'm confused about where you're getting the error. Could you explain what you're doing, so I can help you?
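For reference, a minimal sketch of how a constructor following that description might consume `d_input` (this mirrors the pattern described in the thread, not necessarily the notebook's exact code):

```python
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Minimal sketch, assuming d_input is either None or a 3-tuple
    (d_xq, d_xk, d_xv), as described above."""
    def __init__(self, d_model, num_heads, d_input=None):
        super().__init__()
        self.num_heads = num_heads
        if d_input is None:
            d_xq = d_xk = d_xv = d_model   # self-attention: one shared dimension
        else:
            d_xq, d_xk, d_xv = d_input     # cross-attention: one dimension per stream
        self.W_q = nn.Linear(d_xq, d_model)
        self.W_k = nn.Linear(d_xk, d_model)
        self.W_v = nn.Linear(d_xv, d_model)

# Passing an int (e.g. d_input=64) reproduces the reported TypeError;
# a 3-tuple unpacks cleanly:
mha = MultiHeadAttention(d_model=64, num_heads=8, d_input=(64, 32, 32))
```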