yuvaraj91 opened this issue 3 years ago
I was looking through the files https://github.com/Lollipop321/compressed-attention/blob/main/fairseq/models/can_transformer.py and https://github.com/Lollipop321/compressed-attention/blob/main/fairseq/models/transformer.py (which I assume is the base model from Fairseq?), and I would like to know where the CAN methodology is actually implemented.
I was checking this part: https://github.com/Lollipop321/compressed-attention/blob/0746b2687a2c00cb860e62980f81ff460fb0f3dd/fairseq/models/can_transformer.py#L645, but I could only see that the parameters are the same as in the baseline model.
Hi! The implementation of the CAN methodology can be seen at line 524 in the file https://github.com/Lollipop321/compressed-attention/blob/main/fairseq/models/can_transformer.py, and the parameters can be found at line 673.
Thanks for your reply. Do you mean specifically this line? https://github.com/Lollipop321/compressed-attention/blob/0746b2687a2c00cb860e62980f81ff460fb0f3dd/fairseq/models/can_transformer.py#L552
If I am not mistaken, the compressedsublayer replaces the MultiheadAttention in the base Transformer decoder layer?
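To make sure I understand the structure, here is a minimal sketch of what I mean. This is not the repo's actual code; the class names `CompressedAttentionSketch` and `DecoderLayerSketch`, and the compression-by-strided-convolution step, are my own assumptions, just to illustrate the idea of a drop-in module with the same call shape as `nn.MultiheadAttention`:

```python
import torch
import torch.nn as nn

class CompressedAttentionSketch(nn.Module):
    """Hypothetical stand-in for the compressed attention sublayer.

    NOT the code from can_transformer.py; it only illustrates the
    structural question: a module with the same call signature as
    nn.MultiheadAttention, so it can replace it inside a decoder layer.
    """

    def __init__(self, embed_dim, num_heads, compress_ratio=4):
        super().__init__()
        # Assumption: "compression" is modeled here as projecting the
        # key/value sequence to a shorter length via a strided conv.
        self.compress = nn.Conv1d(
            embed_dim, embed_dim,
            kernel_size=compress_ratio, stride=compress_ratio,
        )
        self.attn = nn.MultiheadAttention(embed_dim, num_heads)

    def forward(self, query, key, value):
        # key/value: (seq_len, batch, embed_dim) -> shorten along seq_len
        kv = key.permute(1, 2, 0)                 # (batch, embed_dim, seq_len)
        kv = self.compress(kv).permute(2, 0, 1)   # (short_len, batch, embed_dim)
        out, _ = self.attn(query, kv, kv)
        return out

class DecoderLayerSketch(nn.Module):
    def __init__(self, embed_dim=512, num_heads=8):
        super().__init__()
        # The point under discussion: the compressed sublayer sits where
        # the baseline decoder layer would build nn.MultiheadAttention.
        self.self_attn = CompressedAttentionSketch(embed_dim, num_heads)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        return self.norm(x + self.self_attn(x, x, x))

x = torch.randn(16, 2, 512)              # (seq_len, batch, embed_dim)
print(DecoderLayerSketch()(x).shape)     # torch.Size([16, 2, 512])
```

Is that the right mental model for what line 524 is doing?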
@Lollipop321 any idea on this please?