yuvaraj91 opened this issue 3 years ago
I was looking through the files https://github.com/Lollipop321/compressed-attention/blob/main/fairseq/models/can_transformer.py and https://github.com/Lollipop321/compressed-attention/blob/main/fairseq/models/transformer.py (which I assume is the base model from Fairseq?), and I would like to know where the CAN methodology is actually implemented.
I was checking this part: https://github.com/Lollipop321/compressed-attention/blob/0746b2687a2c00cb860e62980f81ff460fb0f3dd/fairseq/models/can_transformer.py#L645, but I could only see that the parameters are the same as in the baseline model.
Hi! The implementation of the CAN methodology can be seen at line 524 in the file https://github.com/Lollipop321/compressed-attention/blob/main/fairseq/models/can_transformer.py, and the parameters can be found at line 673.
Thanks for your reply. Do you mean specifically this line? https://github.com/Lollipop321/compressed-attention/blob/0746b2687a2c00cb860e62980f81ff460fb0f3dd/fairseq/models/can_transformer.py#L552
If I am not mistaken, the compressedsublayer replaces the MultiheadAttention in the base Transformer decoder layer?
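To make sure I understand the structure, here is a minimal sketch of what I mean. This is not the repo's actual code; the class names `CompressedAttentionSketch` and `DecoderLayerSketch`, and the compression-by-strided-convolution step, are my own assumptions, just to illustrate the idea of a drop-in module with the same call shape as `nn.MultiheadAttention`:

```python
import torch
import torch.nn as nn

class CompressedAttentionSketch(nn.Module):
    """Hypothetical stand-in for the compressed attention sublayer.

    NOT the code from can_transformer.py; it only illustrates the
    structural question: a module with the same call signature as
    nn.MultiheadAttention, so it can replace it inside a decoder layer.
    """

    def __init__(self, embed_dim, num_heads, compress_ratio=4):
        super().__init__()
        # Assumption: "compression" is modeled here as projecting the
        # key/value sequence to a shorter length via a strided conv.
        self.compress = nn.Conv1d(
            embed_dim, embed_dim,
            kernel_size=compress_ratio, stride=compress_ratio,
        )
        self.attn = nn.MultiheadAttention(embed_dim, num_heads)

    def forward(self, query, key, value):
        # key/value: (seq_len, batch, embed_dim) -> shorten along seq_len
        kv = key.permute(1, 2, 0)                 # (batch, embed_dim, seq_len)
        kv = self.compress(kv).permute(2, 0, 1)   # (short_len, batch, embed_dim)
        out, _ = self.attn(query, kv, kv)
        return out

class DecoderLayerSketch(nn.Module):
    def __init__(self, embed_dim=512, num_heads=8):
        super().__init__()
        # The point under discussion: the compressed sublayer sits where
        # the baseline decoder layer would build nn.MultiheadAttention.
        self.self_attn = CompressedAttentionSketch(embed_dim, num_heads)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        return self.norm(x + self.self_attn(x, x, x))

x = torch.randn(16, 2, 512)              # (seq_len, batch, embed_dim)
print(DecoderLayerSketch()(x).shape)     # torch.Size([16, 2, 512])
```

Is that the right mental model for what line 524 is doing?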
@Lollipop321 any idea on this please?