About mistakes in the implementation of SMA

HazyResearch / zoology

Understand and test language model architectures on synthetic tasks.

Apache License 2.0

163 stars, 28 forks

About mistakes in the implementation of SMA #18

Open: renll opened this issue 9 months ago

renll commented 9 months ago

Thanks for the great work. I think the Zoology implementation of SMA (Sparse Modular Activation) contains two mistakes.

First, the original SMA takes the logarithm of the temperature scaler at initialization and applies the exponential in the forward pass, which guarantees that the scaler stays strictly positive: https://github.com/renll/SeqBoat/blob/5a34aed3c573858630ec64b0f2c4a22948962947/fairseq/modules/seqboat_utils.py#L166
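A minimal Python sketch of the log/exp reparameterization described above, assuming a single scalar temperature scaler. The class `PositiveTemperature` and its `scale` method are hypothetical illustrations, not code from SeqBoat or Zoology, and multiplying scores by the scaler is an assumed use chosen only to show that signs are preserved.

```python
import math


class PositiveTemperature:
    """Hypothetical sketch: keep log(tau) as the trainable value so tau > 0."""

    def __init__(self, init_tau: float = 1.0) -> None:
        if init_tau <= 0:
            raise ValueError("init_tau must be positive")
        # Take the logarithm once, at initialization.
        self.log_tau = math.log(init_tau)

    def value(self) -> float:
        # Apply the exponential on every forward pass; exp(x) > 0 for any
        # finite x (barring floating-point underflow for extremely negative x).
        return math.exp(self.log_tau)

    def scale(self, scores: list[float]) -> list[float]:
        # Assumed use: multiply raw scores by the positive temperature scaler.
        tau = self.value()
        return [s * tau for s in scores]


temp = PositiveTemperature(0.5)
# Simulate an optimizer pushing the unconstrained parameter far negative.
temp.log_tau = -50.0
scaled = temp.scale([1.0, -2.0])  # tau is tiny but still positive
```

Because the trainable quantity is `log_tau`, an optimizer may move it anywhere on the real line while `value()` remains positive, so the scaler can never flip the sign of the scores it multiplies.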

However, this trick appears to be missing from the Zoology implementation: https://github.com/HazyResearch/zoology/blob/dd43c72fe455fedf283cd73e037addc0e0d3be03/zoology/mixers/selective.py#L209
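To illustrate why the missing reparameterization can matter, here is a hypothetical single plain-SGD step comparing a raw parameterization (the scaler itself is the trainable parameter) with the log parameterization. The helper names, learning rate, and gradient value are made-up assumptions chosen to make the effect visible; they are not taken from either codebase, and real training would typically use a different optimizer.

```python
import math


def sgd_step_raw(tau: float, grad_tau: float, lr: float) -> float:
    # Raw parameterization: tau is updated directly, so a large enough
    # step can push it to zero or below.
    return tau - lr * grad_tau


def sgd_step_log(log_tau: float, grad_tau: float, lr: float) -> float:
    # Log parameterization: tau = exp(log_tau), so by the chain rule
    # d(loss)/d(log_tau) = d(loss)/d(tau) * tau.
    tau = math.exp(log_tau)
    return log_tau - lr * grad_tau * tau


tau0, grad, lr = 0.05, 0.1, 1.0  # illustrative values only
raw = sgd_step_raw(tau0, grad, lr)  # < 0: the scaler has crossed zero
from_log = math.exp(sgd_step_log(math.log(tau0), grad, lr))  # still > 0
```

With the same gradient, the raw scaler overshoots past zero, while the log-parameterized one only shrinks by a factor of `exp(-lr * grad * tau)`.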

Second, the original SMA applies the selection to the input tokens fed to the attention module: https://github.com/renll/SeqBoat/blob/5a34aed3c573858630ec64b0f2c4a22948962947/fairseq/modules/seqboat_unit.py#L603, whereas the Zoology implementation only applies it to the output: https://github.com/HazyResearch/zoology/blob/de4e258784224e09909c257ff3ea040f089ed660/zoology/mixers/selective.py#L245
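A toy sketch of why selecting inputs and selecting outputs are not equivalent, under loud simplifying assumptions: tokens are scalars, queries, keys, and values all equal the token (no projections), attention is non-causal softmax, and the 0/1 `gate` is given rather than learned. The functions `softmax_attention`, `select_inputs`, and `select_outputs` are hypothetical and do not mirror SeqBoat or Zoology code.

```python
import math


def softmax_attention(xs: list[float]) -> list[float]:
    """Toy non-causal attention over scalar tokens with q = k = v = x."""
    out = []
    for q in xs:
        scores = [q * k for k in xs]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, xs)) / z)
    return out


def select_inputs(xs: list[float], gate: list[int]) -> list[float]:
    # Input selection: only gated-on tokens enter attention; results are
    # scattered back to their positions and unselected positions get 0.0.
    idx = [i for i, g in enumerate(gate) if g]
    sub_out = softmax_attention([xs[i] for i in idx])
    out = [0.0] * len(xs)
    for i, y in zip(idx, sub_out):
        out[i] = y
    return out


def select_outputs(xs: list[float], gate: list[int]) -> list[float]:
    # Output selection: every token takes part in attention and the gate
    # only masks the results afterwards.
    full = softmax_attention(xs)
    return [y if g else 0.0 for y, g in zip(full, gate)]


x = [1.0, 2.0, 3.0]
gate = [1, 0, 1]
a = select_inputs(x, gate)   # token 2.0 never influences positions 0 and 2
b = select_outputs(x, gate)  # token 2.0 still leaks into positions 0 and 2
```

In this sketch the two variants agree only when every gate is on. With input selection the unselected token cannot affect the selected outputs, and attention runs over the selected tokens only, whereas output selection still computes attention over the full sequence.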