cmsflash / efficient-attention

An implementation of the efficient attention module.
https://arxiv.org/abs/1812.01243
MIT License

About module parameters #1

Closed TRillionZxY closed 3 years ago

TRillionZxY commented 3 years ago

Hi, this is great work. I'm a rookie at DL.

The EfficientAttention module has four parameters: in_channels, key_channels, head_count, and value_channels. I know key_channels = d_k and value_channels = d_v. What does head_count mean? What is a common setting for the four parameters?

cmsflash commented 3 years ago

As README.md mentions, head_count is an additional parameter that the paper did not explore. Setting head_count=1 reproduces the paper's settings. in_channels depends on the number of channels in your input. The common settings for the remaining parameters are key_channels=in_channels // 2 and value_channels=in_channels.
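
For concreteness, here is a minimal usage sketch based on these settings. It assumes the EfficientAttention module from this repository is importable as efficient_attention, and that the input is a 4D feature map of shape (batch, in_channels, height, width):

```python
import torch
from efficient_attention import EfficientAttention  # assumed import path

in_channels = 64
attention = EfficientAttention(
    in_channels=in_channels,
    key_channels=in_channels // 2,  # common setting: d_k = in_channels / 2
    head_count=1,                   # reproduces the paper's settings
    value_channels=in_channels,     # common setting: d_v = in_channels
)

x = torch.randn(2, in_channels, 32, 32)  # (batch, channels, height, width)
out = attention(x)  # output is expected to match the input shape
```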

TRillionZxY commented 3 years ago

Thank you!:)

horanyinora commented 3 years ago

When the head count is larger than one, how do you adjust key_channels and value_channels?

cmsflash commented 3 years ago

> When the head count is larger than one, how do you adjust key_channels and value_channels?

You do not need to adjust them. The module automatically divides key_channels and value_channels by head_count and uses the resulting head_key_channels and head_value_channels for each head. You do need to ensure that key_channels and value_channels are divisible by head_count.
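
To illustrate, a minimal sketch of the per-head split described above (not the module's actual code):

```python
# key_channels and value_channels are split evenly across heads,
# so both must be divisible by head_count.
key_channels, value_channels, head_count = 32, 64, 4

assert key_channels % head_count == 0
assert value_channels % head_count == 0

head_key_channels = key_channels // head_count      # 8 key channels per head
head_value_channels = value_channels // head_count  # 16 value channels per head
```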