Open code-pan94 opened 1 month ago
emb_dim must be divisible by num_head, as required by multi-head attention. The default value of num_head is 3, and 512 is not divisible by 3. If you want to set emb_dim to 512, you need to change num_head to a value that divides 512 evenly, for example 8.
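To make the constraint concrete, here is a minimal sketch of the divisibility check in plain Python. The helper names head_dim and valid_head_counts are hypothetical and are not part of this repository; only emb_dim and num_head come from the discussion above.

```python
def head_dim(emb_dim: int, num_head: int) -> int:
    """Per-head width in multi-head attention; emb_dim is split evenly across heads."""
    if emb_dim % num_head != 0:
        raise ValueError(
            f"emb_dim ({emb_dim}) must be divisible by num_head ({num_head})"
        )
    return emb_dim // num_head


def valid_head_counts(emb_dim: int, max_heads: int = 16) -> list[int]:
    """List every head count up to max_heads that divides emb_dim evenly."""
    return [h for h in range(1, max_heads + 1) if emb_dim % h == 0]


# With the default num_head of 3, emb_dim = 192 splits evenly into 3 heads of 64,
# while emb_dim = 512 does not split evenly and would raise ValueError.
print(head_dim(192, 3))        # 64
print(head_dim(512, 8))        # 64
print(valid_head_counts(512))  # [1, 2, 4, 8, 16]
```

Any value returned by valid_head_counts(emb_dim) should be a safe choice for num_head as far as this particular check is concerned; other parts of the model may impose their own constraints.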
Hello,
Thank you for sharing such wonderful code.
In my experiments I found that emb_dim, the parameter that sets the embedding dimension, always has to be a multiple of 192. I do not understand why, since other codebases allow values such as 512 and 256.
Is there a way to enable such dimensions?
Thanks a lot =)