I also found that you set `ema_layers_only: false` in the released model, which means EMA is applied to the whole transformer, including the positional encoding. Is this setting better than the one the paper describes (sharing the positional encoding)? Thanks a lot.
Did you come up with an answer for this?
Refer to this: https://github.com/facebookresearch/fairseq/issues/4342
Thanks! So `ema_layers_only: true` is the way to go.
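For concreteness, here is a minimal sketch of how that setting might look in a Hydra-style model config. Only `ema_layers_only` is the field discussed above; the surrounding field names and values are assumptions modeled on the data2vec audio configs, not copied from the released file:

```yaml
# Sketch of the EMA-related part of a data2vec audio model config.
# Only ema_layers_only is the setting under discussion; the other
# fields and values are placeholders, not the released defaults.
model:
  _name: data2vec_audio
  ema_decay: 0.999        # placeholder starting decay for the EMA teacher
  ema_layers_only: true   # track only the transformer layers with EMA,
                          # so the positional encoder is shared with the
                          # student, as described in the paper
```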
Hi, I am trying to reproduce data2vec on speech.
I found that the config of the released model is inconsistent with what is stated in the documentation, and some fields, for example `diversity_weight`, are not declared in the code (see the sketch below). Could you please provide an updated config or update the code? Many thanks!
@alexeib
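To illustrate the mismatch, here is a trimmed config sketch. Apart from `diversity_weight`, which comes from the issue above, the field names and values are assumed placeholders, and commenting out the undeclared field is only one possible workaround, not a confirmed fix:

```yaml
# Sketch of adapting the released config so it only contains fields
# that the current model config dataclass actually declares.
# Field names other than diversity_weight are illustrative placeholders.
model:
  _name: data2vec_audio
  loss_beta: 0.0            # placeholder for a field that is declared in the code
  # diversity_weight: 0.1   # present in the released config but not declared
  #                         # in the code, which can make config validation fail;
  #                         # removing it is one possible workaround
```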