The paper says you use the full Transformer architecture (Vaswani et al., 2017), but there are many Transformer-based architectures now (GPT-2, BERT, RoBERTa, ...), and each architecture is suited to its own tasks. Which one did you actually use?
Yes, for generation we used the original Transformer architecture from Vaswani et al. (2017), as opposed to those other, newer architectures.
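For reference, here is a minimal sketch of what that choice means in code, using PyTorch's `nn.Transformer` (this is an illustration, not the authors' actual implementation). The original model is a full encoder-decoder stack, unlike the decoder-only GPT-2 or encoder-only BERT/RoBERTa; the hyperparameters below are the "base" values from Vaswani et al. (2017), assumed here for concreteness:

```python
# Sketch only: the original encoder-decoder Transformer of Vaswani et al. (2017),
# as opposed to decoder-only (GPT-2) or encoder-only (BERT/RoBERTa) variants.
# Hyperparameters are the paper's "base" configuration, assumed for illustration.
import torch
import torch.nn as nn

model = nn.Transformer(
    d_model=512,           # model/embedding dimension ("base" config)
    nhead=8,               # attention heads
    num_encoder_layers=6,  # full encoder stack
    num_decoder_layers=6,  # full decoder stack
    dim_feedforward=2048,
    dropout=0.1,
)

# Dummy inputs, shaped (sequence_length, batch_size, d_model).
src = torch.rand(10, 2, 512)  # source sequence (encoder input)
tgt = torch.rand(7, 2, 512)   # target sequence (decoder input)

# Causal mask so each target position attends only to earlier positions,
# which is what lets the decoder be used autoregressively for generation.
tgt_mask = model.generate_square_subsequent_mask(tgt.size(0))

out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)  # torch.Size([7, 2, 512])
```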