The WMT 2014 English-German dataset contains two text files that together hold about 4.5M English-German sentence pairs.
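A minimal sketch of reading the two files into sentence pairs; the file names `train.en` and `train.de` are assumptions, not necessarily the archive's actual names:

```python
def load_parallel(en_path: str, de_path: str):
    """Yield (English, German) sentence pairs from two line-aligned text files."""
    # Hypothetical file names; the real WMT14 archive layout may differ.
    with open(en_path, encoding="utf-8") as f_en, open(de_path, encoding="utf-8") as f_de:
        for en, de in zip(f_en, f_de):
            yield en.strip(), de.strip()

pairs = load_parallel("train.en", "train.de")
print(next(pairs))  # first (English, German) pair
```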
The paper describes the preprocessing as follows:

> We tokenize and clean all datasets with the scripts in Moses and learn shared subword units using Byte Pair Encoding (BPE) (Sennrich et al., 2016b) using 32,000 merge operations for a final vocabulary size of approximately 37k. (Google Brain 2017)
>
> Sentences were encoded using byte-pair encoding.
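As a sketch of that step, here is shared-vocabulary BPE training with the Hugging Face `tokenizers` library rather than the original Moses/subword-nmt scripts; the file names and the direct `vocab_size=37000` cap (approximating the paper's 32,000 merges) are assumptions:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Approximate the paper's 32,000 merges / ~37k shared vocabulary
# by capping the final vocabulary size directly (an assumption).
trainer = BpeTrainer(vocab_size=37000, special_tokens=["[UNK]", "[PAD]"])

# Train on both language sides so source and target share one vocabulary.
tokenizer.train(["train.en", "train.de"], trainer)
tokenizer.save("bpe-shared-37k.json")
```

Training on both files at once is what makes the subword units shared between source and target.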
On the embedding layers, the paper notes:

> Similarly to other sequence transduction models, we use learned embeddings to convert the input tokens and output tokens to vectors of dimension $d_{\text{model}}$. (Google 2017)
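A minimal sketch of those learned embeddings, assuming PyTorch (the issue does not fix a framework); note the paper also multiplies embedding weights by $\sqrt{d_{\text{model}}}$:

```python
import math
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    """Map token ids to d_model-dimensional vectors, scaled as in the paper."""
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.d_model = d_model

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Scale by sqrt(d_model), per section 3.4 of the paper.
        return self.embed(tokens) * math.sqrt(self.d_model)

emb = TokenEmbedding(vocab_size=37000, d_model=512)
out = emb(torch.randint(0, 37000, (2, 5)))  # batch of 2, sequence length 5
print(out.shape)  # torch.Size([2, 5, 512])
```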
According to the "Attention Is All You Need" paper, the Python implementation of the Transformer architecture is placed in this repo/script.
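For reference, a minimal configuration sketch using PyTorch's built-in `nn.Transformer` with the paper's base hyperparameters (d_model=512, 8 heads, 6 encoder and 6 decoder layers, d_ff=2048, dropout 0.1); this is an illustration, not the repo's actual script:

```python
import torch
import torch.nn as nn

model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,
    dropout=0.1,
    batch_first=True,  # use (batch, seq, feature) tensor layout
)

src = torch.rand(2, 10, 512)  # encoder input: (batch, src_len, d_model)
tgt = torch.rand(2, 7, 512)   # decoder input: (batch, tgt_len, d_model)
print(model(src, tgt).shape)  # torch.Size([2, 7, 512])
```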
FYI @xavierVG
Requirements
Originally posted by @AlexisTercero55 in #11