AlexisTercero55 / AI-Research

AI Data Science, research and development
MIT License

ML-DevOps | Create a training pipeline for word2vec #11

Closed AlexisTercero55 closed 6 months ago

AlexisTercero55 commented 6 months ago

Dataset used in "Attention Is All You Need": WMT 2014 English-German

Dataset: https://nlp.stanford.edu/projects/nmt/

from the paper:

Sentences were encoded using byte-pair encoding.

See also: Massive Exploration of Neural Machine Translation Architectures
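
For reference, here is a minimal sketch of how byte-pair encoding merges can be learned and applied. It is a simplified illustration of the general technique, not the tokenizer used in the paper or in this repo. The function names (`merge_pair`, `learn_bpe`, `apply_bpe`), the `</w>` end-of-word marker, and the toy corpus are all assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from an iterable of whitespace-tokenized lines."""
    # Each word starts as a tuple of characters plus an end-of-word marker (assumed: "</w>").
    words = Counter(tuple(w) + ("</w>",) for line in corpus for w in line.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair merged.
        merged = Counter()
        for symbols, freq in words.items():
            merged[merge_pair(symbols, best)] += freq
        words = merged
    return merges


def apply_bpe(word, merges):
    """Segment a single word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


if __name__ == "__main__":
    toy_corpus = ["ab ab ab cb"]  # hypothetical toy data
    rules = learn_bpe(toy_corpus, num_merges=2)
    print(rules)
    print(apply_bpe("cb", rules))
```

For the real WMT 2014 data, an established subword tokenizer would likely be a better choice than this sketch; the point here is only to show what the encoding step has to do.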

AlexisTercero55 commented 6 months ago

Conclusions

Because training the transformer model requires byte-pair encoding, this work is split into a separate task that will produce the required encoding:

#12