formermagic / formerbox

MIT License

Tokenizers restructure, Bart + GPT2 + Roberta tokenizers #23

Closed mozharovsky closed 4 years ago

mozharovsky commented 4 years ago

Summary

This PR reorganizes the tokenizers into a formerbox/data/tokenizers module, reducing the risk of circular-import issues. It also introduces GPT2 and BART tokenizers (alongside the existing RoBERTa tokenizer), each with an associated tokenizer trainer.
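As a rough illustration of this kind of restructure (the class and function names below are hypothetical, not the actual formerbox API), collecting the tokenizers under one subpackage with a small registry keeps imports flowing in a single direction, which is what avoids circular imports:

```python
# Hypothetical sketch of a tokenizers subpackage; names are illustrative
# and do not reflect the real formerbox module contents.

TOKENIZERS = {}  # maps a short name to a tokenizer class


def register_tokenizer(name):
    """Class decorator that records a tokenizer class under `name`."""
    def wrapper(cls):
        TOKENIZERS[name] = cls
        return cls
    return wrapper


@register_tokenizer("roberta")
class RobertaTokenizer:
    pass


@register_tokenizer("gpt2")
class GPT2Tokenizer:
    pass


@register_tokenizer("bart")
class BartTokenizer:
    pass


def get_tokenizer(name):
    """Look up a registered tokenizer class by name and instantiate it."""
    return TOKENIZERS[name]()
```

With this layout, callers elsewhere in the package import only `get_tokenizer` from the tokenizers subpackage, so no tokenizer module ever needs to import back into the calling code.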