allenai / vampire

Variational Methods for Pretraining in Resource-limited Environments
Apache License 2.0

Add wordpiece tokenization tools #52

Open kernelmachine opened 4 years ago

kernelmachine commented 4 years ago

Add WordPiece tokenization tools to the repository to reduce overall vocabulary size and improve training speed.
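
For context, here is a minimal sketch of how greedy longest-match-first WordPiece splitting works, which is why it keeps the vocabulary small: rare words are broken into pieces from a fixed subword vocabulary instead of each getting their own entry. This is not code from this repository. The function names, the `##` continuation prefix, the `[UNK]` token, and the toy vocabulary are all assumptions chosen for illustration.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into subword pieces by greedy longest-match-first lookup."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece covers this position, so the whole word maps to unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; a real one would be learned from a corpus.
TOY_VOCAB = {"token", "##ization", "##ize", "##s", "train", "##ing", "[UNK]"}


def tokenize(text, vocab=TOY_VOCAB):
    """Lowercase, split on whitespace, then apply WordPiece to each word."""
    pieces = []
    for word in text.lower().split():
        pieces.extend(wordpiece_tokenize(word, vocab))
    return pieces


if __name__ == "__main__":
    print(tokenize("Tokenization training"))
```

With this toy vocabulary, four subword entries cover "tokenization", "tokenize", "tokens", and "training", where a word-level vocabulary would need one entry per surface form.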