allenai / vampire

Variational Methods for Pretraining in Resource-limited Environments
Apache License 2.0
174 stars 33 forks

Wordpiece tokenization #55

Open kernelmachine opened 4 years ago

kernelmachine commented 4 years ago

This PR addresses #52, adding wordpiece tokenization tools from Hugging Face's `tokenizers` library.
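
For readers unfamiliar with the technique, here is a minimal pure-Python sketch of greedy longest-match-first wordpiece splitting. It is not the code in this PR and not the `tokenizers` implementation; the function name `wordpiece_tokenize`, the `##` continuation prefix, the `[UNK]` token, and the toy vocabulary are all assumptions chosen for illustration.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into wordpieces by greedy longest-match-first lookup.

    Hypothetical helper for illustration only; `vocab` is any set of pieces.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece covers this position: the whole word maps to unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (an assumption, not taken from the repository).
toy_vocab = {"token", "##ization", "##ize", "word", "##piece", "[UNK]"}

print(wordpiece_tokenize("tokenization", toy_vocab))
print(wordpiece_tokenize("wordpiece", toy_vocab))
print(wordpiece_tokenize("xyz", toy_vocab))
```

In practice the vocabulary would be learned from a corpus rather than written by hand; the sketch only shows how a fixed vocabulary is applied to a single word.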