google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Train ELECTRA with another tokenizer #127

Closed upskyy closed 3 years ago

upskyy commented 3 years ago

Question

Hello! Thanks for providing a good repo.

I think the official ELECTRA code uses a WordPiece tokenizer, but is it possible to train with a different tokenizer?

PhilipMay commented 3 years ago


It should be possible. The model just expects integer token IDs, so any tokenizer that maps text to IDs in a fixed vocabulary should work.
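For anyone reading later, here is a minimal sketch of that idea: turning raw text into fixed-length sequences of integer IDs with a SentencePiece tokenizer instead of WordPiece, and writing them as TFRecords. This is not code from this repo; the feature names (`input_ids`, `input_mask`, `segment_ids`), the `spm.model` path, and the special-token IDs are assumptions that would have to match your own vocabulary and pretraining setup.

```python
# Hedged sketch: replace WordPiece with a SentencePiece tokenizer when
# building pretraining data. Assumptions (not from this repo): a trained
# SentencePiece model at "spm.model", ELECTRA-style features named
# input_ids/input_mask/segment_ids, and [CLS]/[SEP]/[PAD]-equivalent IDs
# chosen to match your own vocabulary.
import sentencepiece as spm
import tensorflow as tf

MAX_SEQ_LENGTH = 128
CLS_ID, SEP_ID, PAD_ID = 2, 3, 0  # assumption: must match your vocab/config

sp = spm.SentencePieceProcessor(model_file="spm.model")


def text_to_example(text):
    """Turn one line of raw text into a fixed-length example of integer IDs."""
    ids = sp.encode(text, out_type=int)[: MAX_SEQ_LENGTH - 2]
    ids = [CLS_ID] + ids + [SEP_ID]
    mask = [1] * len(ids)
    # Pad to the fixed sequence length expected by the input pipeline.
    ids += [PAD_ID] * (MAX_SEQ_LENGTH - len(ids))
    mask += [0] * (MAX_SEQ_LENGTH - len(mask))
    segment_ids = [0] * MAX_SEQ_LENGTH

    def int_feature(values):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

    return tf.train.Example(features=tf.train.Features(feature={
        "input_ids": int_feature(ids),
        "input_mask": int_feature(mask),
        "segment_ids": int_feature(segment_ids),
    }))


with tf.io.TFRecordWriter("pretrain_data.tfrecord") as writer:
    for line in ["an example sentence", "another example sentence"]:
        writer.write(text_to_example(line).SerializeToString())
```

One thing to watch: whatever tokenizer you use, the vocabulary size and special-token IDs in your pretraining configuration have to agree with the IDs you write into the data, otherwise the embedding lookup will be wrong or out of range.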

upskyy commented 3 years ago

I checked the model and confirmed it works with another tokenizer. Thank you for the reply.