OpenThaiGPT / openthaigpt-pretraining

Apache License 2.0

feat(model): add ```train_extremely_large_corpus``` option to ```spm_train``` #193

Closed boss-chanon closed 1 year ago

boss-chanon commented 1 year ago

Why this PR

Allows `train_extremely_large_corpus` to be passed to `spm_train`, so the tokenizer can be trained on a very large corpus.
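For illustration only, here is a minimal sketch of how such an option might be forwarded to SentencePiece. It is not the code from this PR: `build_spm_train_kwargs` and `train_tokenizer` are hypothetical helpers, and the file names and `vocab_size` default are placeholders. The only assumption about SentencePiece itself is that `SentencePieceTrainer.train` accepts `train_extremely_large_corpus` as a keyword argument, mirroring the `--train_extremely_large_corpus` flag of the `spm_train` command-line tool.

```python
from typing import Any, Dict


def build_spm_train_kwargs(
    input_path: str,
    model_prefix: str,
    vocab_size: int = 32000,
    large_corpus: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for SentencePieceTrainer.train.

    Hypothetical helper for illustration; not part of this repository.
    """
    return {
        "input": input_path,
        "model_prefix": model_prefix,
        "vocab_size": vocab_size,
        # SentencePiece trainer option; off by default, enabled for very
        # large training corpora.
        "train_extremely_large_corpus": large_corpus,
    }


def train_tokenizer(**kwargs: Any) -> None:
    """Run SentencePiece training with the given keyword arguments."""
    # Imported lazily so the kwargs helper can be used and tested
    # without sentencepiece installed.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(**kwargs)


# Placeholder paths; replace with a real corpus file and output prefix.
kwargs = build_spm_train_kwargs("corpus.txt", "tokenizer", large_corpus=True)
# train_tokenizer(**kwargs)  # uncomment to actually train
```

Keeping the option as a plain boolean that defaults to `False` means existing callers are unaffected and only large-corpus runs need to opt in.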

Changes

Related Issues

Close #

Checklist