OpenThaiGPT / openthaigpt-pretraining

Apache License 2.0

refactor(model): ```get_dataset``` from disk and efficient tokenizer #200

Closed boss-chanon closed 1 year ago

boss-chanon commented 1 year ago

Why this PR

Load the dataset from local disk and use an efficient tokenizer.
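
For illustration only, here is a minimal stdlib-only sketch of the general idea: read records lazily from a local file and tokenize them in batches. The names `read_jsonl`, `tokenize_in_batches`, and `whitespace_tokenize` are hypothetical. They are not the PR's `get_dataset` or the project's tokenizer, whose code is not shown here. The sketch also assumes the local dataset is a JSON Lines file with a `text` field, which may not match the real data layout.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Callable, Iterable, Iterator


def read_jsonl(path: Path, text_key: str = "text") -> Iterator[str]:
    """Lazily yield the text field of each record in a local JSON Lines file."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)[text_key]


def tokenize_in_batches(
    texts: Iterable[str],
    tokenize_batch: Callable[[list[str]], list[list[str]]],
    batch_size: int = 1000,
) -> Iterator[list[str]]:
    """Call the tokenizer once per batch instead of once per example."""
    it = iter(texts)
    while batch := list(islice(it, batch_size)):
        yield from tokenize_batch(batch)


def whitespace_tokenize(batch: list[str]) -> list[list[str]]:
    # Stand-in tokenizer (assumption): a real setup would use a trained
    # subword tokenizer here.
    return [text.split() for text in batch]
```

Because both functions are generators, the file is never held in memory as a whole, and the batch size bounds how many examples the tokenizer sees per call.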

Changes

Related Issues

Close #

Checklist