OpenThaiGPT / openthaigpt-pretraining

Apache License 2.0

refactor(model): gptj merge tokenizer #192

Closed boss-chanon closed 1 year ago

boss-chanon commented 1 year ago

Why this PR

Refactor the GPT-J merge tokenizer code to support customized tokenizers, and add a file for loading a local tokenizer and saving the merged tokenizer files locally.
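For readers unfamiliar with the idea, the merge-then-save/load flow described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code in this PR: the function names (`merge_vocab`, `save_vocab`, `load_vocab`), the `vocab.json` file name, and the sample tokens are all hypothetical, and a real BPE tokenizer would also need its merges file handled.

```python
import json
import tempfile
from pathlib import Path


def merge_vocab(base_vocab, extra_tokens):
    """Return a copy of base_vocab with unseen tokens appended at new ids.

    Hypothetical helper: existing tokens keep their ids, new tokens get
    the next free ids in the order they are given.
    """
    merged = dict(base_vocab)
    next_id = max(merged.values(), default=-1) + 1
    for token in extra_tokens:
        if token not in merged:
            merged[token] = next_id
            next_id += 1
    return merged


def save_vocab(vocab, directory):
    """Write the vocabulary to <directory>/vocab.json (assumed file name)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "vocab.json"
    path.write_text(
        json.dumps(vocab, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path


def load_vocab(directory):
    """Read the vocabulary back from <directory>/vocab.json."""
    path = Path(directory) / "vocab.json"
    return json.loads(path.read_text(encoding="utf-8"))


# Toy data standing in for a base vocabulary and extra Thai tokens.
base = {"hello": 0, "world": 1}
thai_tokens = ["สวัสดี", "world", "ไทย"]
merged = merge_vocab(base, thai_tokens)

# Round-trip through a temporary local directory.
with tempfile.TemporaryDirectory() as tmp:
    save_vocab(merged, tmp)
    reloaded = load_vocab(tmp)
```

Keeping the merge step separate from the save/load step is one way to let a customized tokenizer reuse the same local I/O helpers.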

Changes

Related Issues

Close #

Checklist

boss-chanon commented 1 year ago

I think you should move all of the tokenizers (including the GPTJ tokenizer in this PR) to the src/model/openthaigpt_pretraining_model/tokenizers directory. What do you think?

I think we could open a new PR to move src/model/openthaigpt_pretraining_model/GPTJ_TH_tokenizer and src/model/openthaigpt_pretraining_model/llama_thai_tokenizer to src/model/openthaigpt_pretraining_model/tokenizers.