e-p-armstrong / augmentoolkit

Convert Compute And Books Into Instruct-Tuning Datasets! Makes: QA, RP, Classifiers.
MIT License

questions about use #11

Closed pw136 closed 4 months ago

pw136 commented 5 months ago

Is it possible to support the conversion of Chinese text?

Ce-daros commented 5 months ago

Same problem. I can adjust the prompt template but the tokenizer...?

e-p-armstrong commented 5 months ago

Multilingual support depends largely on the model you use. The tokenizer is used purely for deciding how large chunks should be -- it does not actually relate to the model used to generate the outputs. If you use a model good with Chinese text and translate the prompts, it should work fine, in theory. Let me know how it works out! I'm curious about this use case myself.
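
To illustrate the point about the tokenizer only deciding chunk size, here is a minimal sketch. It is not Augmentoolkit's actual code: `chunk_by_token_budget` and `rough_cjk_token_count` are hypothetical names, and the character-based counter is an assumed stand-in for a real tokenizer. The idea is that any function returning an approximate token count can drive the chunking, independently of the model that later generates the outputs.

```python
from typing import Callable, List


def chunk_by_token_budget(
    paragraphs: List[str],
    count_tokens: Callable[[str], int],
    max_tokens: int,
) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_tokens tokens.

    A single paragraph larger than max_tokens is kept whole as its own
    oversized chunk; this sketch does not split inside a paragraph.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_size = 0
    for para in paragraphs:
        size = count_tokens(para)
        # Flush the current chunk if this paragraph would exceed the budget.
        if current and current_size + size > max_tokens:
            chunks.append("\n".join(current))
            current, current_size = [], 0
        current.append(para)
        current_size += size
    if current:
        chunks.append("\n".join(current))
    return chunks


def rough_cjk_token_count(text: str) -> int:
    # Assumption for illustration only: treat each non-whitespace character
    # as one token. A real tokenizer will give different counts.
    return sum(1 for ch in text if not ch.isspace())


paragraphs = ["你好世界", "今天天气很好", "我们去公园"]
chunks = chunk_by_token_budget(paragraphs, rough_cjk_token_count, max_tokens=10)
for chunk in chunks:
    print(repr(chunk))
```

Because the counter is passed in as a plain function, swapping in a tokenizer that handles Chinese well only changes where the chunk boundaries fall, not which model produces the final dataset.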

e-p-armstrong commented 4 months ago

Closing due to inactivity