PyThaiNLP / pythainlp

Thai Natural Language Processing in Python.
https://pythainlp.org/
Apache License 2.0
936 stars, 272 forks

Add new engine to chat/generate features #895

Open pavaris-pm opened 6 months ago

pavaris-pm commented 6 months ago

A couple of days ago, I noticed that we also have chat/generate features that use wangchanglm as the current LLM for text generation. There is also an upcoming LLM, Typhoon-7b from scb10x, which brings a wow factor to Thai LLMs with its evaluation on Thai examination tasks. Given this new wave of Thai LLMs, should we add Typhoon-7b as an optional engine in PyThaiNLP? What do you think?

P.S. I'm not sure whether it will produce inappropriate words, since they state that it has no moderation mechanism. Maybe I can fine-tune it on some samples (e.g. 1k text samples) to adjust its mood and tone for more appropriate generation. Suggestions are welcome.
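
To make the "optional engine" idea concrete, here is a minimal sketch of selecting a generation backend by name. Everything in it is an assumption for illustration only: `register_engine`, `generate`, the engine names as keys, and the stub backends are hypothetical and are not taken from the PyThaiNLP codebase.

```python
from typing import Callable, Dict

# Hypothetical registry: maps an engine name to a text-generation callable.
# None of these names come from the PyThaiNLP codebase.
_ENGINES: Dict[str, Callable[[str], str]] = {}


def register_engine(name: str, generate_fn: Callable[[str], str]) -> None:
    """Make a generation backend selectable by name."""
    _ENGINES[name] = generate_fn


def generate(prompt: str, engine: str = "wangchanglm") -> str:
    """Dispatch the prompt to the requested engine."""
    if engine not in _ENGINES:
        raise ValueError(
            f"Unknown engine {engine!r}; available: {sorted(_ENGINES)}"
        )
    return _ENGINES[engine](prompt)


# Stub backends stand in for the real models (assumption: a real
# implementation could load the model weights on first use instead).
register_engine("wangchanglm", lambda prompt: f"[wangchanglm] {prompt}")
register_engine("typhoon-7b", lambda prompt: f"[typhoon-7b] {prompt}")

print(generate("สวัสดี", engine="typhoon-7b"))
```

With a layout like this, the existing engine would stay the default and the new one would only be used when requested explicitly.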

wannaphong commented 6 months ago

Yes, I agree.

Typhoon-7b is a bilingual LLM, so I think that if someone trains it for instruction following in English, it should work with Thai too!

I welcome it as long as the model doesn't use data from ChatGPT (for example, ShareGPT, or self-instruct datasets created with ChatGPT data).

pavaris-pm commented 6 months ago

That is already one of my goals! You can wait for my upcoming PR, krub.