amirjalaly closed this issue 1 year ago.
To add a language, you simply need to translate the site. Here are a few pull requests that show how to do it:
https://github.com/LAION-AI/Open-Assistant/pull/1390/files
https://github.com/LAION-AI/Open-Assistant/pull/2271/files
https://github.com/LAION-AI/Open-Assistant/pull/2386/files
I mean adding support for a new language to the LLM, not the site.
The two are effectively the same thing. If you translate the site, OA will start collecting data in the new language, and the LLM can then be fine-tuned on that data in the future.
I don't think that amount of data is enough. For an LLM to understand Farsi, it needs to see at least 10GB of Persian text, which is readily available on Wikipedia. Are there any plans to officially support Farsi?
If you have data in Farsi, you can add an import script in the data folder: https://github.com/LAION-AI/Open-Assistant/tree/main/data/datasets
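As a rough idea of what such an import script might do, here is a minimal sketch. It assumes a hypothetical input file of tab-separated Farsi question/answer pairs, and it assumes the dataset rows should use INSTRUCTION/RESPONSE/SOURCE/METADATA columns; verify the expected format against the README in the datasets folder before relying on it. The function names and the `my_farsi_qa` source label are made up for illustration.

```python
import io
import json


def is_arabic_script(ch: str) -> bool:
    # The Arabic Unicode block (U+0600-U+06FF) includes the Persian letters.
    return "\u0600" <= ch <= "\u06ff"


def looks_persian(text: str, min_ratio: float = 0.5) -> bool:
    """Heuristic filter: at least min_ratio of the letters are Arabic-script."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    return sum(is_arabic_script(c) for c in letters) / len(letters) >= min_ratio


def import_qa(stream, source="my_farsi_qa"):
    """Turn hypothetical 'question<TAB>answer' lines into dataset rows."""
    rows = []
    for line in stream:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip malformed lines
        question, answer = (p.strip() for p in parts)
        if not (looks_persian(question) and looks_persian(answer)):
            continue  # drop pairs that are not mostly Persian
        rows.append({
            "INSTRUCTION": question,
            "RESPONSE": answer,
            "SOURCE": source,
            "METADATA": json.dumps({"lang": "fa"}),  # assumed metadata shape
        })
    return rows


def write_jsonl(rows, out):
    # ensure_ascii=False keeps Persian text readable in the output file.
    for row in rows:
        out.write(json.dumps(row, ensure_ascii=False) + "\n")


# Usage: one valid Farsi pair, one English pair (filtered out), one broken line.
sample = "پایتخت ایران کجاست؟\tتهران\nWhat is 2+2?\t4\nbroken line\n"
rows = import_qa(io.StringIO(sample))
buffer = io.StringIO()
write_jsonl(rows, buffer)
print(len(rows))
```

The script-ratio filter is only a cheap heuristic; it cannot tell Persian from Arabic or Urdu, so a real importer would likely need a proper language-identification step.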
Unfortunately, Wikipedia is only good for training base models, not for fine-tuning dialogue models like OA. For OA you need dialogue data. But you could extend the Tatoeba import script to cover Farsi relatively easily.
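The Tatoeba idea can be sketched roughly as follows: pair English sentences with their Persian translations and turn each pair into a translation instruction. This is not the repository's actual Tatoeba import script. It assumes Tatoeba's tab-separated exports `sentences.csv` (id, language, text) and `links.csv` (sentence id, translation id), assumes `pes` is the language code for Persian, and reuses the assumed INSTRUCTION/RESPONSE/SOURCE/METADATA columns. All function names are hypothetical.

```python
import csv
import io
import json


def load_sentences(stream, langs):
    """Map sentence id -> (lang, text), keeping only the requested languages.

    Assumed row layout: id <TAB> lang <TAB> text (no header).
    """
    sentences = {}
    for row in csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 3 and row[1] in langs:
            sentences[row[0]] = (row[1], row[2])
    return sentences


def build_pairs(sentences, links_stream, src="eng", tgt="pes"):
    """Collect (source text, target text) pairs from assumed 'id <TAB> id' links."""
    pairs = []
    for row in csv.reader(links_stream, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue
        a, b = sentences.get(row[0]), sentences.get(row[1])
        # Keep only src -> tgt links so a link stored in both directions
        # does not produce a duplicate pair.
        if a and b and a[0] == src and b[0] == tgt:
            pairs.append((a[1], b[1]))
    return pairs


def to_rows(pairs):
    """Turn translation pairs into instruction/response rows (assumed columns)."""
    return [
        {
            "INSTRUCTION": f"Translate into Farsi: {en}",
            "RESPONSE": fa,
            "SOURCE": "tatoeba",
            "METADATA": json.dumps({"src_lang": "eng", "tgt_lang": "pes"}),
        }
        for en, fa in pairs
    ]


# Usage with tiny in-memory stand-ins for the two export files.
sentences_tsv = "1\teng\tHello.\n2\tpes\tسلام.\n3\tfra\tBonjour.\n"
links_tsv = "1\t2\n2\t1\n1\t3\n"
sents = load_sentences(io.StringIO(sentences_tsv), {"eng", "pes"})
pairs = build_pairs(sents, io.StringIO(links_tsv))
rows = to_rows(pairs)
print(len(rows))
```

Translation pairs are short single-turn exchanges, so they would supplement, not replace, genuine Farsi dialogue data.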
How is it possible to add support for a new language? The chat performs very well in English, but it does not support many languages, including my native one, Farsi (Persian). How can we add a language to the system ourselves? Suppose, as a small-scale scenario, that we are able to collect Persian data and a sentence-ranking dataset ourselves.