LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0

Add Wikipedia Persian Dataset #3629

Closed: pourmand1376 closed this issue 1 year ago

pourmand1376 commented 1 year ago

Currently, the Open-Assistant model doesn't support Farsi. This is a text-only dataset for learning Farsi (Persian).

One of my friends fine-tuned LLaMA on this dataset, and it could understand Farsi grammar and word usage very well. If the Open-Assistant team wants to add support for Farsi, this should be the first step.

I have converted the dataset to the standard format mentioned here and uploaded it to my Hugging Face account.

somerandomguyontheweb commented 1 year ago

Hi @pourmand1376, sorry for a slightly off-topic question: could you please share any details on how your friend managed to fine-tune LLaMA on a text-only dataset, without instructions? I'm interested in doing the same thing with Belarusian Wikipedia, but so far I've only seen tutorials on how to instruct-tune LLaMA, and Wikipedia articles as such don't contain clearly delimited prompts and responses. Could you please briefly describe the approach?

Thanks in advance for any comments.