Add Wikipedia Persian Dataset

LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

https://open-assistant.io

Apache License 2.0


Add Wikipedia Persian Dataset #3629

Closed: pourmand1376 closed this 1 year ago

pourmand1376 commented 1 year ago

Currently, the Open-Assistant model doesn't support Farsi. This is a text-only dataset for learning Farsi (Persian).

One of my friends fine-tuned LLaMA on this dataset, and it could understand Farsi grammar and word usage very well. If the Open-Assistant team wants to add support for Farsi, this should be the first step.

I have converted the dataset to the format described here and uploaded it to my Hugging Face account.
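A minimal sketch of what such a conversion might look like. It assumes the target format is one JSON object per line with `TEXT`, `SOURCE` and `METADATA` fields; check this against the linked guidelines. The input shape (`title`/`text` dicts), the source label `wikipedia-fa`, and the helper names `to_text_rows` and `to_jsonl` are all hypothetical.

```python
import json


def to_text_rows(articles, source="wikipedia-fa"):
    """Map raw articles to flat records (assumed TEXT/SOURCE/METADATA schema).

    `articles` is a hypothetical iterable of {"title": ..., "text": ...} dicts.
    """
    rows = []
    for article in articles:
        text = article["text"].strip()
        if not text:
            continue  # skip empty pages, e.g. bare redirects
        rows.append({
            "TEXT": text,
            "SOURCE": source,
            # Assumption: METADATA is stored as a JSON string so every row
            # has the same column types; ensure_ascii=False keeps Persian readable.
            "METADATA": json.dumps({"title": article["title"]}, ensure_ascii=False),
        })
    return rows


def to_jsonl(rows):
    """Serialize rows as JSON Lines (one object per line, UTF-8 text)."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)
```

For example, `to_text_rows([{"title": "تهران", "text": "تهران پایتخت ایران است."}])` yields a single row whose `TEXT` is the article body. Empty articles are dropped rather than kept as blank rows.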

2974

somerandomguyontheweb commented 1 year ago

Hi @pourmand1376, sorry for a slightly off-topic question: could you please share any details on how your friend managed to fine-tune LLaMA on a text-only dataset, without instructions? I'm interested in doing the same thing with Belarusian Wikipedia, but so far I've only seen tutorials on how to instruction-tune LLaMA, and Wikipedia articles as such don't contain clearly delimited prompts and responses. Could you please briefly describe the approach?

Thanks in advance for any comments.
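For context on the question above: a common way to train on raw text with no prompt/response split is plain causal language modelling (continued pre-training). Here the labels are the input tokens themselves, and the model learns to predict each next token. This is not necessarily what the friend mentioned above did. The sketch below is a minimal, framework-free version of the usual data-preparation step, similar in spirit to the `group_texts` helper in Hugging Face's `run_clm.py` example. The whitespace tokenizer `toy_tokenize` is a hypothetical stand-in for the model's real tokenizer, and `block_size`/`eos_id` are illustrative parameters.

```python
from itertools import chain


def toy_tokenize(text, vocab):
    """Hypothetical stand-in tokenizer: whitespace split, new ids on first sight."""
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]


def group_texts(token_lists, block_size, eos_id):
    """Concatenate documents (EOS-separated) and cut them into equal-length blocks.

    Each block is one causal-LM example with labels == input_ids; Hugging Face
    causal-LM models shift the labels internally, so position t is trained to
    predict token t+1.
    """
    stream = list(chain.from_iterable(ids + [eos_id] for ids in token_lists))
    # Drop the tail that does not fill a whole block, so no padding is needed
    usable = (len(stream) // block_size) * block_size
    examples = []
    for start in range(0, usable, block_size):
        block = stream[start:start + block_size]
        examples.append({"input_ids": block, "labels": list(block)})
    return examples


vocab = {"<eos>": 0}
docs = ["a b c", "d e", "f g h i"]
examples = group_texts([toy_tokenize(d, vocab) for d in docs], block_size=4, eos_id=0)
```

With these toy inputs the concatenated stream has 12 tokens, so it splits into three blocks of 4. Packing whole articles this way means Wikipedia text needs no prompt/response delimiters at all.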