allmalab / problems

Challenges to solve in Azerbaijani NLP

Instruction dataset #2

Open ceferisbarov opened 9 months ago

ceferisbarov commented 9 months ago

Azerbaijani lacks a high-quality instruction dataset for fine-tuning an LLM. There have been several attempts to create such a dataset by translation.

Both of the mentioned examples suffer from poor text quality. Their contents are also irrelevant to the world knowledge of an Azerbaijani speaker.

ceferisbarov commented 2 months ago

The following dataset contains an Azerbaijani subset:

https://huggingface.co/datasets/akoksal/muri-it
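
A minimal sketch of how the Azerbaijani rows might be pulled out of that dataset with the Hugging Face `datasets` library. The column name `language`, the code `"aze"`, and the split name `"train"` are assumptions and should be checked against the dataset card; `select_language` and `load_azerbaijani` are hypothetical helper names.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def select_language(rows: Iterable[Dict], code: str, field: str = "language") -> Iterator[Dict]:
    """Yield only the rows whose `field` value equals `code`."""
    for row in rows:
        # Rows without the field are skipped rather than raising.
        if row.get(field) == code:
            yield row


def load_azerbaijani(limit: int = 5) -> List[Dict]:
    """Stream the dataset and return the first `limit` Azerbaijani rows.

    Needs `pip install datasets` and network access.
    """
    from datasets import load_dataset  # imported lazily so the filter works without it

    # Assumption: a "train" split exists; streaming avoids a full download.
    stream = load_dataset("akoksal/muri-it", split="train", streaming=True)
    # Assumption: rows carry a "language" column and Azerbaijani is coded "aze".
    return list(islice(select_language(stream, "aze"), limit))
```

Calling `load_azerbaijani()` would return a small sample for a manual check of text quality before committing to the full subset.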