Status: Open. Opened by ceferisbarov 9 months ago.
Azerbaijani lacks a high-quality instruction dataset for fine-tuning an LLM. There have been several attempts to create such a dataset through translation.
Both of the mentioned examples suffer from poor text quality, and their content has little relevance to the world knowledge of an Azerbaijani speaker.
The following dataset contains an Azerbaijani subset:
https://huggingface.co/datasets/akoksal/muri-it