SamuelCahyawijaya commented 10 months ago

Dataset	udhr
Description	The Universal Declaration of Human Rights (UDHR) is a milestone document in the history of human rights. Drafted by representatives with different legal and cultural backgrounds from all regions of the world, it set out, for the first time, fundamental human rights to be universally protected. The Declaration was adopted by the UN General Assembly in Paris on 10 December 1948 during its 183rd plenary meeting.
Subsets	ind, ilo, mnw, tet, pam, lus, vie, min, lao, tgl, hni, ceb, jav, shn, bcl, hil, sun, ban, zlm, cnh, kkh, cfm, ctd, duu, tdt, tha, bug, mad, mya, khm, war, ace, hnj, blt, hlt
Languages	ind, ilo, mnw, tet, pam, lus, vie, min, lao, tgl, hni, ceb, jav, shn, bcl, hil, sun, ban, zlm, cnh, kkh, cfm, ctd, duu, tdt, tha, bug, mad, mya, khm, war, ace, hnj, blt, hlt
Tasks	Language Modeling
License	Unknown (unknown)
Homepage	https://huggingface.co/datasets/udhr?row=1
HF URL	https://huggingface.co/datasets/udhr?row=1
Paper URL	https://unicode.org/udhr/translations.html

SamuelCahyawijaya commented 10 months ago

For this dataset, please split the data into multiple subsets, one per language, with a single document in each subset.
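
A rough sketch of the requested layout, in case it helps. It assumes each source record carries a language code and a text field; the key names `lang` and `text` and the helper `split_by_language` are hypothetical and may not match the actual dataset schema or the datahub loader API.

```python
from collections import defaultdict


def split_by_language(records, lang_key="lang", text_key="text"):
    """Group records by language code and join each group into one document.

    `lang_key` and `text_key` are assumed field names, not confirmed
    against the real dataset schema.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record[lang_key]].append(record[text_key])
    # One subset per language, each holding exactly one document.
    return {lang: ["\n".join(texts)] for lang, texts in grouped.items()}


# Hypothetical toy records, not real UDHR text.
records = [
    {"lang": "ind", "text": "first ind passage"},
    {"lang": "jav", "text": "first jav passage"},
    {"lang": "ind", "text": "second ind passage"},
]
subsets = split_by_language(records)
```

Each key of `subsets` would then map to one subset, and each value is a list containing that language's single document.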

IvanHalimP commented 10 months ago

self-assign

IvanHalimP commented 10 months ago

Hi, I'd like to report that the following language codes:

"abs", "cja", "fil", "iba", "dbj"

are nowhere to be found in the dataset.
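
For reference, a check like this can be sketched as a simple membership test. The `available` list below is a hypothetical stand-in for whatever codes the dataset actually exposes, and `find_missing_codes` is an illustrative helper, not part of any library.

```python
def find_missing_codes(requested, available):
    """Return the requested codes that do not appear in `available`, in order."""
    available_set = set(available)
    return [code for code in requested if code not in available_set]


# Codes to look up: one known code plus the five reported above.
requested = ["ind", "abs", "cja", "fil", "iba", "dbj"]
# Hypothetical stand-in for the codes present in the dataset.
available = ["ind", "jav", "sun"]

missing = find_missing_codes(requested, available)
```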
