-
NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?id_mm_pmd
| Dataset | id_mm_pmd |
|-------------|---|
| Description | Introduced in the FLAVA paper, Public Mu…
-
NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?id_en_code_mixed
| Dataset | id_en_code_mixed |
|-------------|---|
| Description | This dataset contains 825 t…
-
NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?id_mm_cc_12m
| Dataset | id_mm_cc_12m |
|-------------|---|
| Description | Conceptual 12M (CC12M) is a datase…
-
NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?id_mm_laion
| Dataset | id_mm_laion |
|-------------|---|
| Description | Indo_MultiModal_LAION is a translate…
-
As discussed during our @huggingface/datasets meeting, we are planning to move some "canonical" dataset scripts under their corresponding organization namespace (if this does not exist).
On the con…
-
100K cache entries (1%) have the `ConfigNamesError`. It would be better to show the underlying error to help the user debug their data files.
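A minimal Python sketch can illustrate the request: it shows the underlying error by including its message and chaining it with `raise ... from err`. `ConfigNamesError`, `read_config_names`, and `get_config_names` below are hypothetical stand-ins defined locally, not the actual `datasets` implementation. The unsupported-extension check is only an assumed failure mode.

```python
class ConfigNamesError(Exception):
    """Hypothetical stand-in for the error reported in the issue."""


def read_config_names(data_files: list[str]) -> list[str]:
    # Hypothetical loader: rejects data files with an unsupported extension.
    for path in data_files:
        if not path.endswith((".csv", ".json", ".parquet")):
            raise ValueError(f"Unsupported data file: {path!r}")
    return ["default"]


def get_config_names(data_files: list[str]) -> list[str]:
    try:
        return read_config_names(data_files)
    except Exception as err:
        # Put the underlying error in the message and chain it via `from`,
        # so the traceback shows the root cause instead of hiding it.
        raise ConfigNamesError(
            f"Cannot list config names: {type(err).__name__}: {err}"
        ) from err


if __name__ == "__main__":
    try:
        get_config_names(["train.csv", "notes.txt"])
    except ConfigNamesError as e:
        print(e)  # Cannot list config names: ValueError: Unsupported data file: 'notes.txt'
```

The design choice here is to keep the generic error type for callers that catch it. The `__cause__` chain and message still point the user at the specific data file that failed.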
-
Python 3.10, NusaCrowd 0.1.2
Error message:
`FileNotFoundError: Couldn't find file at https://raw.githubusercontent.com/IndoNLP/nusa-writes/main/data/nusa_kalimat-mt-bug-train.csv`
-
Hi, I just ran the meter eval today and found this issue:
```
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 133, in resolve_trust_remote_code
    raise ValueError(
ValueE…
```
-
While reviewing the dataset, I found several instances of [inappropriate text](https://github.com/search?q=repo%3AIndoNLP%2Fnusa-writes+%22gmn+ngga+coli%22&type=code) with personally identifiable info…
-
https://indonlp.github.io/nusa-catalogue/card.html?id_qqp