ExpressAI / DataLab

The unified platform for data-related resources.
https://expressai.github.io/DataLab/
Apache License 2.0

Add nusax-mt dataset #380

Closed SamuelCahyawijaya closed 2 years ago

SamuelCahyawijaya commented 2 years ago

Adding NusaX MT dataset.

The dataset is taken from https://github.com/IndoNLP/nusax/tree/main/datasets/mt

pfliu-nlp commented 2 years ago

Hi @SamuelCahyawijaya, here is a tip in case you haven't seen it, on how to add new dataset information to dataset_info.jsonl:

https://github.com/ExpressAI/DataLab/blob/main/docs/SDK/add_new_datasets_into_sdk.md#5-make-your-datasets-registered

SamuelCahyawijaya commented 2 years ago

Hi @pfliu-nlp, thank you for noticing. Yeah, I missed this one before. I added the lines to dataset_info.jsonl using another script. Do I need to make any adjustments to dataset_info.jsonl for this dataset?

pfliu-nlp commented 2 years ago

Hi @SamuelCahyawijaya, it would be nice if you could regenerate dataset_info.jsonl and open a new PR, since the previous one unfortunately has some formatting issues, for example at line 1048:

image

(each line should be a single JSON object, i.e. a {...})
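For anyone hitting the same formatting problem, a rough way to check the "one {...} per line" rule before opening a PR might look like the sketch below. It is not part of the DataLab SDK: the helper name `find_bad_jsonl_lines` and the sample records (including the `name` field) are hypothetical and only illustrate the check.

```python
import json


def find_bad_jsonl_lines(lines):
    """Return (line_number, reason) pairs for lines that are not one JSON object.

    Hypothetical helper for illustration; not a DataLab API.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            problems.append((number, "empty line"))
            continue
        try:
            record = json.loads(text)
        except json.JSONDecodeError as err:
            # Covers truncated objects and several objects on one line.
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            # Valid JSON, but a list, string, or number rather than {...}.
            problems.append((number, "not a JSON object"))
    return problems


# Made-up sample lines; the real dataset_info.jsonl schema is not shown here.
sample = [
    '{"name": "nusax_mt"}',
    '[{"name": "a"}, {"name": "b"}]',
    '{"name": "broken"',
]
problems = find_bad_jsonl_lines(sample)
for number, reason in problems:
    print(f"line {number}: {reason}")
```

To run it against the real file, pass an open file handle instead of the sample list, for example `find_bad_jsonl_lines(open("dataset_info.jsonl", encoding="utf-8"))`, and fix any reported lines before committing.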