SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #344 | Create dataset loader for VLSP2016-SA #500

Closed djanibekov closed 5 months ago

djanibekov commented 5 months ago

Closes #344

Checkbox


holylovenia commented 5 months ago

@djanibekov Thank you for your work. Checked, LGTM! My only comment: I'm not sure we need the tokenized schema vlsp2016_sa_tokenized_seacrowd_text; to me the two versions look identical. I understand why it was added, since there are two source files (tokenized and non-tokenized), but in the end their contents look similar. We can either delete the tokenized schema or leave it as it is. What do you think, @holylovenia?

Let's keep both for the sake of completeness and clarity.
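
For readers following the thread, here is a minimal sketch of what "keeping both" could look like: two configs that share the same text schema but point at different source files. Only the name vlsp2016_sa_tokenized_seacrowd_text comes from the discussion above. The non-tokenized config name, the `LoaderConfig` class, the `select_config` helper, and both file names are hypothetical and used for illustration only. This is not the actual loader code from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoaderConfig:
    """Hypothetical stand-in for a dataset loader config (not the SEACrowd class)."""
    name: str
    schema: str
    data_file: str  # assumed file name, for illustration only


_DATASETNAME = "vlsp2016_sa"

# Both configs expose the same schema; they differ only in the source file.
CONFIGS = [
    LoaderConfig(
        name=f"{_DATASETNAME}_seacrowd_text",  # assumed name for the non-tokenized variant
        schema="seacrowd_text",
        data_file="raw.txt",  # hypothetical
    ),
    LoaderConfig(
        name=f"{_DATASETNAME}_tokenized_seacrowd_text",  # name mentioned in the review
        schema="seacrowd_text",
        data_file="tokenized.txt",  # hypothetical
    ),
]


def select_config(name: str) -> LoaderConfig:
    """Return the config with the given name, or raise if it is unknown."""
    for cfg in CONFIGS:
        if cfg.name == name:
            return cfg
    raise ValueError(f"Unknown config: {name!r}")
```

With this layout, a user picks the variant by config name, and dropping the tokenized variant later would mean removing a single entry from `CONFIGS`.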