SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

add thaigov #412

Closed TysonYu closed 7 months ago

TysonYu commented 8 months ago

Closes #357.


sabilmakbar commented 8 months ago

Hi @TysonYu, a suggestion: please start the initial PR message with Closes #{ISSUE_NUMBER} so that the PR is linked to its dataloader issue. Please do this for upcoming PRs (I've already done it on this one, though).

TysonYu commented 8 months ago

Okay, I'll do that for future PRs.

TysonYu commented 8 months ago

Rather than having to write the data in _split_generators and read it again in _generate_examples, why don't we pass the all_data list through gen_kwargs in _split_generators and use it directly in _generate_examples? I think passing it this way is possible (see this SEACrowd Implementation).

Hey, I did it this way because it seems logically correct and clear. I agree that the approach you mention is a valid alternative, but my current approach should still be fine. I think some other dataloaders also do it this way, such as indosum.
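
For reference, the alternative raised in the review (handing all_data to _generate_examples through gen_kwargs) could look roughly like the sketch below. It is not the code from this PR. SplitGenerator here is a small stand-in that mirrors only the name and gen_kwargs fields of datasets.SplitGenerator so the example runs without the library. The class ThaiGovSketch, the helper _load_all_data, the run function, and the record fields are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Tuple


@dataclass
class SplitGenerator:
    """Stand-in for datasets.SplitGenerator (name and gen_kwargs only)."""
    name: str
    gen_kwargs: Dict[str, Any] = field(default_factory=dict)


class ThaiGovSketch:
    """Hypothetical loader used for illustration; not the dataloader in this PR."""

    def _load_all_data(self) -> List[Dict[str, str]]:
        # Placeholder for the real download/parse step (assumed, made-up records).
        return [
            {"text": "article one", "summary": "summary one"},
            {"text": "article two", "summary": "summary two"},
        ]

    def _split_generators(self) -> List[SplitGenerator]:
        all_data = self._load_all_data()
        # Pass the parsed records along directly instead of writing them
        # to disk and reading them back in _generate_examples.
        return [SplitGenerator(name="train", gen_kwargs={"all_data": all_data})]

    def _generate_examples(
        self, all_data: List[Dict[str, str]]
    ) -> Iterator[Tuple[int, Dict[str, str]]]:
        # Receives exactly the keys that were placed in gen_kwargs.
        for idx, record in enumerate(all_data):
            yield idx, {"id": str(idx), **record}


def run(builder: ThaiGovSketch) -> Dict[str, List[Tuple[int, Dict[str, str]]]]:
    # Mimics how a builder forwards gen_kwargs to _generate_examples.
    return {
        split.name: list(builder._generate_examples(**split.gen_kwargs))
        for split in builder._split_generators()
    }
```

Calling run(ThaiGovSketch()) yields the examples per split without an intermediate file. The trade-off is that the whole list stays in memory between the two methods, whereas the write-then-read approach keeps the two steps decoupled; either is workable for a small dataset.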