SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #570 | Add Dataloader Baybayin #603

Closed by akhdanfadh 1 month ago

akhdanfadh commented 3 months ago

Closes #570

I implemented one config per language/subset. Thus, configs will look like this: baybayin_baybayin_source, baybayin_latin_seacrowd_imtext, etc. When testing, pass baybayin_<subset> to the --subset_id parameter.
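
For reviewers who want to enumerate the config names, here is a minimal sketch of the naming pattern described above. It is not the dataloader code from this PR: the helper names (config_names, subset_id) are hypothetical, the subset list is inferred from the attached test logs, and it assumes every subset exposes both the source and seacrowd_imtext schemas.

```python
# Hypothetical sketch of the naming scheme; not the merged dataloader code.
DATASET_NAME = "baybayin"
# Assumption: subsets inferred from the attached logs (baybayin.txt, diacritic.txt, latin.txt).
SUBSETS = ["baybayin", "diacritic", "latin"]
# Assumption: each subset is available in both schemas.
SCHEMAS = ["source", "seacrowd_imtext"]


def config_names(dataset=DATASET_NAME, subsets=SUBSETS, schemas=SCHEMAS):
    """Build one config name per (subset, schema) pair."""
    return [
        f"{dataset}_{subset}_{schema}"
        for subset in subsets
        for schema in schemas
    ]


def subset_id(subset, dataset=DATASET_NAME):
    """Build the value to pass to --subset_id for a given subset."""
    return f"{dataset}_{subset}"


if __name__ == "__main__":
    for name in config_names():
        print(name)
    print(subset_id("latin"))
```

Under these assumptions, subset_id("latin") gives the baybayin_latin value to pass to --subset_id, and config_names() includes the two names quoted above.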

Test results for each subset: baybayin.txt, diacritic.txt, latin.txt

akhdanfadh commented 2 months ago

@holylovenia Done. Also a friendly reminder for @patrickamadeus to review. Thanks!

akhdanfadh commented 1 month ago

Done handling pillow import @patrickamadeus

patrickamadeus commented 1 month ago

LGTM! Merging in a bit.