SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #6 | Add Loader for XCOPA #286

Closed: FawwazMayda closed this pull request 8 months ago.

FawwazMayda commented 9 months ago

Closes #6 | Extend dataloader for XCOPA to th (Thai) and vi (Vietnamese)
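For reference, the test screenshots check each language under two schemas: the original "Source" format and a "QA" format. The snippet below is a minimal sketch of how one XCOPA record might be mapped to a QA-style record. It assumes the upstream XCOPA fields are premise, choice1, choice2, question ("cause" or "effect"), label (0-based index of the correct choice) and idx. The output field names, the question templates and the helper xcopa_to_qa are hypothetical and are not taken from this PR or from the SEACrowd codebase.

```python
from typing import Any, Dict, List

# Hypothetical question templates; XCOPA's "question" field is assumed
# to be either "cause" or "effect".
_QUESTION_TEMPLATES: Dict[str, str] = {
    "cause": "What was the cause?",
    "effect": "What happened as a result?",
}


def xcopa_to_qa(example: Dict[str, Any], lang: str) -> Dict[str, Any]:
    """Map one XCOPA source record to a QA-style record (field names assumed)."""
    choices: List[str] = [example["choice1"], example["choice2"]]
    return {
        "id": f"{lang}_{example['idx']}",
        "question_id": str(example["idx"]),
        "document_id": str(example["idx"]),
        "question": _QUESTION_TEMPLATES[example["question"]],
        "type": "multiple_choice",
        "choices": choices,
        # The premise serves as the context the question is asked about.
        "context": example["premise"],
        # "label" is assumed to be the 0-based index of the correct choice.
        "answer": [choices[example["label"]]],
        "meta": {},
    }
```

Keeping both choices in a list and deriving the answer from label means the same function works for every language subset; only the lang prefix of the id changes.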

Checkbox

Ind (Indonesian): Source [screenshot: 2024-01-21 at 23 13 07]; QA [screenshot: 2024-01-21 at 23 15 12]

Th (Thai): Source [screenshot: 2024-01-21 at 23 16 40]; QA [screenshot: 2024-01-21 at 23 17 59]

Vie (Vietnamese): Source [screenshot: 2024-01-21 at 23 18 57]; QA [screenshot: 2024-01-21 at 23 19 49]

sabilmakbar commented 9 months ago

Hi @FawwazMayda, thanks for contributing to this dataloader!

May I know why the Indonesian language is being skipped? Both the datacard and the source dataset list id as one of the supported languages, though.

FawwazMayda commented 9 months ago

Indonesian (id) is already implemented in seacrowd/xcopa, which is why I only implemented th and vie.

[screenshot: 2024-01-03 at 21 14 12]

Should I rename it to xcopa_id instead of xcopa?

sabilmakbar commented 9 months ago

In that case, could you extend the script in sea_datasets/xcopa to cover tha and vie as well (and consequently remove xcopa_th and xcopa_vi)? This should be similar to a previously merged PR; you may refer to its implementation here: https://github.com/SEACrowd/seacrowd-datahub/pull/125

FawwazMayda commented 8 months ago

Updated, @sabilmakbar. I added all those languages into a single xcopa.py.

FawwazMayda commented 8 months ago

Adjusted based on your comments, @holylovenia.

sabilmakbar commented 8 months ago

Hi @FawwazMayda, could you run the formatter check and fix the issues it reports? I found some import and readability problems, such as trailing whitespace in a line, unused imports, and a reference to an internal variable in config creation that should be renamed (l to lang).

make check_file=seacrowd/sea_datasets/xcopa/xcopa.py

FawwazMayda commented 8 months ago

Updated, @sabilmakbar.