SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0
64 stars, 57 forks

Closes #225 | Add Dataloader ALICE-THI #597

akhdanfadh closed 5 months ago

akhdanfadh commented 6 months ago

Closes #225

I implemented one config per language/subset. Thus, configs will look like this: alice_thi_THI-C68_source, alice_thi_THI-D10_seacrowd_imtext, etc. When testing, pass alice_thi_<subset> to the --subset_id parameter.
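For illustration, the naming scheme described above can be sketched as follows. This is a minimal sketch, not the dataloader's actual code: the helper names are hypothetical, and only the two subsets and two schema suffixes mentioned in this description are listed (the real dataloader may define more).

```python
# Only the subsets and schema suffixes named in the PR description are listed;
# the real dataloader may define more (assumption).
DATASET_NAME = "alice_thi"
SUBSETS = ["THI-C68", "THI-D10"]
SCHEMAS = ["source", "seacrowd_imtext"]


def config_names(dataset_name, subsets, schemas):
    """Build one config name per (subset, schema) pair. Hypothetical helper."""
    return [
        f"{dataset_name}_{subset}_{schema}"
        for subset in subsets
        for schema in schemas
    ]


def subset_id(dataset_name, subset):
    """Value to pass to --subset_id when testing a given subset. Hypothetical helper."""
    return f"{dataset_name}_{subset}"


names = config_names(DATASET_NAME, SUBSETS, SCHEMAS)
for name in names:
    print(name)
print(subset_id(DATASET_NAME, "THI-D10"))
```

The subset ID deliberately omits the schema suffix, matching the alice_thi_<subset> pattern given above.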


yongzx commented 5 months ago

I ran into this issue when I ran python -m tests.test_seacrowd seacrowd/sea_datasets/alice_thi/alice_thi.py --subset alice_thi_THI-D10. The same error occurred when I ran it with --schema IMTEXT.

Traceback (most recent call last):
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/builder.py", line 1687, in _prepare_split_single
    example = self.info.features.encode_example(record) if self.info.features is not None else record
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/features/features.py", line 1866, in encode_example
    return encode_nested_example(self, example)
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/features/features.py", line 1243, in encode_nested_example
    {
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/features/features.py", line 1244, in <dictcomp>
    k: encode_nested_example(sub_schema, sub_obj, level=level + 1)
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/features/features.py", line 1243, in encode_nested_example
    {
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/features/features.py", line 1243, in <dictcomp>
    {
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/utils/py_utils.py", line 323, in zip_dict
    yield key, tuple(d[key] for d in dicts)
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/utils/py_utils.py", line 323, in <genexpr>
    yield key, tuple(d[key] for d in dicts)
KeyError: 'context'

ljvmiranda921 commented 5 months ago

Will review this after you address @yongzx's comments! Got a bit busy this week 😅

akhdanfadh commented 5 months ago

I don't know why the commented-out line resulted in an error on your end but not on mine. I've uncommented the line there. I also ran the makefile.

@yongzx @ljvmiranda921
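For readers following the thread, the KeyError: 'context' in the traceback above is consistent with an example dict that lacks a key the declared features expect. The sketch below is an illustration under assumptions, not the dataloader's code: zip_dict here is a simplified stand-in for the helper named in the traceback, and the metadata fields ("context", "labels") and their types are assumed for the example.

```python
import itertools


def zip_dict(*dicts):
    """Simplified stand-in for the zip_dict helper named in the traceback:
    walk the union of keys and index every dict with each key."""
    seen = set()
    for key in itertools.chain(*dicts):
        if key not in seen:
            seen.add(key)
            yield key, tuple(d[key] for d in dicts)


def missing_keys(schema, example):
    """Keys declared in the schema but absent from the example. Hypothetical helper."""
    return sorted(set(schema) - set(example))


# Assumed shape of the nested metadata features (illustrative only).
schema_metadata = {"context": "string", "labels": "int"}

# Example yielded while the line that sets "context" is commented out.
broken_example = {"labels": 3}
# Example after the line is uncommented.
fixed_example = {"context": "", "labels": 3}

try:
    list(zip_dict(schema_metadata, broken_example))
    error = None
except KeyError as exc:
    error = exc

print(repr(error))
print(missing_keys(schema_metadata, broken_example))
print(list(zip_dict(schema_metadata, fixed_example)))
```

A check like missing_keys can be run on a yielded example before encoding to spot this kind of schema mismatch early.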

yongzx commented 5 months ago

It runs for me now! LGTM