Closed akhdanfadh closed 5 months ago
I ran into this issue when I ran `python -m tests.test_seacrowd seacrowd/sea_datasets/alice_thi/alice_thi.py --subset alice_thi_THI-D10`. I get the same error when I run with `--schema IMTEXT`.
Traceback (most recent call last):
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/builder.py", line 1687, in _prepare_split_single
    example = self.info.features.encode_example(record) if self.info.features is not None else record
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/features/features.py", line 1866, in encode_example
    return encode_nested_example(self, example)
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/features/features.py", line 1243, in encode_nested_example
    {
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/features/features.py", line 1244, in <dictcomp>
    k: encode_nested_example(sub_schema, sub_obj, level=level + 1)
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/features/features.py", line 1243, in encode_nested_example
    {
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/features/features.py", line 1243, in <dictcomp>
    {
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/utils/py_utils.py", line 323, in zip_dict
    yield key, tuple(d[key] for d in dicts)
  File "/Users/yong/Dev/env_seacrowd/lib/python3.8/site-packages/datasets/utils/py_utils.py", line 323, in <genexpr>
    yield key, tuple(d[key] for d in dicts)
KeyError: 'context'
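The `KeyError` is raised in `zip_dict`, where `d[key]` is looked up in each dict, so it likely means that the yielded example and the declared features disagree on a `context` key inside a nested dict. Below is a minimal sketch of how such a mismatch could be located before the example reaches `encode_example`. The `key_mismatches` helper, the schema, and the example record are all hypothetical illustrations, not part of `datasets` or SEACrowd; only the `context` key comes from the traceback.

```python
from typing import Any, Dict, List


def key_mismatches(schema: Dict[str, Any], example: Dict[str, Any], prefix: str = "") -> List[str]:
    """List dotted key paths on which a nested schema and a nested example disagree."""
    problems: List[str] = []
    # Walk the union of keys so that mismatches in either direction are reported.
    for key in sorted(set(schema) | set(example)):
        path = f"{prefix}{key}"
        if key not in example:
            problems.append(f"{path}: missing from example")
        elif key not in schema:
            problems.append(f"{path}: not declared in schema")
        elif isinstance(schema[key], dict) and isinstance(example[key], dict):
            # Recurse into nested dicts, extending the dotted path.
            problems.extend(key_mismatches(schema[key], example[key], prefix=path + "."))
    return problems


# Hypothetical schema and record; the field names other than "context" are assumptions.
schema = {"id": "string", "image_paths": ["string"], "metadata": {"context": "string", "labels": ["int"]}}
example = {"id": "0", "image_paths": ["a.png"], "metadata": {"labels": [3]}}

print(key_mismatches(schema, example))  # ['metadata.context: missing from example']
```

Running a check like this on one record from `_generate_examples()` would show whether the key is absent from the example or from the schema.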
Will review this after you address @yongzx's comments! Got a bit busy this week 😅
I don't know why the comment resulted in an error on your end but not on mine. I've uncommented the line there. I also ran the makefile.
@yongzx @ljvmiranda921
It runs for me now! LGTM
Closes #225
I implemented one config per language/subset. Thus, configs will look like this: `alice_thi_THI-C68_source`, `alice_thi_THI-D10_seacrowd_imtext`, etc. When testing, pass `alice_thi_<subset>` to the `--subset_id` parameter.

Checkbox
- Create the dataloader script `seacrowd/sea_datasets/my_dataset/my_dataset.py` (please use only lowercase and underscore for dataset naming).
- Provide values for the `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_SEACROWD_VERSION` variables.
- Implement `_info()`, `_split_generators()` and `_generate_examples()` in dataloader script.
- Make sure that the `BUILDER_CONFIGS` class attribute is a list with at least one `SEACrowdConfig` for the source schema and one for a seacrowd schema.
- Confirm dataloader script works with `datasets.load_dataset` function.
- Confirm that your dataloader script passes the test suite run with `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py`.
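To illustrate the one-config-per-subset naming described in this PR, here is a rough sketch of how the config names could be generated. It is not the code in `alice_thi.py`: `ConfigSketch` is a hypothetical stand-in for `SEACrowdConfig` that keeps only a few fields, `build_configs` is an assumed helper, and the subset list contains only the two subsets named in this thread.

```python
from dataclasses import dataclass
from typing import List

_DATASETNAME = "alice_thi"
_SUBSETS = ["THI-C68", "THI-D10"]  # only the subsets mentioned in this PR; the real list may be longer


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for SEACrowdConfig with just the fields used here."""

    name: str
    schema: str
    subset_id: str


def build_configs(dataset: str, subsets: List[str], seacrowd_schema: str = "seacrowd_imtext") -> List[ConfigSketch]:
    """Create one source config and one seacrowd-schema config per subset."""
    configs: List[ConfigSketch] = []
    for subset in subsets:
        subset_id = f"{dataset}_{subset}"  # the value passed when testing a single subset
        configs.append(ConfigSketch(name=f"{subset_id}_source", schema="source", subset_id=subset_id))
        configs.append(
            ConfigSketch(name=f"{subset_id}_{seacrowd_schema}", schema=seacrowd_schema, subset_id=subset_id)
        )
    return configs


BUILDER_CONFIGS = build_configs(_DATASETNAME, _SUBSETS)
for config in BUILDER_CONFIGS:
    print(config.name)
```

Generating the list in a loop keeps the source and seacrowd configs paired for every subset, so the "at least one config per schema" item in the checklist holds by construction.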