akhdanfadh closed this 2 months ago
@holylovenia Resolved some changes and added additional package handling.
This task has to be added in a separate PR.
I'll make the new PR once everything on the dataloader part is done and reviewed.
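For reviewers, the "additional package handling" could look roughly like the sketch below. This is not the code from the PR: the helper name `require_package` and the error message are hypothetical, and the only detail taken from this thread is that `gdown` is installed with `pip install gdown`.

```python
import importlib


def require_package(name: str, install_hint: str):
    """Import and return the module `name`, or raise ImportError with an install hint.

    Hypothetical helper for illustration; not part of seacrowd or datasets.
    """
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise ImportError(
            f"This dataloader needs the '{name}' package. "
            f"Install it with: {install_hint}"
        ) from err


# Assumed usage inside a dataloader, deferred until the download step so that
# importing the script itself does not fail when the package is missing:
# gdown = require_package("gdown", "pip install gdown")
```

Deferring the import like this keeps the extra dependency optional for everyone who does not load this particular dataset.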
IDK why `constants.py` is still listed as a changed file here even though I already reverted it by running `git checkout upstream/master -- seacrowd/utils/constants.py`. I think I will delete the file manually then.
I think I also need to make a new PR for adding `data/` to `.gitignore`, right? @holylovenia
Yes yes, I'll approve that PR once you create it.
A friendly reminder for @akhdanfadh to address @sabilmakbar's suggestions. 🙏
@sabilmakbar Done, please check my replies.
@sabilmakbar Done with the inline comments.
lgtm, thanks @akhdanfadh!
Closes #206
Sorry if this PR was made before any final decision in #206; I just wanted to speed things up.
Some possible discussions:
- [...] `pip install gdown`. I am aware that I should have made a separate PR for this; I just want to put it here first so reviewers can discuss the implementation. Related discussion is on #206.
- [...] `data/sleukrith_ocr/`, thus the updated `.gitignore`. Again, this should be in a separate PR, right?
- [...] `_generate_examples()`). What do you think about that? Should I just store the images as numpy arrays instead?
Checkbox
- [...] `seacrowd/sea_datasets/{my_dataset}/{my_dataset}.py` (please use only lowercase and underscore for dataset folder naming, as mentioned in dataset issue) and its `__init__.py` within `{my_dataset}` folder.
- [...] `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_LOCAL`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_SEACROWD_VERSION` variables.
- [...] `_info()`, `_split_generators()` and `_generate_examples()` in dataloader script.
- [...] `BUILDER_CONFIGS` class attribute is a list with at least one `SEACrowdConfig` for the source schema and one for a seacrowd schema.
- [...] `datasets.load_dataset` function.
- [...] `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py` or `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py --subset_id {subset_name_without_source_or_seacrowd_suffix}`.
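As a quick pre-review aid, the variable and method items in the checklist above can be checked mechanically. The sketch below is only an illustration: `check_dataloader` and the toy `DemoBuilder` class are hypothetical names, not part of seacrowd or its test suite. It assumes the three methods live on the builder class, as is usual for `datasets` builders.

```python
# Names taken from the checklist above.
REQUIRED_VARIABLES = [
    "_CITATION", "_DATASETNAME", "_DESCRIPTION", "_HOMEPAGE", "_LICENSE",
    "_LOCAL", "_URLs", "_SUPPORTED_TASKS", "_SOURCE_VERSION", "_SEACROWD_VERSION",
]
REQUIRED_METHODS = ["_info", "_split_generators", "_generate_examples"]


def check_dataloader(module_vars: dict, builder_cls: type) -> dict:
    """Return the checklist names missing from a dataloader module and its builder class."""
    return {
        "variables": [n for n in REQUIRED_VARIABLES if n not in module_vars],
        "methods": [
            n for n in REQUIRED_METHODS
            if not callable(getattr(builder_cls, n, None))
        ],
    }


# Toy example: a module namespace without _LICENSE and a builder with only _info().
class DemoBuilder:
    def _info(self):
        return None


demo_vars = {name: "" for name in REQUIRED_VARIABLES if name != "_LICENSE"}
report = check_dataloader(demo_vars, DemoBuilder)
print(report)
```

For a real script, `module_vars` could be `vars(module)` of the imported dataloader. This only checks that the names exist, so it complements rather than replaces the `tests.test_seacrowd` run.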