SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #355 | Add Dataloader TotalDefMeme #602

Closed: akhdanfadh closed this pull request 5 months ago

akhdanfadh commented 6 months ago

Closes #355

There are two tasks with different schemas. The OCR task covers all the images, but the ImageClassification task only covers images that have the pillar_stances attribute, since the dataset is about pillar classification (CMIIW).
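A minimal sketch of that routing, assuming each raw record is a dict with hypothetical `id`, `image`, `text`, and optional `pillar_stances` fields (the real TotalDefMeme field layout may differ). Every record yields an OCR example, and only records with a non-empty `pillar_stances` value also yield a classification example.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_examples(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Route each record to the OCR task and, if labeled, to the classification task.

    Field names ("id", "image", "text", "pillar_stances") are hypothetical.
    """
    ocr_examples: List[Record] = []
    cls_examples: List[Record] = []
    for rec in records:
        # Every image is a valid OCR example; missing text becomes "".
        ocr_examples.append(
            {"id": rec["id"], "image": rec["image"], "text": rec.get("text", "")}
        )
        stances = rec.get("pillar_stances")
        # Only images annotated with pillar stances get a classification example.
        if stances:
            cls_examples.append(
                {"id": rec["id"], "image": rec["image"], "labels": list(stances)}
            )
    return ocr_examples, cls_examples
```

Here an empty `pillar_stances` list is treated the same as a missing one, so such images stay OCR-only.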

Also, for the sake of example, the new image schema is added in this PR instead of a separate one. Once it has been checked and approved, I will open a new PR that adds the schema and remove the relevant files from this one.

Also, similar to #556 and #566, I use a third-party library to download the GDrive data (i.e., pip install gdown), because it is more reliable than the dl_manager. Likewise, I store the downloaded data in data/total_defense_meme/. I am aware that I should make, or wait for, a PR addressing those two things, so I am currently waiting for further instructions.
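A minimal sketch of the gdown-based download into data/total_defense_meme/, assuming a hypothetical `download_if_missing` helper and a placeholder Google Drive file ID (the actual IDs and file names are not given here). It uses the `https://drive.google.com/uc?id=...` URL form with `gdown.download(url, output, quiet=False)` and skips the download when the file is already cached locally.

```python
import os

# Local cache directory mentioned in the PR description.
DATA_DIR = os.path.join("data", "total_defense_meme")


def download_if_missing(file_id: str, filename: str, data_dir: str = DATA_DIR) -> str:
    """Download a GDrive file with gdown unless it already exists locally.

    `file_id` and `filename` are hypothetical placeholders, not the dataset's real values.
    """
    os.makedirs(data_dir, exist_ok=True)
    dest = os.path.join(data_dir, filename)
    if not os.path.exists(dest):
        # Imported lazily so cached runs do not need the third-party package.
        import gdown  # pip install gdown

        url = f"https://drive.google.com/uc?id={file_id}"
        gdown.download(url, dest, quiet=False)
        if not os.path.exists(dest):
            raise RuntimeError(f"gdown failed to download {url} to {dest}")
    return dest
```

Caching on disk means repeated dataloader runs avoid hitting Google Drive again, which is one reason a fixed local directory is convenient.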


akhdanfadh commented 5 months ago

@holylovenia I've made the changes that could be made. See my comments on your review.

Let's move the new tasks and schema to another PR.

I'll make the new PR once everything on the dataloader part is done and reviewed.