SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for ALICE-THI #225

Closed: SamuelCahyawijaya closed this issue 6 months ago

SamuelCahyawijaya commented 10 months ago

Dataloader name: alice-thi/alice-thi.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?alice-thi

Dataset: alice-thi
Description: ALICE-THI is a Thai handwritten script dataset containing 24,045 character images. It is split into a Thai handwritten character dataset (THI-C68) of 14,490 images and a Thai handwritten digit dataset (THI-D10) of 9,555 images. The data was collected from 150 native writers aged 20 to 23. Participants were allowed to write only isolated Thai script on the form, with at least 100 samples per character. The character images in this dataset generally have no background noise.
Subsets: THI-C68, THI-D10
Languages: tha
Tasks: Optical Character Recognition
License: Unknown (unknown)
Homepage: https://www.ai.rug.nl/~mrolarik/ALICE-THI/
HF URL: -
Paper URL: https://www.sciencedirect.com/science/article/abs/pii/S0952197615001724
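The subset sizes in the dataset card can be sanity-checked before writing the loader. The sketch below is a minimal, hypothetical illustration in Python, the language of the alice-thi/alice-thi.py loader. The `Subset` class, the `SUBSETS` tuple and the `check_subset_counts` helper are assumptions made up for this example and are not part of the SEACrowd codebase. It only encodes the counts stated in the description and confirms that 14,490 + 9,555 = 24,045.

```python
from dataclasses import dataclass


# Hypothetical subset metadata for an alice-thi loader. Names and image
# counts come from the dataset card; the structure itself is illustrative.
@dataclass(frozen=True)
class Subset:
    name: str
    description: str
    num_images: int


SUBSETS = (
    Subset("THI-C68", "Thai handwritten characters", 14490),
    Subset("THI-D10", "Thai handwritten digits", 9555),
)

TOTAL_IMAGES = 24045  # total stated in the dataset description


def check_subset_counts(subsets, expected_total):
    """Return True if the per-subset image counts add up to the stated total."""
    return sum(s.num_images for s in subsets) == expected_total


if __name__ == "__main__":
    for s in SUBSETS:
        print(f"{s.name}: {s.num_images} images ({s.description})")
    print("counts consistent:", check_subset_counts(SUBSETS, TOTAL_IMAGES))
```

A real loader would also need the download URLs and file layout from the homepage, which the card does not specify.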
sedrickkeh commented 10 months ago

self-assign

github-actions[bot] commented 10 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

sedrickkeh commented 10 months ago

Working on it. Will try to finish this week

holylovenia commented 10 months ago

> Working on it. Will try to finish this week

No problem. Feel free to let us know if you have any question.

github-actions[bot] commented 9 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

sabilmakbar commented 7 months ago

self-assign

akhdanfadh commented 7 months ago

self-assign