SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for Baybayin #570

Closed: SamuelCahyawijaya closed this 4 months ago

SamuelCahyawijaya commented 6 months ago

Dataloader name: baybayin/baybayin.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?baybayin

Dataset: baybayin
Description: The Baybayin dataset contains binary images of Baybayin characters, Latin characters, and four Baybayin diacritic symbols, stored in MATLAB (.mat) format. It consists of 17,000 Baybayin images (17 characters, 1,000 images each), 18,200 Latin images (26 characters, 700 images each), and 2,000 diacritic images (4 symbols, 500 images each). Each character image is strictly center-fitted at 56x56 pixels. The dataset was originally used to discriminate Latin script from Baybayin script in character recognition.
Subsets: Baybayin characters, Latin characters, Baybayin diacritic images
Languages: tgl
Tasks: Optical Character Recognition
License: Creative Commons Attribution 4.0 (cc-by-4.0)
Homepage: https://www.kaggle.com/datasets/rodneypino/baybayin-and-latin-binary-images-in-mat-format
HF URL: -
Paper URL: https://peerj.com/articles/cs-360/

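A minimal sketch of reading these files for the loader. It assumes the Kaggle .mat files can be opened with `scipy.io.loadmat` (that function raises `NotImplementedError` for MATLAB v7.3/HDF5 files, which would need an HDF5 reader instead). It also assumes each image variable is stored either as a single 56x56 array or as a stack of them. The variable names inside the files are not listed in this issue, so the hypothetical helper `find_image_stacks` selects variables by shape rather than by name. `EXPECTED`, `expected_total`, and `load_mat_images` are likewise illustrative names, and their figures come only from the counts in the description above.

```python
import numpy as np

IMAGE_SIZE = 56  # every character image is center-fitted to 56x56 pixels

# Hypothetical subset keys -> (number of classes, images per class), derived
# from the card's totals: 17000 / 1000 = 17, 18200 / 700 = 26, 2000 / 500 = 4.
EXPECTED = {"baybayin": (17, 1000), "latin": (26, 700), "diacritics": (4, 500)}


def expected_total(subset):
    """Total number of images the card states for a subset."""
    classes, per_class = EXPECTED[subset]
    return classes * per_class


def find_image_stacks(mat_contents):
    """Return {variable_name: uint8 array of shape (N, 56, 56)} for every
    image-like variable in a dict such as the one scipy.io.loadmat returns.

    Variables are picked by shape because their names are not documented here.
    """
    stacks = {}
    for name, value in mat_contents.items():
        if name.startswith("__"):
            # loadmat adds __header__, __version__ and __globals__ metadata entries
            continue
        arr = np.asarray(value)
        if arr.ndim == 2 and arr.shape == (IMAGE_SIZE, IMAGE_SIZE):
            arr = arr[np.newaxis]  # a single image becomes a stack of one
        elif arr.ndim == 3 and arr.shape[:2] == (IMAGE_SIZE, IMAGE_SIZE):
            arr = np.moveaxis(arr, -1, 0)  # MATLAB-style (H, W, N) -> (N, H, W)
        elif arr.ndim == 3 and arr.shape[1:] == (IMAGE_SIZE, IMAGE_SIZE):
            pass  # already (N, H, W)
        else:
            continue  # not an image stack (labels, metadata, ...)
        # Normalize to strict 0/1 values regardless of the stored dtype
        stacks[name] = (arr != 0).astype(np.uint8)
    return stacks


def load_mat_images(path):
    """Load one .mat file and return its image stacks."""
    # Imported lazily so the shape logic above only depends on NumPy
    from scipy.io import loadmat
    return find_image_stacks(loadmat(path))
```

For example, `load_mat_images("baybayin.mat")` (file name hypothetical) would return binary uint8 stacks whose first dimension can be checked against `expected_total("baybayin")`, i.e. 17,000. Layouts with one flattened 3,136-pixel row per image are deliberately skipped, because the row-major versus column-major order of such a flattening is not stated.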
akhdanfadh commented 6 months ago

self-assign