SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for BEST2019-Handwritten #664

Open SamuelCahyawijaya opened 1 month ago

SamuelCahyawijaya commented 1 month ago

Dataloader name: best2019/best2019.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?best2019

Dataset: best2019
Description: This dataset contains training examples of typed and handwritten Thai alphabets and phrases used in the BEST 2019 competition in Thailand for optical character recognition. The dataset URL provides three groups of folders. First, the 68PersonsBmp folder contains 68 scanned documents of typed and handwritten numbers and Thai alphabets. Second, the ST200- folders contain typed and handwritten Thai alphabets and phrases, where each subfolder has different Thai phrases. Third, the WD200- folders contain a mixture of typed and handwritten Thai alphabets, monetary quantities (e.g. twenty baht), and provinces in Thailand, where each subfolder contains different monetary quantities and provinces.
Subsets: 68PersonsBmp, ST200-1, ST200-2, ST200-3, WD200-1, WD200-2, WD200-3, WD200-4
Languages: tha
Tasks: Optical Character Recognition
License: Unknown (unknown)
Homepage: https://thailang.nectec.or.th/best/best2019-handwrittenrecognition-trainingset/
HF URL: -
Paper URL: -
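
As a starting point for the loader, here is a minimal sketch of how the subsets listed above might be enumerated once the archives are downloaded and extracted locally. It is not the SEACrowd dataloader API. The helper name `list_images`, the assumption that each subset is extracted into a folder named after it, and the set of image file extensions are all hypothetical and should be checked against the actual files on the homepage.

```python
from pathlib import Path

# Subset names as listed in the issue.
SUBSETS = [
    "68PersonsBmp",
    "ST200-1", "ST200-2", "ST200-3",
    "WD200-1", "WD200-2", "WD200-3", "WD200-4",
]

# Assumption: the images use one of these extensions; the real archives may differ.
IMAGE_SUFFIXES = {".bmp", ".png", ".jpg", ".jpeg"}


def list_images(root, subset):
    """Return sorted image paths under root/subset, searching subfolders recursively.

    Assumption: each subset was extracted into a folder named after the subset.
    """
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset}")
    subset_dir = Path(root) / subset
    if not subset_dir.is_dir():
        # Subset not downloaded or extracted yet.
        return []
    return sorted(
        p for p in subset_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

A loader could call `list_images(data_dir, "ST200-1")` once per subset and pair each image with its label. How labels are stored is not described in this issue, so that part is left open.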