SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for VinText #223

Status: Closed (SamuelCahyawijaya closed this issue 1 month ago)

SamuelCahyawijaya commented 6 months ago

Dataloader name: vintext/vintext.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?vintext

Dataset: vintext
Description: Vintext is a challenging scene text dataset for Vietnamese, where some characters are equivocal in the visual form due to accent symbols. This dataset contains 2,000 fully annotated images with 56,084 text instances. Each text instance is delineated by a quadrilateral bounding box and associated with the ground truth sequence of characters. The dataset is randomly split into three subsets for training (1,200 images), validation (300 images), and testing (500 images).
Subsets: -
Languages: vie
Tasks: Optical Character Recognition
License: GNU Affero General Public License v3.0 (agpl-3.0)
Homepage: https://github.com/VinAIResearch/dict-guided
HF URL: -
Paper URL: https://ieeexplore.ieee.org/document/9577624
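
For whoever picks this up, here is a minimal sketch of how one text instance (a quadrilateral plus its ground-truth characters) might be parsed. The line layout `x1,y1,x2,y2,x3,y3,x4,y4,TRANSCRIPT` is an assumption for illustration and is not confirmed by this issue; please check the label files distributed via the homepage before relying on it. `TextInstance` and `parse_annotation_line` are hypothetical names, not part of the SEACrowd codebase. Only the split sizes come from the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Split sizes as stated in the dataset description (images per split).
SPLIT_SIZES = {"train": 1200, "validation": 300, "test": 500}


@dataclass
class TextInstance:
    """One annotated text instance: four (x, y) corners and its transcription."""
    quad: List[Tuple[int, int]]
    text: str


def parse_annotation_line(line: str) -> TextInstance:
    # ASSUMPTION: each line looks like "x1,y1,x2,y2,x3,y3,x4,y4,TRANSCRIPT".
    # Split at most 8 times so commas inside the transcription are preserved.
    parts = line.rstrip("\r\n").split(",", 8)
    if len(parts) != 9:
        raise ValueError(f"expected 8 coordinates and a transcription, got: {line!r}")
    coords = [int(value) for value in parts[:8]]
    # Pair up alternating x and y values into four corner points.
    quad = list(zip(coords[0::2], coords[1::2]))
    return TextInstance(quad=quad, text=parts[8])
```

A quick sanity check for the loader is that the three split sizes add up to the 2,000 images mentioned in the description, e.g. `sum(SPLIT_SIZES.values()) == 2000`.
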
sedrickkeh commented 6 months ago

self-assign

github-actions[bot] commented 6 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

sedrickkeh commented 6 months ago

Working on it. Will try to finish this week.

sabilmakbar commented 4 months ago

self-assign

patrickamadeus commented 3 months ago

self-assign