SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.

Apache License 2.0


Create dataset loader for VinText #223

Status: Closed (closed by SamuelCahyawijaya 1 month ago)

SamuelCahyawijaya commented 6 months ago

Dataloader name: vintext/vintext.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?vintext

Dataset	vintext
Description	VinText is a challenging scene-text dataset for Vietnamese, in which some characters are visually ambiguous because of accent marks (diacritics). The dataset contains 2,000 fully annotated images with 56,084 text instances. Each text instance is delineated by a quadrilateral bounding box and paired with its ground-truth character sequence. The images are randomly split into three subsets: training (1,200 images), validation (300 images), and testing (500 images).
Subsets	-
Languages	vie
Tasks	Optical Character Recognition
License	GNU Affero General Public License v3.0 (agpl-3.0)
Homepage	https://github.com/VinAIResearch/dict-guided
HF URL	-
Paper URL	https://ieeexplore.ieee.org/document/9577624
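Because each text instance is a quadrilateral paired with a transcription, the loader's core job is turning per-image annotation files into (polygon, text) records. The sketch below assumes an ICDAR 2015-style line layout, `x1,y1,x2,y2,x3,y3,x4,y4,transcription`, with `###` marking unreadable ("don't care") regions. That layout, the `TextInstance` class, and both helper functions are hypothetical. Check the actual label files in the dict-guided repository before relying on it.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class TextInstance:
    # Four corner points of the quadrilateral, in the order they appear in the file.
    quad: List[Point]
    # Ground-truth character sequence (may contain Vietnamese diacritics and commas).
    text: str

    @property
    def is_dont_care(self) -> bool:
        # ASSUMPTION: "###" marks an unreadable region, as in ICDAR 2015.
        return self.text == "###"


def parse_annotation_line(line: str) -> TextInstance:
    """Parse one line in the assumed 'x1,y1,...,x4,y4,text' format."""
    # Split at most 8 times so that commas inside the transcription survive.
    parts = line.rstrip("\r\n").split(",", 8)
    if len(parts) != 9:
        raise ValueError(f"expected 8 coordinates and a transcription, got: {line!r}")
    coords = [float(v) for v in parts[:8]]
    # Pair up (x, y) values into four corner points.
    quad = list(zip(coords[0::2], coords[1::2]))
    return TextInstance(quad=quad, text=parts[8])


def parse_annotation_file(path: str) -> List[TextInstance]:
    """Read every non-empty line of one image's (hypothetical) label file."""
    with open(path, encoding="utf-8") as f:
        return [parse_annotation_line(line) for line in f if line.strip()]
```

For example, `parse_annotation_line("10,20,110,20,110,60,10,60,Hà Nội")` yields a quadrilateral with corners (10, 20), (110, 20), (110, 60), (10, 60) and the text `Hà Nội`. Limiting the split to 8 commas keeps transcriptions that themselves contain commas intact.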

sedrickkeh commented 6 months ago

self-assign

github-actions[bot] commented 6 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

sedrickkeh commented 6 months ago

Working on it. Will try to finish this week.

github-actions[bot] commented 5 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

sabilmakbar commented 4 months ago

self-assign

patrickamadeus commented 3 months ago

self-assign