SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for Vi Pubmed #226

Closed: SamuelCahyawijaya closed this issue 8 months ago

SamuelCahyawijaya commented 10 months ago

Dataloader name: vi_pubmed/vi_pubmed.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?vi_pubmed

Dataset: vi_pubmed
Description: Vi Pubmed (or Vietnamese Pubmed) is a corpus of PubMed biomedical abstracts translated by the state-of-the-art English-Vietnamese Translation project. The data has been used as an unlabeled dataset for pretraining a Vietnamese biomedical-domain Transformer model.
Subsets: -
Languages: vie
Tasks: Machine Translation
License: Other (other)
Homepage: https://huggingface.co/datasets/VietAI/vi_pubmed
HF URL: https://huggingface.co/datasets/VietAI/vi_pubmed
Paper URL: https://aclanthology.org/2023.eacl-main.228/

ljvmiranda921 commented 10 months ago

self-assign

dovanquyet commented 9 months ago

self-assign

ljvmiranda921 commented 9 months ago

Feel free to get this @dovanquyet :)

dovanquyet commented 9 months ago

sure, thanks

Enliven26 commented 9 months ago

self-assign