SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Extend dataset loader for Universal Dependencies #4

Closed: SamuelCahyawijaya closed this issue 4 months ago

SamuelCahyawijaya commented 11 months ago

Dataloader name: ud/ud.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?ud

Dataset: ud
Description: Universal Dependencies (UD) is a project that is developing cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on an evolution of (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). The general philosophy is to provide a universal inventory of categories and guidelines to facilitate consistent annotation of similar constructions across languages, while allowing language-specific extensions when necessary.
Subsets: id_gsd, id_csui, id_pud, vi_vtb, tl_trg, tl_ugnayan
Languages: ind, vie, tgl
Tasks: POS Tagging
License: Apache license 2.0 (apache-2.0)
Homepage: http://hdl.handle.net/11234/1-5150
HF URL: https://huggingface.co/datasets/universal_dependencies/
Paper URL: https://aclanthology.org/P13-2017/

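For anyone picking this up, here is a rough sketch of how the POS tagging task could be read out of a UD treebank. It assumes the data arrives as CoNLL-U text (the standard UD file format, with ten tab-separated columns where column 2 is the word form and column 4 is the UPOS tag). The function name `parse_conllu`, the `tokens`/`labels` output keys, and the sample sentence are illustrative assumptions, not the actual ud/ud.py implementation or real treebank data.

```python
from typing import Dict, Iterator, List


def parse_conllu(text: str) -> Iterator[Dict[str, List[str]]]:
    """Yield one {"tokens", "labels"} dict per sentence (hypothetical helper)."""
    tokens: List[str] = []
    labels: List[str] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if tokens:
                yield {"tokens": tokens, "labels": labels}
                tokens, labels = [], []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            # Not a well-formed CoNLL-U token line; skip it.
            continue
        if "-" in cols[0] or "." in cols[0]:
            # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
            continue
        tokens.append(cols[1])  # FORM
        labels.append(cols[3])  # UPOS
    if tokens:
        # Flush the last sentence if the text has no trailing blank line.
        yield {"tokens": tokens, "labels": labels}


# Made-up sample for illustration only; not taken from any UD treebank.
SAMPLE = "\n".join(
    [
        "# text = Saya makan nasi.",
        "1\tSaya\tsaya\tPRON\t_\t_\t2\tnsubj\t_\t_",
        "2\tmakan\tmakan\tVERB\t_\t_\t0\troot\t_\t_",
        "3\tnasi\tnasi\tNOUN\t_\t_\t2\tobj\t_\t_",
        "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
        "",
    ]
)

if __name__ == "__main__":
    for sentence in parse_conllu(SAMPLE):
        print(sentence["tokens"], sentence["labels"])
```

Each yielded dict pairs tokens with their UPOS tags one-to-one, which should map onto a sequence-labeling style schema; the exact schema and config names would need to follow the datahub conventions.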
ijindal commented 11 months ago

self-assign

sabilmakbar commented 10 months ago

Hi @ijindal, may I know the current progress of this dataloader? Feel free to discuss here if you run into any difficulties. Thanks!

ijindal commented 10 months ago

Thanks for the reminder, @sabilmakbar. This task somehow dropped off my to-do list. I have started working on it.

sabilmakbar commented 10 months ago

Thanks for letting us know, @ijindal. By the way, do you have an initial ETA for completion, so that we can mark it as in progress?

github-actions[bot] commented 9 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.