SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for multispider #610

Closed by SamuelCahyawijaya 2 months ago

SamuelCahyawijaya commented 3 months ago

Dataloader name: multispider/multispider.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?multispider

Dataset: multispider
Description: MULTISPIDER is the largest multilingual text-to-SQL dataset, covering seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Building on MULTISPIDER, the authors further identify the lexical and structural challenges of text-to-SQL (caused by specific language properties and dialect sayings) and their intensity across different languages.
Subsets: multispider_vi
Languages: vie
Tasks: Text-to-SQL
License: Creative Commons Attribution 4.0 (cc-by-4.0)
Homepage: https://github.com/longxudou/multispider
HF URL: https://huggingface.co/datasets/dreamerdeo/multispider
Paper URL: -
muhammadravi251001 commented 2 months ago

self-assign