facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License
10.49k stars, 2.1k forks

Teachers for Chinese datasets #2241

Closed: MrRace closed this 4 years ago

MrRace commented 4 years ago

I hope you can add some Chinese datasets. It seems only English datasets are supported right now. Thanks a lot!

stephenroller commented 4 years ago

I would like that too. There's no reason ParlAI couldn't support Chinese, but our own research is currently English-centric. I would happily accept PRs for Chinese datasets, but I don't have any on the roadmap right now. Are there specific ones you'd want?

MrRace commented 4 years ago

Some Chinese machine reading comprehension datasets, like CMRC (Chinese Machine Reading Comprehension, 2018 or 2019) and DuReader.
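A PR adding one of these would mostly need to turn the dataset's question/answer records into ParlAI messages. The snippet below is a minimal sketch of that conversion step, written under stated assumptions: it assumes a SQuAD-style JSON layout (`data` → `paragraphs` → `context` / `qas` → `question` / `answers[].text`), which is how CMRC 2018 is commonly distributed, and it targets ParlAI's plain-text dialog format (tab-separated `text:`, `labels:` and `episode_done:` fields, one example per line, multiple labels joined by `|`). The function names are hypothetical, and the whitespace handling is a simplification rather than ParlAI's own escaping logic.

```python
import json
from typing import List


def _clean(s: str) -> str:
    # Collapse tabs, newlines and repeated spaces so a value cannot break the
    # tab-separated line layout (simplification, not ParlAI's own escaping).
    return " ".join(s.split())


def squad_style_to_parlai(data: dict) -> List[str]:
    """Hypothetical converter: SQuAD-style QA JSON -> ParlAI text-format lines."""
    lines = []
    for article in data.get("data", []):
        for para in article.get("paragraphs", []):
            context = _clean(para["context"])
            for qa in para.get("qas", []):
                # Deduplicate gold answers; '|' separates labels, so strip it.
                answers = []
                for ans in qa.get("answers", []):
                    text = _clean(ans["text"]).replace("|", " ")
                    if text and text not in answers:
                        answers.append(text)
                if not answers:
                    continue  # skip unanswerable questions in this sketch
                fields = [
                    # Assumption: a newline inside a field is written as the
                    # two characters backslash + n in ParlAI's text format.
                    "text:" + context + "\\n" + _clean(qa["question"]),
                    "labels:" + "|".join(answers),
                    "episode_done:True",  # each QA pair is its own episode
                ]
                lines.append("\t".join(fields))
    return lines


def convert_file(in_path: str, out_path: str) -> None:
    # Read and write UTF-8 explicitly so Chinese text round-trips safely.
    with open(in_path, encoding="utf-8") as f:
        data = json.load(f)
    with open(out_path, "w", encoding="utf-8") as f:
        for line in squad_style_to_parlai(data):
            f.write(line + "\n")
```

Keeping each question as a single-turn episode mirrors how reading comprehension is usually framed as dialogue: the passage and question form the "text" and the gold answers form the "labels".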

dcsan commented 4 years ago

@MrRace do you know of any dialogue corpora in Chinese?

For the spaCy Chinese model, they're working with OntoNotes: https://github.com/explosion/spaCy/issues/4695#issuecomment-569731015

stephenroller commented 4 years ago

We just added C3, a Chinese reading comprehension dataset. It's not dialogue, but it's a start! #2665

github-actions[bot] commented 4 years ago

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.