CornellNLP / ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
https://convokit.cornell.edu/documentation/
MIT License

WikiConv Chinese Dataset #146

Open thomaspzollo opened 2 years ago

thomaspzollo commented 2 years ago

The original WikiConv paper says that conversations in Chinese were also collected. Are these available through ConvoKit?

cristiandnm commented 2 years ago

The full Chinese section of the WikiConv corpus is not yet available in ConvoKit.

We have, however, released a small sample; see section 1.2 of this example notebook: https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit/blob/master/examples/politeness-strategies/Politeness_Strategies_in_MT-mediated_Communication.ipynb

If you need the full corpus and want to add it yourself, a contribution would of course be appreciated; see the data contribution guidelines here: https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit/blob/master/CONTRIBUTING.md
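
As a rough illustration of what preparing such a contribution might involve, the sketch below maps raw comment records to one-JSON-object-per-line utterances. The input schema (`comment_id`, `thread_id`, `user`, `parent_id`) and the sample records are hypothetical and are not taken from the WikiConv release. The output field names (`id`, `conversation_id`, `speaker`, `text`, `reply-to`, `timestamp`, `meta`) are assumed to match ConvoKit's utterance file format, so check them against the current documentation and the guidelines linked above before relying on them.

```python
import json


def to_convokit_utterance(record):
    """Map one raw comment record (hypothetical schema) to an utterance dict."""
    return {
        "id": record["comment_id"],
        "conversation_id": record["thread_id"],
        "speaker": record["user"],
        "text": record["text"],
        # Assumption: the first comment in a thread has no parent, so this is None.
        "reply-to": record.get("parent_id"),
        "timestamp": record.get("timestamp"),
        # Illustrative metadata only; a real contribution would define its own fields.
        "meta": {"lang": "zh"},
    }


def to_jsonl(records):
    """Serialize records as JSON Lines, one utterance per line."""
    # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
    return "\n".join(
        json.dumps(to_convokit_utterance(r), ensure_ascii=False) for r in records
    )


# Made-up records, for illustration only.
raw_records = [
    {"comment_id": "c1", "thread_id": "c1", "user": "UserA",
     "text": "你好", "parent_id": None, "timestamp": 1500000000},
    {"comment_id": "c2", "thread_id": "c1", "user": "UserB",
     "text": "谢谢", "parent_id": "c1", "timestamp": 1500000060},
]

jsonl = to_jsonl(raw_records)
print(jsonl)
```

Writing the text with `ensure_ascii=False` is a design choice that keeps the file easy to inspect by eye; the escaped form would parse to the same strings.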