CornellNLP / ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
https://convokit.cornell.edu/documentation/
MIT License

Deli branch #239

Closed yash-chatha closed 3 weeks ago

yash-chatha commented 1 month ago

Description

This PR adds support for the DeliData corpus within ConvoKit. The changes include:

- A new .rst file for documentation on the DeliData dataset.
- A Jupyter Notebook, ConvoKit_DeliData_Conversion.ipynb, to convert the DeliData corpus into ConvoKit format.
- Necessary modifications to integrate DeliData into ConvoKit, providing detailed metadata at speaker, utterance, and conversation levels.
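For reviewers unfamiliar with the conversion step, here is a minimal sketch of how flat message rows might be split into speaker-, utterance-, and conversation-level records. It is not the code in the notebook. The column names (`group_id`, `message_id`, `origin`, `annotation`, `team_performance`), the sample rows, and the helper `build_records` are all hypothetical, and the linear `reply_to` chain is an assumption about how replies could be modeled. The utterance dict keys mirror the arguments of ConvoKit's `Utterance` constructor (`id`, `speaker`, `conversation_id`, `reply_to`, `text`, `meta`).

```python
# Hypothetical flat rows, one per chat message. The column names and values
# are illustrative assumptions, not the actual DeliData schema.
rows = [
    {"group_id": "g1", "message_id": "m1", "origin": "Alice",
     "text": "I think we should flip the first card.",
     "annotation": "Reasoning", "team_performance": 0.75},
    {"group_id": "g1", "message_id": "m2", "origin": "Bob",
     "text": "Agreed.", "annotation": "Agree", "team_performance": 0.75},
    {"group_id": "g2", "message_id": "m3", "origin": "Alice",
     "text": "Hello everyone.", "annotation": "None", "team_performance": 0.5},
]


def build_records(rows):
    """Group flat message rows into speaker-, utterance-, and
    conversation-level records (hypothetical helper)."""
    speakers = {}        # speaker id -> speaker-level metadata
    utterances = []      # one dict per message, in input order
    conversations = {}   # conversation id -> conversation-level metadata
    last_in_conv = {}    # conversation id -> id of the previous message

    for row in rows:
        conv_id = row["group_id"]
        speaker_id = row["origin"]

        # Speaker-level metadata: here just a running message count.
        speakers.setdefault(speaker_id, {"n_messages": 0})
        speakers[speaker_id]["n_messages"] += 1

        # Conversation-level metadata: taken from the first row of the group.
        conversations.setdefault(
            conv_id, {"team_performance": row["team_performance"]}
        )

        # Utterance-level record. Assumption: each message replies to the
        # previous message in the same conversation (None for the first one).
        utterances.append({
            "id": row["message_id"],
            "speaker": speaker_id,
            "conversation_id": conv_id,
            "reply_to": last_in_conv.get(conv_id),
            "text": row["text"],
            "meta": {"annotation": row["annotation"]},
        })
        last_in_conv[conv_id] = row["message_id"]

    return speakers, utterances, conversations


speakers, utterances, conversations = build_records(rows)
print(len(speakers), "speakers,", len(utterances), "utterances,",
      len(conversations), "conversations")
```

Each utterance dict could then be turned into a ConvoKit `Utterance` (with a `Speaker` object in place of the speaker id) and the list passed to `Corpus(utterances=...)`. Conversation-level metadata would be attached to the corpus's conversations afterwards.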

Motivation and Context

The DeliData dataset is designed for analyzing deliberation in multi-party problem-solving contexts, capturing intricate details about participant interactions, message annotations, and team performance. This addition will broaden ConvoKit's dataset resources, especially for users interested in studying deliberative dialogue.

Reference to original publication: Karadzhov, G., Stafford, T., & Vlachos, A. (2023). DeliData: A dataset for deliberation in multi-party problem solving. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW2), 1-25.

How has this been tested?

The changes have been tested in a local development environment:

- Verified that the dataset conversion process runs without issues using the provided notebook.
- Confirmed that DeliData integration does not affect existing functionalities and works seamlessly within ConvoKit.

Other information

For full usage details, refer to the new deli.rst documentation. The ConvoKit-compatible DeliData corpus is available upon request and can be produced from the original data using the notebook linked in the documentation. The dataset comprises 500 conversations, 30 speakers, and 17,111 utterances, and supports analysis of team performance and deliberation types.

seanzhangkx8 commented 3 weeks ago

let me close this one. this is actually not what i meant by creating a different branch.