CornellNLP / ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
https://convokit.cornell.edu/documentation/
MIT License

Adding Open question transformer #204

Open davesavedaday opened 11 months ago

davesavedaday commented 11 months ago

Description

Feature: This PR introduces a new transformer that scores the openness of utterances containing questions; utterances without a question receive a score of 0.

The BERT similarity method computes the degree of openness of a given question by analyzing how "diverse" the replies to it can be. It uses BM25 to retrieve the 10 questions that are most similar to the question at hand, then computes the cosine similarity of the responses that these 10 questions received. It hinges on the idea that the more open a question is, the more ways there are to answer it, while the more closed a question is, the more fixed the reply tends to be. A high similarity score therefore indicates that the question is closed. The PR introduces the new transformer that accomplishes this task.
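To make the retrieval-then-compare idea concrete, here is a minimal, dependency-free sketch. It is not the code in this PR and it does not use the ConvoKit API: the function names (`tokenize`, `bm25_scores`, `cosine`, `response_similarity`) and the toy `qa_pairs` corpus are hypothetical, and bag-of-words count vectors stand in for the BERT embeddings so the example runs with the standard library alone.

```python
import math
from collections import Counter
from itertools import combinations

PUNCT = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (tok.strip(PUNCT).lower() for tok in text.split())
    return [t for t in tokens if t]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc against the tokenized query."""
    if not docs:
        return []
    n = len(docs)
    avgdl = (sum(len(d) for d in docs) / n) or 1.0
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def response_similarity(question, qa_pairs, k=10):
    """Mean pairwise cosine similarity of the replies to the k questions
    most similar to `question` under BM25. Higher means more closed."""
    docs = [tokenize(q) for q, _ in qa_pairs]
    scores = bm25_scores(tokenize(question), docs)
    top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
    # ASSUMPTION: bag-of-words vectors replace the BERT embeddings here.
    vecs = [Counter(tokenize(qa_pairs[i][1])) for i in top]
    pairs = list(combinations(vecs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(x, y) for x, y in pairs) / len(pairs)


# Hypothetical toy corpus of (question, reply) pairs.
qa_pairs = [
    ("did you finish the report", "yes"),
    ("did you send the report", "yes"),
    ("did you read the report", "yes"),
    ("what do you think about travel", "I love seeing mountains"),
    ("what do you think about music", "jazz calms me down"),
    ("what do you think about food", "spicy noodles are great"),
]

closed_score = response_similarity("did you file the report", qa_pairs, k=3)
open_score = response_similarity("what do you think about art", qa_pairs, k=3)
```

On this toy corpus the closed question retrieves neighbors whose replies are identical, so `closed_score` comes out higher than `open_score`, whose retrieved replies share no words. An openness score in the sense of this PR would presumably be some decreasing function of this similarity; the exact mapping is not specified in the description.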

Motivation and Context

This new transformer aims to capture and score the openness of questions.

How has this been tested?

It has been tested locally through the demo.

Other information

This was made as part of the requirements for A8 of INFO 4350.