CornellNLP / ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
https://convokit.cornell.edu/documentation/
MIT License

Input format in short tweet for politeness inference #62

Closed binghe2727 closed 2 years ago

binghe2727 commented 4 years ago

Hi all,

I found a related issue for my problem (given a short text or tweet, predict its politeness score or category): https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit/issues/44

However, after reading the answer there, I still could not figure out how to design a pipeline for inference/prediction. As I understand it, I first need to convert the tweets into a corpus. Is that right? Could anyone give a more detailed explanation of how to predict the politeness score or category of a tweet? (Your website already does this: http://politeness.cornell.edu/.) Many thanks.

binghe2727 commented 4 years ago

Detailed suggestions for each step of the pipeline would be highly appreciated.

liye commented 4 years ago

Hi @binghesam, your understanding is correct: at the moment you need to first construct a corpus from the tweets in order to extract politeness features. The example conversion closest to your case can be found here; it converts a set of texts (together with their annotated politeness scores) into the desired format.

Once you have formatted your data into a corpus (say, tweets-corpus), you can follow the demo mentioned in #44 directly from "2. Annotate the corpus with politeness strategies" onwards, replacing _wikicorpus with tweets-corpus.
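As a rough illustration of the conversion described above, here is a minimal sketch that treats each tweet as its own one-utterance conversation. The `tweet-<n>` ID scheme, the `"unknown"` placeholder speaker, and the helper names are assumptions made for this example, and the keyword arguments to `Utterance` assume a ConvoKit 2.4+ style API (older releases named some of these fields differently). The ConvoKit imports are kept inside the functions so the record-building step can be checked without the library installed.

```python
from typing import Dict, List


def tweets_to_records(tweets: List[str]) -> List[Dict[str, object]]:
    """Turn raw tweet strings into one-utterance-per-conversation records."""
    records = []
    for i, text in enumerate(tweets):
        utt_id = f"tweet-{i}"  # hypothetical ID scheme for this sketch
        records.append({
            "id": utt_id,
            "speaker": "unknown",       # placeholder speaker ID (assumption)
            "conversation_id": utt_id,  # each tweet is its own conversation
            "reply_to": None,           # a standalone tweet replies to nothing
            "text": text,
        })
    return records


def build_corpus(records: List[Dict[str, object]]):
    """Wrap the records in a ConvoKit Corpus (requires convokit)."""
    from convokit import Corpus, Speaker, Utterance

    utterances = [
        Utterance(
            id=r["id"],
            speaker=Speaker(id=r["speaker"]),
            conversation_id=r["conversation_id"],
            reply_to=r["reply_to"],
            text=r["text"],
        )
        for r in records
    ]
    return Corpus(utterances=utterances)


def annotate_politeness(corpus):
    """Parse the corpus, then attach politeness strategies to each utterance."""
    from convokit import PolitenessStrategies, TextParser

    corpus = TextParser().transform(corpus)  # adds dependency parses
    return PolitenessStrategies().transform(corpus)


records = tweets_to_records([
    "could you please take a look at this?",
    "thanks so much for the help!",
])
```

With ConvoKit and a spaCy English model installed, `annotate_politeness(build_corpus(records))` should give a corpus that can be used from step 2 of the demo onwards.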

PS: We do plan to support direct extraction of politeness strategies features from raw texts in a future release.

liye commented 3 years ago

Hi @binghesam, as you suggested, we have added the ability to extract politeness strategies directly from string inputs in the most recent release (v2.4.3).

The example below shows how this can be done:

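import spacy
from convokit import PolitenessStrategies

ps = PolitenessStrategies()
# The 'en' shortcut only resolves in spaCy 2.x; the full model name
# (installed via `python -m spacy download en_core_web_sm`) also works in 3.x.
spacy_nlp = spacy.load('en_core_web_sm', disable=['ner'])

# Pass the raw string directly; the result is an Utterance with metadata attached.
utt = ps.transform_utterance("hello, could you please help me proofread this article?", spacy_nlp=spacy_nlp)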

You should then be able to see politeness strategy information in utt.meta['politeness_strategies'].
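To make that metadata easier to read, one option is to list only the strategies that were detected. The sketch below assumes the value is a dict mapping strategy names to 0/1 indicators; the helper name and the sample keys are illustrative stand-ins, not output copied from a real run.

```python
from typing import Dict, List


def active_strategies(strategies: Dict[str, int]) -> List[str]:
    """Return the names of strategies whose indicator is 1, sorted by name."""
    return sorted(name for name, flag in strategies.items() if flag == 1)


# Illustrative stand-in for utt.meta['politeness_strategies'] (assumed shape).
example = {
    "feature_politeness_==Please==": 1,
    "feature_politeness_==Gratitude==": 0,
    "feature_politeness_==Indirect_(greeting)==": 1,
}

print(active_strategies(example))
```

On a real utterance, the call would be `active_strategies(utt.meta['politeness_strategies'])`.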

I hope this helps. If you have the chance to play around with this version of politeness strategy extraction, we would love to hear what you think!

cristiandnm commented 2 years ago

Closing this issue as it has been resolved. @binghe2727, if you have feedback, please let us know.