CornellNLP / ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
https://convokit.cornell.edu/documentation/
MIT License
542 stars, 121 forks

Annotate utterances.jsonl #148

Closed by vr25 2 years ago

vr25 commented 2 years ago

Hi,

I am planning to annotate the text in the utterances.jsonl file with custom entities using the external UBI AI annotation tool. This will change the original structure of the ConvoKit dataset.

So, how do I annotate it and also retain the dataset structure?

Any help would be appreciated.

Thanks!

calebchiam commented 2 years ago

Hi, have you tried loading it as a corpus and adding the custom entities as utterance metadata? You can dump (aka save) the corpus once you're done.
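
For reference, here is a rough sketch of what that could look like. It assumes the annotation tool's output can be converted into character-offset spans keyed by utterance ID; the span format, the `custom_entities` metadata key, and the helper names `build_entity_records` and `annotate_corpus` are all hypothetical and not part of ConvoKit. The ConvoKit calls used are `Corpus(filename=...)`, `get_utterance`, `add_meta`, and `dump`.

```python
from typing import Dict, List


def build_entity_records(text: str, spans: List[dict]) -> List[dict]:
    """Turn character-offset spans into records that also carry the matched text.

    Each span is assumed (hypothetically) to look like
    {"start": 0, "end": 5, "label": "PERSON"}.
    """
    records = []
    for span in spans:
        start, end = span["start"], span["end"]
        # Reject spans that do not fall inside the utterance text.
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span {start}-{end} is outside the utterance text")
        records.append({
            "start": start,
            "end": end,
            "label": span["label"],
            "text": text[start:end],
        })
    return records


def annotate_corpus(corpus_dir: str,
                    annotations: Dict[str, List[dict]],
                    out_name: str,
                    out_base_path: str) -> None:
    """Attach entity annotations as utterance metadata and save the corpus."""
    # Imported here so the helper above can be used without ConvoKit installed.
    from convokit import Corpus

    corpus = Corpus(filename=corpus_dir)
    for utt_id, spans in annotations.items():
        utt = corpus.get_utterance(utt_id)
        # The entities live in the utterance's metadata; the text is untouched.
        utt.add_meta("custom_entities", build_entity_records(utt.text, spans))

    # Writes the corpus files to out_base_path/out_name.
    corpus.dump(out_name, base_path=out_base_path)
```

You would then call something like `annotate_corpus("path/to/corpus", annotations, "annotated-corpus", "path/to/output")`, where `annotations` maps utterance IDs to span lists. Because the entities are stored as metadata rather than edited into the text, the dumped corpus keeps the usual dataset layout.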

vr25 commented 2 years ago

Thanks @calebchiam for your prompt response. No, I haven't tried that. I will look into it and follow up if I have further questions.