CornellNLP / ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
https://convokit.cornell.edu/documentation/
MIT License

Convokit for Facebook Messenger chat logs conversation analysis #92

Closed annaksig closed 3 years ago

annaksig commented 3 years ago

Hello! I am a PhD researcher in conversational AI. I would like to carry out automatic conversation analysis (CA) of Facebook Messenger chat logs (conversations between a person and a chatbot). In your opinion, would ConvoKit be a suitable tool for that? Facebook Messenger chat logs can be exported in both JSON and HTML formats. Would JSON be more suitable? My main worry is whether the software is suitable for Q-A type conversations (such as those between a chatbot and a human). Also, how advanced do my programming skills need to be?

KR Anna

calebchiam commented 3 years ago

Yes, that seems like a reasonable use case, though note that ConvoKit can be quite memory-intensive as its class objects are loaded into memory. (How many conversations / utterances do you expect to be working with?)

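As for data format, you would want to export the data as JSON, load it into Python dictionaries, and then construct ConvoKit objects from them: see this notebook for an example (https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit/blob/master/examples/converting_movie_corpus.ipynb).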
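A minimal sketch of that JSON-to-ConvoKit step, for illustration only. It assumes each exported Messenger thread is a JSON object with a `messages` list whose items carry `sender_name`, `timestamp_ms` and `content`; check these field names against your own export. Treating each calendar day (UTC) as one conversation is also an assumption, as are the helper names `messenger_to_records` and `records_to_corpus`. The ConvoKit import sits inside the second function so that the first part runs without the package installed.

```python
import json
from datetime import datetime, timezone


def messenger_to_records(export, thread_id):
    """Flatten one exported Messenger thread into utterance records.

    Assumes `export` is a dict with a "messages" list whose items carry
    "sender_name", "timestamp_ms" and (for text messages) "content".
    Messages sent on the same UTC day are grouped into one conversation.
    """
    # Keep text messages only and put them in chronological order.
    messages = [m for m in export.get("messages", []) if m.get("content")]
    messages.sort(key=lambda m: m["timestamp_ms"])

    records = []
    root_by_day = {}  # day -> id of the first utterance of that day
    last_by_day = {}  # day -> id of the most recent utterance of that day
    for i, msg in enumerate(messages):
        seconds = msg["timestamp_ms"] // 1000
        day = datetime.fromtimestamp(seconds, tz=timezone.utc).date().isoformat()
        utt_id = f"{thread_id}_{i}"
        root_by_day.setdefault(day, utt_id)
        records.append({
            "id": utt_id,
            "speaker": msg["sender_name"],
            "conversation_id": root_by_day[day],
            "reply_to": last_by_day.get(day),  # None for the first message of a day
            "timestamp": seconds,
            "text": msg["content"],
        })
        last_by_day[day] = utt_id
    return records


def records_to_corpus(records):
    """Build a ConvoKit Corpus from the records (needs `pip install convokit`)."""
    from convokit import Corpus, Speaker, Utterance

    speakers = {}
    utterances = []
    for r in records:
        speaker = speakers.setdefault(r["speaker"], Speaker(id=r["speaker"]))
        utterances.append(Utterance(
            id=r["id"], speaker=speaker, conversation_id=r["conversation_id"],
            reply_to=r["reply_to"], timestamp=r["timestamp"], text=r["text"],
        ))
    return Corpus(utterances=utterances)


# Made-up thread, newest message first; in practice use json.load() on the exported file.
sample = json.loads("""{"messages": [
  {"sender_name": "participant", "timestamp_ms": 1615510800000, "content": "Back again"},
  {"sender_name": "participant", "timestamp_ms": 1615424500000, "photos": []},
  {"sender_name": "chatbot", "timestamp_ms": 1615424460000, "content": "Hello"},
  {"sender_name": "participant", "timestamp_ms": 1615424400000, "content": "Hi"}
]}""")
records = messenger_to_records(sample, "t1")
```

With ConvoKit installed, `records_to_corpus(records)` would turn these records into a corpus. Each message is linked to the previous one on the same day through `reply_to`, which gives a simple linear conversation; a different segmentation rule (for example, a gap of several hours) may suit your data better.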

Q-A pair conversations are fine -- our Parliamentary Questions dataset (https://convokit.cornell.edu/documentation/parliament.html) may be a useful reference.

ConvoKit is designed with ease of use in mind, so basic/intermediate Python skills are sufficient to make use of the package. You can refer to our docs (https://convokit.cornell.edu/documentation/index.html) or post an issue here if you have any questions and we'll be happy to help.

annaksig commented 3 years ago

Also, I am interested in the content of the conversations, lexical choices, length of utterances, etc.

KR

Anna

Anna Xygkou, BA in Linguistics; MSc in Language and Communication Impairments, University of Sheffield; MSc in Research Methods, University of Kent; PhD Researcher in Conversational AI, Virtual Reality and Autism, University of Kent

Skype: annats8003 Twitter: @anna_ksigou

annaksig commented 3 years ago

Thank you for your reply. The data will cover roughly 50 people x 30 days x 15 minutes per day; I cannot be more specific than that. Could you also please clarify the "memory-intensive" issue? Do you suggest using context-specific coding (e.g. that of the Parliamentary Questions dataset) and building on it, or creating new coding from scratch? Will you or anyone else be able to help with this project?

KR

Anna

calebchiam commented 3 years ago

That seems manageable. The memory issue is that large corpora can be difficult to load, depending on how much RAM your machine has. This is typically only a problem when you are working with on the order of 100K to 1M conversations. Your use case (essentially 1,500 conversations, i.e. 50 people x 30 days) counts as a small dataset, so there should be no memory issues.

I'm not sure what you mean by context-specific coding. We won't be able to help you directly with your project, but we're happy to provide pointers if you encounter issues using ConvoKit. I would recommend just installing the package and going through the tutorial to see if it is appropriate for your use case.

annaksig commented 3 years ago

Thank you for all the info! I mean that there is specific coding, for example, for the parliament talks. Can I choose some of that coding to extract the desired features and add coding for different ones?

KR

Anna

calebchiam commented 3 years ago

Yes, you can customize the code to extract and store whatever data you'd like.
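As an illustration of what such custom extraction might look like for the features mentioned earlier in the thread (utterance length, lexical choices), here is a rough sketch. The tokenizer, the statistics chosen, and the names `tokenize`, `lexical_stats` and `annotate_corpus` are assumptions for this example, not something ConvoKit prescribes. The per-speaker part uses only the standard library; `annotate_corpus` shows one way the result could be stored on ConvoKit utterances via their metadata and needs a corpus built with ConvoKit.

```python
import re
from collections import defaultdict

# Crude word tokenizer: runs of word characters, with an optional apostrophe part.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?")


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def lexical_stats(utterances):
    """Per-speaker stats from an iterable of (speaker, text) pairs."""
    tokens_by_speaker = defaultdict(list)
    n_utts = defaultdict(int)
    for speaker, text in utterances:
        tokens_by_speaker[speaker].extend(tokenize(text))
        n_utts[speaker] += 1

    stats = {}
    for speaker, tokens in tokens_by_speaker.items():
        stats[speaker] = {
            "utterances": n_utts[speaker],
            # Mean utterance length, measured in tokens.
            "mean_length": len(tokens) / n_utts[speaker],
            # Distinct words divided by total words: a simple lexical-variety measure.
            "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
        }
    return stats


def annotate_corpus(corpus):
    """Store a token count in each utterance's metadata (expects a ConvoKit Corpus)."""
    for utt in corpus.iter_utterances():
        utt.add_meta("n_tokens", len(tokenize(utt.text)))


# Made-up example data.
pairs = [
    ("participant", "Hello there, hello!"),
    ("chatbot", "Hi friend"),
    ("participant", "How are you"),
]
stats = lexical_stats(pairs)
```

Type-token ratio is sensitive to how much text each speaker produced, so comparisons between speakers with very different amounts of text should be made with care.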

annaksig commented 3 years ago

Thank you so much for the prompt replies and info. Your support is much appreciated. I will give it a go!

KR

Anna

annaksig commented 3 years ago

Good morning Caleb, I hope you are well. I am sorry to bother you again, but I am in a difficult situation. We talked back in March on GitHub about using ConvoKit for my study. I found a computer science MSc student who was going to help me with the ConvoKit analysis, but he is no longer able to. I have all my data ready (a small sample: conversations of 33 people, each between two parties, on Facebook Messenger in JSON format), and I am looking either for a paid collaborator on my study or for someone to be second author. I would appreciate your guidance. It seems to be very difficult to find someone who has used ConvoKit, and I have already advertised.

Looking forward to your support.

KR

Anna
