CornellNLP / ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
https://convokit.cornell.edu/documentation/
MIT License

Unable to add dependency parses #141

Closed: BonJovi1 closed this issue 1 year ago

BonJovi1 commented 2 years ago

Hi Caleb @calebchiam,

I'm trying to perform politeness prediction using the example notebook given here. I run into an error while adding dependency parses. Currently, I'm running:

from convokit import Corpus, TextParser, download
wiki_corpus = Corpus(download("wikipedia-politeness-corpus"))
parser = TextParser(verbosity=1000)

And then when I do

wiki_corpus = parser.transform(wiki_corpus)

It gives me the following error:

StopIteration                             Traceback (most recent call last)
<ipython-input-12-cffb5c2034e3> in <module>
----> 1 wiki_corpus = parser.transform(wiki_corpus)

/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textProcessor.py in transform(self, corpus)
     65                 result = self.proc_fn(text_entry)
     66             else:
---> 67                 result = self.proc_fn(text_entry, self.aux_input)
     68             if self.multi_outputs:
     69                 for res, out in zip(result, self.output_field):

/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textParser.py in _process_text_wrapper(self, text, aux_input)
     74 
     75         def _process_text_wrapper(self, text, aux_input={}):
---> 76         return process_text(text, aux_input.get('mode','parse'), 
     77                         aux_input.get('sent_tokenizer',None), aux_input.get('spacy_nlp',None))
     78 

/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textParser.py in process_text(text, mode, sent_tokenizer, spacy_nlp)
    124         offset = 0
    125         for sent in sents:
--> 126                 curr_sent = _process_sentence(sent, mode, offset)
    127                 sentences.append(curr_sent)
    128                 offset += len(curr_sent['toks'])

/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textParser.py in _process_sentence(sent_obj, mode, offset)
     93         tokens = []
     94         for token_obj in sent_obj:
---> 95                 tokens.append(_process_token(token_obj, mode, offset))
     96         if mode == 'parse':
     97                 return {'rt': sent_obj.root.i - offset, 'toks': tokens}

/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textParser.py in _process_token(token_obj, mode, offset)
     86                 token_info['dep'] = token_obj.dep_
     87                 if token_info['dep'] != 'ROOT':
---> 88                         token_info['up'] = next(token_obj.ancestors).i - offset
     89                 token_info['dn'] = [x.i - offset for x in token_obj.children]
     90         return token_info

StopIteration: 

However, when I run the transform using PolitenessStrategies instead, it works:

from convokit import PolitenessStrategies
ps = PolitenessStrategies()
wiki_corpus = ps.transform(wiki_corpus, markers=True)

This works perfectly. Only the TextParser is giving errors. Any idea what the issue might be? Would be grateful if you could kindly have a look!

Thanks a lot, Abhinav

calebchiam commented 2 years ago

Hmm, based on the stack trace, this looks like an error caused by the spacy dependency. @tisjune, if you have the time, can you take a look at this and advise on how we should update the code?

Meanwhile, @BonJovi1, you can resolve this issue locally by uninstalling spaCy and installing spacy==2.3.1. Make sure to re-download en_core_web_sm afterwards. I've tested this and it resolves the issue.

calebchiam commented 2 years ago

@BonJovi1 Looks like the problem first arises with the spaCy 3.2.0 release, so any release <=3.1.4 will work. Thanks for raising the issue. We'll release a fix for this soon (or feel free to make a PR yourself).
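
For readers who want to check their own environment, here is a minimal sketch that tests whether a spaCy version string falls in the range reported above (3.2.0 or later). The helper names `parse_version` and `is_affected_spacy` are hypothetical, and the 3.2.0 boundary is taken only from the comment above, not from spaCy's release notes.

```python
import re


def parse_version(version: str) -> tuple:
    """Return the leading numeric parts of a version string, e.g. '3.1.4' -> (3, 1, 4)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. '3.2') is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def is_affected_spacy(version: str) -> bool:
    """True if the version is 3.2.0 or later (assumed boundary, from this thread)."""
    return parse_version(version) >= (3, 2, 0)


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("spacy")
    except PackageNotFoundError:
        print("spaCy is not installed")
    else:
        status = "in the affected range" if is_affected_spacy(installed) else "not reported as affected"
        print(f"spaCy {installed}: {status}")
```

If the check reports an affected version, the workaround described above (installing an earlier release and re-downloading en_core_web_sm) applies.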

BonJovi1 commented 2 years ago

Hi Caleb @calebchiam, thanks so much! Installing spacy == 2.3.1 did the trick and I'm now able to add dependency parses! :)

Thanks a bunch, Abhinav

calebchiam commented 2 years ago

Great to hear! We'll keep this issue open until we resolve it properly on our end.

khonzoda commented 2 years ago

Hi! We traced this issue back to some inconsistent behavior of spaCy's dependency relation parser and raised an issue with them to confirm. We will keep this issue open until the bug is fixed.
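
The StopIteration in the traceback comes from calling next() on an empty `ancestors` iterator: the token's dependency label is not 'ROOT', yet it has no ancestor. Below is a minimal sketch of a defensive variant that passes a default to next() and skips the 'up' field when no parent exists. It is not necessarily the fix ConvoKit adopted. `FakeToken` and `process_token_safe` are hypothetical names, and `FakeToken` only imitates the few token attributes used in the traceback (`i`, `dep_`, `ancestors`, `children`).

```python
class FakeToken:
    """Hypothetical stand-in for a parsed token; not a spaCy class."""

    def __init__(self, i, dep_, head=None):
        self.i = i            # token index within the document
        self.dep_ = dep_      # dependency label
        self.head = head      # parent token, or None if there is none
        self.children = []    # direct dependents

    @property
    def ancestors(self):
        # Yield the parent, grandparent, and so on, nearest first.
        node = self.head
        while node is not None:
            yield node
            node = node.head


def process_token_safe(token_obj, offset=0):
    """Build the dep/up/dn fields without raising when a token has no ancestor."""
    token_info = {"dep": token_obj.dep_}
    if token_info["dep"] != "ROOT":
        # next() with a default returns None instead of raising StopIteration.
        parent = next(token_obj.ancestors, None)
        if parent is not None:
            token_info["up"] = parent.i - offset
    token_info["dn"] = [x.i - offset for x in token_obj.children]
    return token_info


root = FakeToken(0, "ROOT")
child = FakeToken(1, "dobj", head=root)
root.children.append(child)
orphan = FakeToken(2, "dep")  # non-ROOT label but no ancestors: the failing case

print(process_token_safe(child))
print(process_token_safe(orphan))
```

Omitting 'up' for such tokens is one possible design choice; downstream code that assumes every non-root token has an 'up' entry would need to handle its absence.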