johannesburg closed 2 years ago
Hey there! To help us debug this issue, it could be helpful if you can provide us with two additional pieces of information:
@johannesburg Just to give you an update, along with a request for more information (on top of what was mentioned in the previous comment): I have thus far been unable to reproduce the error using subreddit-NewOrleans on Python 3.10.6. The one remaining possibility I would like to explore is that the error is triggered by something unusual about the secondary corpus that you are merging with subreddit-NewOrleans. Could we get some details about that other corpus? I notice in the traceback that the name of the other corpus is raw_corpus, so could you explain (ideally with sample code, if it is short enough) how you created it?
Hey Jonathan! I've just added you to our git repo with the code. The merge is in the file scripts/collate_data.py; we're merging it with subreddit-houston and subreddit-texas. I'll reach out to my collaborators about what versions they're using on their machines and report back soon.
From my project partners:
Partner A:
Ubuntu 22.04.1, Python 3.10.6, convokit 2.5.3
My other computer has the same OS and Python but a dev version of convokit; I think that was working also: convokit @ git+https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit.git@fcb405c0c3c38173a960189bccf7cf4ea93a765e
Partner L:
macOS Monterey 12.4, Python 3.10.7, convokit 2.5.3
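For anyone else gathering the same details (OS, Python version, convokit version), here is a minimal sketch that prints them in one go. It only uses the Python standard library; the helper names `package_version` and `environment_report` are made up for illustration and are not part of convokit.

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or a placeholder if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("convokit",)) -> str:
    """Build a short, paste-able summary of the OS, Python, and package versions."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {platform.python_version()}",
    ]
    for name in packages:
        lines.append(f"{name}: {package_version(name)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Note that for an install made directly from git, the reported version string may not distinguish it from a release; `pip3 freeze` shows the `convokit @ git+...` form quoted above.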
@johannesburg Alright, I finally managed to get the issue fully tracked down. I've pushed the fixes and verified that collate_data.py (the previously crashing script) now runs to completion. You should uninstall convokit and reinstall from git to get the fixes.
(You should ensure that you pull the latest commits from GitHub when installing; the command to use is pip3 install git+https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit.git)
Brilliant! Thank you so much!
Screenshot
Here's the relevant stack trace:
Steps to reproduce
I ran this with the latest version of convokit, and was instantiating the corpus without any special settings (e.g. mem mode, by default). It was using the subreddit-NewOrleans corpus. The code runs on my project partners' setup, just not on mine.
Additional information