CornellNLP / ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
https://convokit.cornell.edu/documentation/
MIT License

"This StorageManager does not have an entry for the meta with id..." #186

Closed: johannesburg closed this issue 2 years ago

johannesburg commented 2 years ago

Here's the relevant stack trace:

Traceback (most recent call last):
  File "/Users/johan/cornell/nlp-class/info-6742-final-project/scripts/collate_data.py", line 151, in <module>
    create_corpus(events)
  File "/Users/johan/cornell/nlp-class/info-6742-final-project/scripts/collate_data.py", line 108, in create_corpus
    corpus = Corpus.merge(corpus, raw_corpus)
  File "/opt/anaconda3/envs/i6742/lib/python3.10/site-packages/convokit/model/corpus.py", line 983, in merge
    new_corpus.meta.reinitialize_from(primary.meta)
  File "/opt/anaconda3/envs/i6742/lib/python3.10/site-packages/convokit/model/convoKitMeta.py", line 132, in reinitialize_from
    other = {k: v for k, v in other.to_dict().items()}
  File "/opt/anaconda3/envs/i6742/lib/python3.10/site-packages/convokit/model/convoKitMeta.py", line 122, in to_dict
    self._get_storage().get_data(
  File "/opt/anaconda3/envs/i6742/lib/python3.10/site-packages/convokit/model/storageManager.py", line 172, in get_data
    raise KeyError(
KeyError: 'This StorageManager does not have an entry for the meta with id corpus_subreddit-NewOrleans.'
make: *** [data/filtered_corpus/utterances.jsonl] Error 1

Steps to reproduce

I ran this with the latest version of convokit and instantiated the corpus without any special settings (i.e. the default mem mode). It was using the subreddit-NewOrleans corpus. The code runs on my project partners' setups, just not on mine.

Additional information

 * your ConvoKit version - 2.5.3
 * your operating system details - macOS Monterey 12.6
 * Python version - 3.10.6
 * type of python installation (system-provided, downloaded from Python.org, or Anaconda) - via anaconda 
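
The details listed above can also be gathered programmatically. The following is a minimal sketch using only the Python standard library; the helper name `collect_environment` is hypothetical and not part of ConvoKit, and the interpreter path is only a hint about the installation type (for example, a path under an Anaconda environment).

```python
import platform
import sys
from importlib import metadata


def collect_environment(package="convokit"):
    """Return the version details typically requested in a bug report.

    `collect_environment` is a hypothetical helper, not a ConvoKit function.
    """
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the active environment.
        package_version = "not installed"
    return {
        "package": package,
        "package_version": package_version,
        "os": platform.platform(),
        "python_version": platform.python_version(),
        # The interpreter path hints at the installation type
        # (system Python, python.org download, or Anaconda).
        "python_executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Running this inside the environment that fails, and again on a machine where the script works, makes it easier to compare the two setups line by line.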
jpwchang commented 2 years ago

Hey there! To help us debug this issue, it would be helpful if you could provide us with two additional pieces of information:

jpwchang commented 2 years ago

@johannesburg An update, plus a request for more information on top of what I asked for in the previous comment: so far I have been unable to reproduce the error using subreddit-NewOrleans on Python 3.10.6. The one remaining possibility I would like to explore is that the error is triggered by something unusual about the secondary corpus that you are merging with subreddit-NewOrleans. Could we get some details about that other corpus? I notice in the traceback that it is named raw_corpus, so could you explain (ideally with sample code, if it is short enough) how you created it?

johannesburg commented 2 years ago

Hey Jonathan! I've just added you to our git repo with the code. The merge is in the file scripts/collate_data.py; we're merging it with subreddit-houston and subreddit-texas. I'll reach out to my collaborators about which versions they're using on their machines and report back soon.

johannesburg commented 2 years ago

From my project partners:

Partner A:

Ubuntu 22.04.1, Python 3.10.6, ConvoKit 2.5.3

My other computer has the same OS / python but a dev version of convokit; I think that was working also: convokit @ git+https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit.git@fcb405c0c3c38173a960189bccf7cf4ea93a765e

Partner L: macOS Monterey 12.4, Python 3.10.7, ConvoKit 2.5.3

jpwchang commented 2 years ago

@johannesburg Alright, I finally managed to get the issue fully tracked down. I've pushed the fixes and verified that collate_data.py (the previously crashing script) now runs to completion. You should uninstall convokit and reinstall from git to get the fixes.

(You should ensure that you pull the latest commits from GitHub when installing; the command to use is pip3 install git+https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit.git)
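
To confirm that the reinstall actually came from Git rather than from a cached release, one option is to inspect the install metadata. This is a minimal sketch, assuming pip recorded the install origin in a `direct_url.json` file (the PEP 610 mechanism, which applies to installs from a URL or VCS reference); the helper name `install_origin` is hypothetical and not part of ConvoKit.

```python
import json
from importlib import metadata


def install_origin(package="convokit"):
    """Report where an installed package came from.

    Hypothetical helper. Returns None if the package is not installed.
    The "url" and "commit" fields are None for ordinary index installs,
    which carry no direct_url.json record.
    """
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    raw = dist.read_text("direct_url.json")
    if raw is None:
        # Installed from a package index (for example PyPI), not from a URL.
        return {"version": dist.version, "url": None, "commit": None}
    info = json.loads(raw)
    return {
        "version": dist.version,
        "url": info.get("url"),
        # Present only for version-control installs such as git+https://...
        "commit": info.get("vcs_info", {}).get("commit_id"),
    }


if __name__ == "__main__":
    print(install_origin("convokit"))
```

If the reported commit is missing, or older than the fix, uninstalling and reinstalling as described above should pick up the latest commits.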

johannesburg commented 2 years ago

Brilliant! Thank you so much!