CambridgeMolecularEngineering / chemdataextractor2

ChemDataExtractor Version 2.0

IndexError: pop index out of range #10

Open HaoXie2022 opened 2 years ago

HaoXie2022 commented 2 years ago

I opened `examples\using_snowball.ipynb` in VS Code and got an `IndexError: pop index out of range` when running the following cell. I was wondering if you could offer some advice.

snowball.train(corpus='../tests/data/relex/curie_training_set/')

IndexError                                Traceback (most recent call last)
----> 1 snowball.train(corpus='../tests/data/relex/curie_training_set/')

~\Anaconda3\envs\chem2\lib\site-packages\chemdataextractor\relex\snowball.py in train(self, corpus, skip)
    182                 print('{}/{}:'.format(i + skip + 1, len(corpus_list)), ' ', file_name)
    183                 f = os.path.join(corpus, file_name)
--> 184                 self.train_from_file(f)
    185         else:
    186             assert(isinstance(corpus, list))

~\Anaconda3\envs\chem2\lib\site-packages\chemdataextractor\relex\snowball.py in train_from_file(self, filename)
    204         # files used for training are saved automatically
    205         f_log = open('snowball_training_set.txt', 'a')
--> 206         if self.train_from_document(d):
    207             print(basename(filename), file=f_log)
    208         f_log.close()

~\Anaconda3\envs\chem2\lib\site-packages\chemdataextractor\relex\snowball.py in train_from_document(self, d)
    225             if sent_definitions:
    226                 self.model.update(sent_definitions)
--> 227             if self.train_from_sentence(s):
    228                 # return 'True if there was a Snowball candidate in the document
    229                 candidate_found = True

~\Anaconda3\envs\chem2\lib\site-packages\chemdataextractor\relex\snowball.py in train_from_sentence(self, s)
    238         """
    239         candidate_dict = {}
--> 240         candidate_relationships = self.candidates(s.tokens)
    241         candidate_found = False
    242         if len(candidate_relationships)

~\Anaconda3\envs\chem2\lib\site-packages\chemdataextractor\relex\snowball.py in candidates(self, tokens)
    370                             to_pop.append([i, j][np.argmin([entities[i].end, entities[j].end])])
    371                 for p in to_pop:
--> 372                     entities_dict[k].pop(p)
    373
    374         # Filter out incomplete models

IndexError: pop index out of range

The following was printed to the screen:

5/6: c3nr33950e.html
6/6: c6cp00375c.html

Spadet commented 2 years ago

Hi, digging up this topic since I encountered the same error message when trying to train a Snowball parser. Candidates are proposed, but when I type an answer, no matter what I write, I get this:

Traceback (most recent call last):
  File "~/New_parser.py", line 64, in <module>
    snowball.train(corpus='~/Documents/SB_Training/')
  File "~/chemdataextractor/relex/snowball.py", line 184, in train
    self.train_from_file(f)
  File "~/chemdataextractor/relex/snowball.py", line 206, in train_from_file
    if self.train_from_document(d):
  File "~/chemdataextractor/relex/snowball.py", line 227, in train_from_document
    if self.train_from_sentence(s):
  File "~/chemdataextractor/relex/snowball.py", line 240, in train_from_sentence
    candidate_relationships = self.candidates(s.tokens)
  File "~/chemdataextractor/relex/snowball.py", line 372, in candidates
    entities_dict[k].pop(p)
IndexError: pop index out of range

If you have any solution, I would be glad to hear it! Thanks :)

ti250 commented 2 years ago

Hmmm, I don't have much experience with Snowball myself. I've sent a message to someone in the group who does, and I'll let you know once I get more information!

rayanc72 commented 2 years ago

I found that adding a print statement for the `to_pop` list in snowball.py solves this problem (not sure why, though!). Here's the updated section:

if len(entities_dict[k]) > 1:
    to_pop = []
    for i in range(len(entities)):
        for j in range(i+1, len(entities)):
            if entities[i].start == entities[j].start:
                to_pop.append([i, j][np.argmin([entities[i].end, entities[j].end])])
                print(to_pop)  # added line: show the indices queued for removal

Edit: This is not a permanent solution. It works with the example set of papers but fails on others. I managed to get past it by using a try/except block.
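For anyone hitting this later, a likely cause judging only from the loop quoted above: when three or more entities share a start offset, `to_pop` can contain the same index more than once, and because each `pop(p)` shifts every later element left by one, indices computed against the original list go stale as the loop runs. The sketch below reproduces that loop standalone, using a hypothetical `Entity` namedtuple in place of ChemDataExtractor's own entity objects, and compares popping in queue order with a possible fix that removes each unique index from the highest down. It assumes `entities` and `entities_dict[k]` refer to the same list, which the excerpt does not show, so treat it as an untested suggestion rather than the maintainers' fix.

```python
from collections import namedtuple

# Hypothetical stand-in for an entity: only the .start/.end attributes that
# the candidates() loop in snowball.py reads are modelled here.
Entity = namedtuple("Entity", ["text", "start", "end"])


def indices_to_pop(entities):
    """Mirror the quoted loop: for each pair sharing a start offset, queue the
    shorter one (np.argmin returns the first index on a tie, hence <=)."""
    to_pop = []
    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            if entities[i].start == entities[j].start:
                to_pop.append(i if entities[i].end <= entities[j].end else j)
    return to_pop


def pop_in_order(entities, to_pop):
    """The pattern from snowball.py: pop each queued index in queue order."""
    remaining = list(entities)
    for p in to_pop:
        remaining.pop(p)  # every later element shifts left by one
    return remaining


def pop_safely(entities, to_pop):
    """Possible fix (an assumption, not the library's code): deduplicate the
    indices and pop from the highest down, so no pending index moves."""
    remaining = list(entities)
    for p in sorted(set(to_pop), reverse=True):
        remaining.pop(p)
    return remaining


# Hypothetical nested names that all start at token 0.
nested = [Entity("Curie", 0, 1), Entity("Curie temperature", 0, 2),
          Entity("Curie temperature TC", 0, 3)]
queued = indices_to_pop(nested)
print(queued)  # [0, 0, 1]: index 0 is queued twice

try:
    pop_in_order(nested, queued)
except IndexError as err:
    print("in-order pop:", err)  # in-order pop: pop index out of range

print(pop_safely(nested, queued))  # only the longest span, end=3, is left
```

The in-order pattern can also go wrong without raising: with two separate pairs such as A(0, 1)/B(0, 2) and C(5, 6)/D(5, 7), `to_pop` is `[0, 2]`, and popping in order removes A and then D instead of C. So catching the `IndexError` hides the crash but does not guarantee that the right entities are kept.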