fhamborg / Giveme5W1H

Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
Apache License 2.0

List index out of range #58

Open MaxKe99 opened 3 years ago

MaxKe99 commented 3 years ago

Describe the bug

I tried to run the example parse_from_newsplease.py. When attempting to extract the top answer for all 6 questions, I receive a "list index out of range" error, similar to #36. Sadly, the fix proposed there does not work in my case.


The error doesn't occur when trying to only extract Who, What and When.

To Reproduce

I used the code from parse_from_newsplease.py and added a few lines to extract and print answers for all 6 questions. I installed Giveme5W1H through pip.

    questions = ['who', 'what', 'when', 'where', 'why', 'how']
    answers = []
    for q in questions:
        answers.append(doc.get_top_answer(q).get_parts_as_text())
    for i in range(len(answers)):
        print(answers[i])

Expected behavior

I expected to receive all six answers.

Versions (please complete the following information):

Lolologist commented 3 years ago

I am having the same problem with all the same versions, the only exception being that I'm on a Mac.

TitasDas commented 3 years ago

Contrary to what is suggested in #36, the error actually has nothing to do with lines 150-151 of document.py:

    def get_top_answer(self, question):
        return self.get_answers(question=question)[0]

Please leave those lines unchanged. The IndexError basically means that there isn't an answer for that question in the text that is being given to the extractor, so there is nothing at index 0 to return.

I would suggest using try, except, else blocks for each of the questions as shown below to see which question is not being answered.

    try:
        who_answer = doc.get_top_answer('who').get_parts_as_text()
    except IndexError:
        print("An answer for 'who' doesn't exist for this piece of text")
    else:
        print("Who :", who_answer)

Similarly, in the example given in parse_single_from_code.py, when you try using only the lead or title short, which have very little text content, you may get the same error. But for text, you will see that all the questions are answered and you don't encounter this error.