fhamborg / Giveme5W1H

Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
Apache License 2.0

Example code does not work return self.get_answers(question=question)[0] #36

Closed iitrsamrat closed 5 years ago

iitrsamrat commented 5 years ago

Describe the bug
The example below crashes: https://github.com/fhamborg/Giveme5W1H/blob/master/Giveme5W1H/examples/extracting/parse_single_from_code.py

To Reproduce
Run https://github.com/fhamborg/Giveme5W1H/blob/master/Giveme5W1H/examples/extracting/parse_single_from_code.py

Expected behavior

    File "/Users/samrat.saha/PycharmProjects/EventExtraction/event_extractor.py", line 84, in
      top_when_answer = doc.get_top_answer('when').get_parts_as_text()
    File "/Users/samrat.saha/miniconda3/envs/py36/lib/python3.6/site-packages/Giveme5W1H/extractor/document.py", line 151, in get_top_answer
      return self.get_answers(question=question)[0]
    IndexError: list index out of range

Process finished with exit code 1


fhamborg commented 5 years ago

Please post the full log output

iitrsamrat commented 5 years ago

This is the full console output. If you are expecting something else, can you run the example code? I think the crash is reproducible.

    File "/Users/samrat.saha/miniconda3/envs/py36/lib/python3.6/site-packages/Giveme5W1H/extractor/document.py", line 151, in get_top_answer
      return self.get_answers(question=question)[0]
    IndexError: list index out of range

I changed the following function to make it work.

    def get_top_answer(self, question):
        return self.get_answers(question=question)
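The change above makes `get_top_answer` return the whole list, so callers no longer receive a single answer. Another option is to keep the single-answer behaviour and return `None` when nothing was extracted. The following is only a minimal sketch of that idea: `DocumentSketch` is a hypothetical stand-in, not the library's `Document` class, and the answers are plain strings for illustration.

```python
from typing import Dict, List, Optional


class DocumentSketch:
    """Hypothetical stand-in that mimics only the two methods seen in the traceback."""

    def __init__(self, answers: Dict[str, List[str]]):
        # Maps a question ('who', 'when', ...) to its ranked candidate answers.
        self._answers = answers

    def get_answers(self, question: str) -> List[str]:
        # An unknown question yields an empty list rather than raising.
        return self._answers.get(question, [])

    def get_top_answer(self, question: str) -> Optional[str]:
        # Guard against an empty list instead of indexing [0] unconditionally.
        answers = self.get_answers(question=question)
        return answers[0] if answers else None


doc = DocumentSketch({"who": ["the council"], "when": []})
top_who = doc.get_top_answer("who")
top_when = doc.get_top_answer("when")
```

With this variant the caller has to check for `None` before calling a method such as `get_parts_as_text()` on the result.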

I am using the following code to extract the NER tags.

    # There is no point extracting a "who" whose NER tag is 'O'.
    # There is no point extracting a "where" whose NER tag is 'O'.
    # There is no point extracting a "whom" whose NER tag is 'O'.
    W5 = ['who', 'what', 'when', 'where', 'why', 'how']
    for w in W5:
        # With the modified get_top_answer, this is the (possibly empty) list of answers.
        top_answer = doc.get_top_answer(w)
        for t in top_answer:
            score = t.get_score()
            response = t.get_json()
            # print(response)
            for k, v in response.items():
                if k == 'parts':  # compare by value; `is` tests object identity
                    for vl in v:
                        d, pos = vl
                        ner = d['nlpToken']['ner']
                        # print(d['nlpToken']['ner'])
                        # print(d['nlpToken']['pos'])
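The filtering idea described above (skip answers whose tokens are all tagged 'O') could be written as a small helper. This is a sketch under the assumption that the JSON has the shape the loop above relies on: a 'parts' list of pairs whose first element holds `['nlpToken']['ner']`. The helper name `has_named_entity` and the sample dictionaries are made up for illustration.

```python
from typing import Any, Dict


def has_named_entity(answer_json: Dict[str, Any]) -> bool:
    """Return True if at least one part carries an NER tag other than 'O'."""
    for part in answer_json.get("parts", []):
        token_info, _pos = part  # same unpacking as `d, pos = vl` in the loop above
        if token_info["nlpToken"]["ner"] != "O":
            return True
    return False


# Made-up sample data in the assumed shape, not real extractor output.
named = {"parts": [({"nlpToken": {"ner": "PERSON"}}, "NNP"),
                   ({"nlpToken": {"ner": "O"}}, "VBD")]}
plain = {"parts": [({"nlpToken": {"ner": "O"}}, "DT")]}

keep_named = has_named_entity(named)
keep_plain = has_named_entity(plain)
```

A caller could then skip any 'who' or 'where' answer for which the helper returns `False`.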

I will share the results of extraction on news data. They are not encouraging as of now.

fhamborg commented 5 years ago

So this issue can be closed, since it's working?