facebookresearch / DrQA

Reading Wikipedia to Answer Open-Domain Questions

RuntimeError: No valid word in... #92

Closed by my-master 6 years ago

my-master commented 6 years ago

Hi guys!

Excellent implementation and article, thank you.

I'm trying to run scripts/pipeline/interactive.py from Quick Start: Demo, and the behavior looks a little strange. When I process something like this:

>>> process("Who am i?")

...I get this:

02/09/2018 07:12:33 PM: [ Processing 1 queries... ]
02/09/2018 07:12:33 PM: [ Retrieving top 5 docs... ]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "scripts/pipeline/interactive.py", line 81, in process
    question, candidates, top_n, n_docs, return_context=True
  File "/media/olga/Data/projects/DrQA/drqa/pipeline/drqa.py", line 184, in process
    top_n, n_docs, return_context
  File "/media/olga/Data/projects/DrQA/drqa/pipeline/drqa.py", line 197, in process_batch
    ranked = [self.ranker.closest_docs(queries[0], k=n_docs)]
  File "/media/olga/Data/projects/DrQA/drqa/retriever/tfidf_doc_ranker.py", line 59, in closest_docs
    spvec = self.text2spvec(query)
  File "/media/olga/Data/projects/DrQA/drqa/retriever/tfidf_doc_ranker.py", line 98, in text2spvec
    raise RuntimeError('No valid word in: %s' % query)
RuntimeError: No valid word in: Who am i?

But when I run something like this:

>>> process("skhsfgh dseijfekrhfe")

...I get a valid result:

02/14/2018 04:40:01 PM: [ Processing 1 queries... ]
02/14/2018 04:40:01 PM: [ Retrieving top 5 docs... ]
02/14/2018 04:40:06 PM: [ Reading 177 paragraphs... ]
02/14/2018 04:40:10 PM: [ Processed 1 queries in 8.4385 (s) ]
Top Predictions:
+------+--------+-------------+--------------+-----------+
| Rank | Answer |     Doc     | Answer Score | Doc Score |
+------+--------+-------------+--------------+-----------+
|  1   |  2011  | Brazil Ride |    371.72    |   295.06  |
+------+--------+-------------+--------------+-----------+

Contexts:
[ Doc = Brazil Ride ]
Stages 2011

Is this expected behavior?

Thanks!

ajfisch commented 6 years ago

For retrieval we filter out stopwords, so “Who am I?” contains no non-stopword tokens. There is nothing left to build a query vector from, which is why the error is raised.
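The stopword filtering described above can be sketched roughly as follows. This is not DrQA's code: the stopword set, `filter_query`, and `text2tokens` are hypothetical names made up for illustration, and the tokenization is deliberately simplistic. Only the error message is taken from the traceback in this issue.

```python
# Tiny illustrative stopword set (an assumption; the real list is larger).
STOPWORDS = {"who", "am", "i", "what", "is", "the", "a", "an", "of", "to", "in"}


def filter_query(query):
    """Split on whitespace, strip basic punctuation, lowercase, drop stopwords."""
    tokens = [t.strip("?!.,;:").lower() for t in query.split()]
    return [t for t in tokens if t and t not in STOPWORDS]


def text2tokens(query):
    """Hypothetical stand-in for the ranker step that fails on an empty query."""
    kept = filter_query(query)
    if not kept:
        # Same message format as in the traceback above.
        raise RuntimeError('No valid word in: %s' % query)
    return kept
```

With this sketch, `filter_query("Who am i?")` leaves no tokens, so `text2tokens` raises, while a nonsense query keeps both of its tokens because neither is a stopword.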

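The second query is not made of stopwords, so it survives the filter. Its tokens are mapped to some arbitrary buckets for retrieval (recall that we use feature hashing), which still returns documents. The query is then treated as “UNK UNK”, and you get a garbage answer.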
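To illustrate the feature-hashing point, here is a minimal sketch of why a made-up token still lands in a valid bucket. The names `bucket`, `hashed_counts`, and `NUM_BUCKETS`, the table size, and the use of MD5 are all assumptions for illustration; the retriever's actual hash function and vocabulary handling may differ.

```python
import hashlib

NUM_BUCKETS = 2 ** 24  # assumed table size, for illustration only


def bucket(token, num_buckets=NUM_BUCKETS):
    """Map any string to a bucket index with a stable hash (MD5 as a stand-in)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % num_buckets


def hashed_counts(tokens, num_buckets=NUM_BUCKETS):
    """Build a sparse bag-of-words: bucket index -> token count."""
    counts = {}
    for tok in tokens:
        idx = bucket(tok, num_buckets)
        counts[idx] = counts.get(idx, 0) + 1
    return counts


# Nonsense tokens are never rejected: each one hashes to some bucket.
query_vector = hashed_counts(["skhsfgh", "dseijfekrhfe"])
```

Because hashing has no notion of "unknown word", the nonsense query produces a non-empty vector. Whatever real terms happen to share those buckets then determine which documents are retrieved.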

my-master commented 6 years ago

Ok, thank you for the answer.