yhamoudi closed this issue 10 years ago
Just a question about something strange (it's on branch non_question_word_questions). For all questions starting with Is or What, the algorithm (in questionIdentify.py) fails to identify the question word.
Question words are defined in the closeQuestionWord and openQuestionWord maps; Is and What are the first items of these maps. If you move them to another position in the map, it works! In fact, only the first word of each map is not recognized. How is that possible? (The problem doesn't appear on branch triple_standardize, which uses only one map for all question words.)
Because the first item of the list is:
"""
Taken from: http://www.interopia.com/education/all-question-words-in-english/
Rarely used: Wherefore, Whatever, Wherewith, Whither, Whence, However
What"""
(what you wanted to be a docstring was actually concatenated to the first item)
Docstrings are not denoted by the """ characters (those just mean “this is a multiline string”) but by their position in the code.
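To illustrate the point above, here is a minimal sketch of how Python treats a triple-quoted string depending on where it appears. The names documented and question_words and the contents of the list are hypothetical, not the project's actual code; a plain list is assumed for simplicity.

```python
def documented():
    """A docstring: the first statement in the function body."""
    return None


# Hypothetical word list. The triple-quoted string is NOT a docstring here:
# it is an ordinary string literal, and because no comma follows it, Python
# concatenates it with the adjacent literal 'What'.
question_words = [
    """
    A note about the word list (intended as documentation).
    """
    'What',
    'Who',
]

print(documented.__doc__)                  # the docstring text
print(len(question_words))                 # 2, not 3
print('What' in question_words)            # False
print(question_words[0].endswith('What'))  # True
```

Replacing the string with a # comment, or moving it outside the list, would avoid the accidental concatenation.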
OK, so no docstring in a map, I imagine?
For case-insensitivity: couldn't we simply write all the question words in the list in lowercase, and then use the .lower() method in the function identifyQuestionWord?
Defining a new class just for this seems a bit "heavy"...
That's what the class does, actually.
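The lowercase approach discussed above could look roughly like this. It is only a sketch under stated assumptions: the word sets and the function name identify_question_word are hypothetical stand-ins, not the contents of questionIdentify.py, and question words are assumed to be 1 or 2 words long.

```python
# Hypothetical word lists, stored in lowercase only.
OPEN_QUESTION_WORDS = {'what', 'who', 'where', 'how many'}
CLOSE_QUESTION_WORDS = {'is', 'are', 'does'}


def identify_question_word(sentence):
    """Return the leading question word (1 or 2 words) in lowercase, or None."""
    words = sentence.lower().split()
    known = OPEN_QUESTION_WORDS | CLOSE_QUESTION_WORDS
    # Try the 2-word prefix first so that 'how many' is matched as a whole.
    for length in (2, 1):
        if len(words) >= length:
            candidate = ' '.join(words[:length])
            if candidate in known:
                return candidate
    return None


for question in ('Who are you', 'WHo are you', 'who are you'):
    print(question, '->', identify_question_word(question))
```

With the input normalised once at the start, the word lists never need mixed-case variants.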
The algorithm used to identify the question words (1 or 2 words) is in questionIdentify.py.
The current algorithm is very case-sensitive. If you write "WHo are you" or "who are you" instead of "Who are you", it fails to identify the question word.
Improvements:
(and transform questionWord so that it only contains "non-capital" words, i.e. replace Who by who, for example)
Must be fixed for midterm (it's not difficult and it's very useful).