For each user question, retrieve a list of named entities ranked by relevance. To do this, analyze the question to determine which type of entity needs to be retrieved.
QuestionVectorAnnotator
We use this annotator to extract tokens from the query text, since we cannot simply search for the whole query sentence in the database.
This annotator processes the query text in three steps:
remove all punctuation from the text
normalize words to a common form so that variants of the same word match
extract keywords from the sentence
The original query sentence may contain punctuation such as commas, periods, and colons, as well as different forms of the same word, such as "is" and "was". All of these reduce the accuracy of the returned results.
This time, we used:
Regular Expression
Stopwords Dictionary
Stanford Lemmatizer
Word Stemmer
As a next step, we may look for a better solution.
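The three steps above can be sketched roughly as follows. This is only an illustration in Python, not the annotator's actual code: the function name preprocess, the tiny STOPWORDS set, and the LEMMAS table are all hypothetical stand-ins for the Stopwords Dictionary and the Stanford Lemmatizer / Word Stemmer listed above.

```python
import re

# Hypothetical, deliberately tiny resources. A real system would load a full
# stopword dictionary and call a real lemmatizer or stemmer instead.
STOPWORDS = {"a", "an", "the", "in", "or", "of", "be"}
LEMMAS = {"is": "be", "was": "be", "are": "be", "men": "man", "women": "woman"}


def preprocess(query):
    """Turn a raw query sentence into a list of normalized keyword tokens."""
    # Step 1: replace every character that is neither a word character
    # nor whitespace (commas, periods, colons, question marks, ...) with a space.
    text = re.sub(r"[^\w\s]", " ", query)
    # Step 2: lowercase and map each word to a base form, so that
    # "is" and "was" both become "be".
    tokens = [LEMMAS.get(token, token) for token in text.lower().split()]
    # Step 3: keep only tokens that are not stopwords.
    return [token for token in tokens if token not in STOPWORDS]


print(preprocess("Is Rheumatoid Arthritis more common in men or women?"))
# prints ['rheumatoid', 'arthritis', 'more', 'common', 'man', 'woman']
```

Normalizing before the stopword check means that "is" and "was" are both dropped through their shared base form, which is one reason to run the steps in this order.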
Extracting keywords from a sentence is difficult. However, existing APIs can help.
The MeSH service provided by the API can return the most relevant keywords. For example, for the question "Is Rheumatoid Arthritis more common in men or women?", calling the getKeywords() function returns "Is Rheumatoid Arthritis more common in men women". It also provides a list of related keywords, which may help improve accuracy in the future.
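As a rough illustration of what could happen next, the keyword string from the example can be turned into a term vector. This sketch is written in Python and makes two assumptions: that the string returned by getKeywords() is space-separated, as in the example, and that the vector holds simple term counts. The helper name to_term_vector is hypothetical and is not part of the MeSH service.

```python
from collections import Counter


def to_term_vector(keywords):
    """Count each lowercased term in a space-separated keyword string."""
    return Counter(keywords.lower().split())


# The input below is the example output of getKeywords() quoted above.
vector = to_term_vector("Is Rheumatoid Arthritis more common in men women")
print(vector["arthritis"])  # prints 1
print(len(vector))          # prints 8
```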