Yes/No Questions: Are there any DNMT3 proteins present in plants?
Factoid questions: Is Rheumatoid Arthritis more common in men or women?
List questions: What is the most prominent sequence consensus for the polyadenylation site?
Summary questions: What is the mechanism of action of anticoagulant medication Dabigatran?
The difficulty of these questions increases from the first type to the last: the first two types can be solved by keyword search or by comparing sentence similarity, but the last two may require some "intelligent thinking" from our system. I think we should tackle the first two types in the first stage; once the system's performance is good enough, we will try our best to solve the last two.
Technologies and External Tools
Data resources: corpora, training samples, and testing samples
Text analysis tools: LingPipe, ABNER, ...
Main Components
One collection reader, multiple annotators, and one CAS consumer.
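To make the component layout concrete, here is a minimal sketch of the reader, annotators, and consumer flow in plain Python. It does not use the UIMA API: the Document class (a stand-in for a CAS), the two toy annotators, and run_pipeline are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


@dataclass
class Document:
    """Hypothetical stand-in for a CAS: the text plus accumulated annotations."""
    text: str
    annotations: List[str] = field(default_factory=list)


def collection_reader(questions: Iterable[str]) -> Iterator[Document]:
    """Wrap each raw question in a Document, one at a time."""
    for question in questions:
        yield Document(text=question)


def first_word_annotator(doc: Document) -> None:
    """Toy annotator: record the lowercased first word of the question."""
    tokens = doc.text.split()
    first = tokens[0].lower() if tokens else ""
    doc.annotations.append("first_word=" + first)


def length_annotator(doc: Document) -> None:
    """Toy annotator: record the number of whitespace-separated tokens."""
    doc.annotations.append("tokens=" + str(len(doc.text.split())))


def run_pipeline(
    questions: Iterable[str],
    annotators: List[Callable[[Document], None]],
    consumer: Callable[[Document], None],
) -> None:
    """One reader feeds every document through all annotators, then the consumer."""
    for doc in collection_reader(questions):
        for annotate in annotators:
            annotate(doc)
        consumer(doc)


if __name__ == "__main__":
    run_pipeline(
        ["Are there any DNMT3 proteins present in plants?"],
        [first_word_annotator, length_annotator],
        lambda doc: print(doc.text, doc.annotations),
    )
```

In the real system each annotator would be a separate analysis engine and the consumer would write out answers or evaluation results; the sketch only shows the order in which the pieces run.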
Aspects
Building on past homework, we could use LingPipe to extract gene mentions and a sentence similarity algorithm to find answers.
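As one possible reading of "sentence similarity", the sketch below ranks candidate sentences by bag-of-words cosine similarity to the question. The choice of cosine similarity, the simple tokenizer, and the function names are assumptions made for illustration, and the candidate sentences in the usage example are made up; the proposal does not fix a particular algorithm.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the term-count vectors of two sentences."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)


def rank_candidates(question, candidates):
    """Return candidates sorted from most to least similar to the question."""
    return sorted(
        candidates,
        key=lambda s: cosine_similarity(question, s),
        reverse=True,
    )


if __name__ == "__main__":
    question = "Is Rheumatoid Arthritis more common in men or women?"
    candidates = [  # invented candidate sentences, for illustration only
        "Dabigatran is an anticoagulant medication.",
        "Rheumatoid arthritis is more common in women than in men.",
    ]
    for sentence in rank_candidates(question, candidates):
        print(round(cosine_similarity(question, sentence), 3), sentence)
```

Raw term counts treat every word equally, so stop words can dominate the score; weighting terms (for example with TF-IDF) would be a natural next step if this baseline proves too noisy.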
Evaluation
Since gold-standard answers are not provided, we may have to annotate the gold answers manually ourselves. For evaluation, we may use measures such as MRR (mean reciprocal rank), p-values, and confidence intervals.
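For reference, MRR averages, over all questions, the reciprocal of the rank at which the first correct answer appears (0 if none appears). A minimal sketch follows, assuming each system output is a ranked list of answer strings and each manually marked gold standard is a set of acceptable strings; the function names and the exact-match comparison are illustrative assumptions.

```python
def reciprocal_rank(ranked_answers, gold_answers):
    """Return 1/rank of the first answer found in the gold set, or 0.0."""
    for rank, answer in enumerate(ranked_answers, start=1):
        if answer in gold_answers:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranked_answers, gold_answers) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, g) for r, g in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["a", "b"], {"a"}),       # correct at rank 1 -> 1.0
        (["x", "y", "z"], {"y"}),  # correct at rank 2 -> 0.5
        (["p"], {"q"}),            # never correct     -> 0.0
    ]
    print(mean_reciprocal_rank(runs))  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```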
Nice job!
I think we may need to do some research on the "intelligent" component.
After implementing solutions for the first and second question types, we may have a clearer idea about the error analysis part.