In most cases, the Hybrid performs well, yielding identical or similar results in (3) and (4).
In 10–20% of cases, key-term pre-filtering is too stringent and returns no results.
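The following is a minimal Python sketch of how a key-term pre-filter can leave nothing for the ranking stage. It assumes a pre-filter that keeps only documents containing every key term, using case-insensitive substring matching. The toy corpus, the `prefilter` function, and its `require_all` flag are hypothetical and are not taken from the Hybrid implementation.

```python
from typing import Dict, List

# Hypothetical toy corpus; document IDs and texts are illustrative only.
DOCS: Dict[str, str] = {
    "d1": "The SIDARTHE model splits the population into eight compartments.",
    "d2": "SIDARTHE-V extends SIDARTHE with a vaccinated compartment.",
    "d3": "An SEIR model formula with four compartments.",
}


def prefilter(docs: Dict[str, str], terms: List[str], require_all: bool = True) -> List[str]:
    """Return IDs of documents matching the key terms (case-insensitive substring match).

    require_all=True keeps only documents containing every term (strict).
    require_all=False keeps documents containing at least one term (lenient).
    """
    match = all if require_all else any
    return [
        doc_id
        for doc_id, text in docs.items()
        if match(term.lower() in text.lower() for term in terms)
    ]


if __name__ == "__main__":
    query_terms = ["SIDARTHE-V", "formula"]
    # Strict filter: no single document contains both terms, so nothing is left to rank.
    print(prefilter(DOCS, query_terms))
    # Lenient filter: documents matching either term survive for the downstream ranker.
    print(prefilter(DOCS, query_terms, require_all=False))
```

In this toy setup, combining a rare term (`SIDARTHE-V`) with a generic one (`formula`) empties the strict candidate set, while the any-term variant keeps two candidates. The lenient variant trades precision for recall, so it is only one possible relaxation, not a recommended fix.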
Queries containing key terms remain challenging, for example:
SV2AIR3 model formula
What is the SIDARTHE-V model?
Differences between the original SIDARTHE and SIDARTHE-V
Further work is needed on evaluation questions without key terms. Current performance is comparable to XDD V2, neither better nor worse. Better metrics are needed to quantify the difference.
Once issue #76 (Elasticsearch integration) is resolved, we should examine how it affects the evaluation.
Test set: ta1