Motivation
We want to be able to benchmark our search performance - basically, to make sure users are actually finding what they are searching for in the highest positions of the search results.
By having a fixed and stable measure, we can test the performance of the search engine and monitor changes when the data changes or our code changes (sometimes both at the same time). We can also test modifications to the search engine configuration itself and see which configuration performs best.
What needs to be done?
This issue is for building a Python library, which at its core:
Receives as input a list of 'queries' and 'expected results'
Performs searches according to the queries and checks whether the expected results appear, and in which positions
Gives a score for each query and a total score for the entire configuration
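A minimal sketch of that core loop could look like the following; all names here (QueryCase, run_benchmark, the search and score callables) are hypothetical placeholders for illustration, not an existing API:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# An expected result is either a document id or a predicate over a returned hit
Expected = Union[str, Callable[[dict], bool]]

@dataclass
class QueryCase:
    query: dict                # anything the search API accepts
    expected: List[Expected]   # document ids or predicate functions

def run_benchmark(cases: List[QueryCase],
                  search: Callable[[dict], List[dict]],
                  score: Callable[[List[dict], List[Expected]], float]) -> Dict:
    """Run every query, score its results, and return per-query and total scores."""
    per_query = {}
    for i, case in enumerate(cases):
        hits = search(case.query)                 # perform the actual search
        per_query[i] = score(hits, case.expected)
    total = sum(per_query.values()) / len(per_query) if per_query else 0.0
    return {'per_query': per_query, 'total': total}
```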
On top of that we would need supporting mechanisms (which are not part of this issue, but keep them in mind):
Run periodically
Save results to DB
Have web-ui to view the results (in a graph and detailed results)
Details
Queries
Anything that can be fed to the search API (e.g. query text, filters, sorting, etc.)
We can separate the configuration from the 'query builder' mechanism, so that it's more modular and can use any 'query builder' (e.g. our apies, but potentially others) to create the ES query.
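As a rough illustration of that separation (the QueryBuilder protocol and SimpleMatchBuilder below are hypothetical, not apies' actual interface), the benchmark would only depend on something that turns a query description into an ES query body:

```python
from typing import Protocol

class QueryBuilder(Protocol):
    def build(self, query: dict) -> dict:
        """Turn a benchmark query description into an Elasticsearch query body."""
        ...

class SimpleMatchBuilder:
    """A trivial example builder: free-text query plus optional term filters."""
    def build(self, query: dict) -> dict:
        body = {'query': {'bool': {'must': [{'query_string': {'query': query['q']}}]}}}
        for field, value in query.get('filters', {}).items():
            body['query']['bool'].setdefault('filter', []).append({'term': {field: value}})
        return body
```

Any other builder, apies included, could be dropped in as long as it exposes the same build method.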
Expected results
Expected results can be specific document ids that need to be returned
They can also be a predicate function returning True/False (e.g. for the query 'J. K. Rowling' on a library DB we expect to see books she authored, without any specific preference for order)
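For example (hypothetical field names and helper, just to show the two shapes an expectation could take):

```python
def is_rowling_book(doc: dict) -> bool:
    # relevant if the author field mentions J. K. Rowling; order does not matter
    return 'rowling' in doc.get('author', '').lower()

expected_by_id = ['doc-1234', 'doc-5678']   # these specific documents must appear
expected_by_predicate = [is_rowling_book]   # any document satisfying the predicate counts

def matches(expected, doc: dict) -> bool:
    """Check a single expectation (id or predicate) against a returned document."""
    if callable(expected):
        return expected(doc)
    return doc.get('_id') == expected
```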
Format
The input configuration can be provided in code, or in some other form of configuration file
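One possible shape for such a configuration file, assuming YAML and the hypothetical field names used above (the same structure could just as well be built directly in code as plain dicts and lists):

```python
import yaml  # PyYAML, assuming the configuration is stored as YAML

CONFIG_YAML = """
queries:
  - query: {q: "J. K. Rowling"}
    expected:
      - doc-1234
      - doc-5678
  - query: {q: "harry potter", filters: {language: "en"}}
    expected:
      - doc-9012
"""

config = yaml.safe_load(CONFIG_YAML)
cases = config['queries']   # feed these into the benchmark runner
```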
Scoring
The scoring function also needs to be configurable - some implementations would give a higher score to results on the first search page than on the second, while others would prefer a smoother scoring mechanism.
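Two sketches of what pluggable scoring functions could look like (hypothetical, scoring a single expected document by its position in the results, or None if it was not found):

```python
from typing import Optional

def page_score(position: Optional[int], page_size: int = 10) -> float:
    """Step scoring: 1.0 on the first results page, 0.5 on the second, 0 otherwise."""
    if position is None:
        return 0.0
    page = position // page_size
    return {0: 1.0, 1: 0.5}.get(page, 0.0)

def reciprocal_rank_score(position: Optional[int]) -> float:
    """Smoother scoring: 1.0 at position 0, 0.5 at position 1, 1/3 at position 2, ..."""
    return 0.0 if position is None else 1.0 / (position + 1)
```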