OpenBudget / BudgetKey

Opening the Israeli Budget!
https://next.obudget.org

Search Benchmarking Tool #410

Open akariv opened 5 years ago

akariv commented 5 years ago

Motivation:

We want to be able to benchmark our search performance - basically, to make sure users are actually finding what they are searching for in the highest positions of the search results.

With a fixed and stable measure we can test the performance of the search engine and monitor how it changes when the data changes or our code changes (sometimes both at the same time). We can also test modifications to the search engine configuration itself and see which configuration performs best.


What needs to be done?

This issue is for building a Python library, which at its core:

On top of that we would need supporting mechanisms (which are not part of this issue, but keep them in mind):

Details

Queries

Anything that can be fed to the search API (e.g. query, filters, sorting). We can separate the configuration from the 'query builder' mechanism, so that the library is more modular and can use any 'query builder' (e.g. our apies, but potentially others) to create the ES query.
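To illustrate the separation, here is a minimal sketch in which the query description is plain data and the builder is any callable that turns it into an ES request body. The names `QuerySpec`, `QueryBuilder` and `simple_es_builder` are hypothetical and not part of apies or any existing code; the generated body is just one plausible Elasticsearch bool query.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class QuerySpec:
    """Hypothetical, builder-agnostic description of one benchmark query."""
    term: str
    filters: Dict[str, Any] = field(default_factory=dict)
    sort: List[str] = field(default_factory=list)


# Any callable with this shape can act as a 'query builder'.
QueryBuilder = Callable[[QuerySpec], Dict[str, Any]]


def simple_es_builder(spec: QuerySpec) -> Dict[str, Any]:
    """Assumed example builder: full-text match plus exact-value filters."""
    body: Dict[str, Any] = {
        "query": {
            "bool": {
                "must": [{"multi_match": {"query": spec.term}}],
                "filter": [{"term": {k: v}} for k, v in spec.filters.items()],
            }
        }
    }
    if spec.sort:
        body["sort"] = list(spec.sort)
    return body
```

A benchmark runner would then accept a `QueryBuilder` as a parameter, so swapping in a different builder would not require touching the query configuration.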

Expected results

Expected results can be specific document ids that need to be returned. They can also be a predicate function returning True/False (e.g. for the query 'J. K. Rowling' on a library DB we expect to see books that she authored, without a specific preference for order).
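Both kinds of expectation could be handled by one helper that marks each returned document as expected or not. This is only a sketch: the `relevance_flags` name, the `Expectation` type and the assumption that each result is a dict with an `id` key are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Union

Document = Dict[str, Any]
# Either a collection of document ids, or a predicate over a document.
Expectation = Union[Iterable[str], Callable[[Document], bool]]


def relevance_flags(results: List[Document], expected: Expectation) -> List[bool]:
    """Return one boolean per result, in rank order: was it expected?"""
    if callable(expected):
        return [bool(expected(doc)) for doc in results]
    wanted = set(expected)
    # Assumption: every result carries its document id under the "id" key.
    return [doc.get("id") in wanted for doc in results]


# Example results for the 'J. K. Rowling' query on a library DB.
results = [
    {"id": "b1", "author": "J. K. Rowling"},
    {"id": "b2", "author": "Someone Else"},
    {"id": "b3", "author": "J. K. Rowling"},
]
by_ids = relevance_flags(results, ["b1", "b9"])
by_predicate = relevance_flags(results, lambda doc: doc["author"] == "J. K. Rowling")
```

Reducing both cases to a list of booleans keeps the scoring step independent of how the expectation was expressed.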

Format

Input configuration can be provided in code, or in some other form, such as a configuration file.
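As a rough illustration, the same benchmark cases could be written directly in code or loaded from a file. The JSON layout and the `load_cases` helper below are assumptions made for this sketch, not a decided format; note that predicate-style expectations cannot be expressed in plain JSON and would need the in-code form.

```python
import json
from typing import Any, Dict, List


def load_cases(text: str) -> List[Dict[str, Any]]:
    """Parse a hypothetical JSON config into normalized benchmark cases."""
    raw = json.loads(text)
    cases = []
    for entry in raw["cases"]:
        cases.append({
            "query": entry["query"],
            # Optional keys fall back to empty defaults.
            "filters": entry.get("filters", {}),
            "expected_ids": list(entry.get("expected_ids", [])),
        })
    return cases


# The configuration as it might appear in a file...
CONFIG_TEXT = """
{"cases": [
  {"query": "education budget", "filters": {"year": 2019},
   "expected_ids": ["doc-1", "doc-7"]},
  {"query": "health"}
]}
"""

# ...and the equivalent configuration provided in code.
in_code_cases = [
    {"query": "education budget", "filters": {"year": 2019},
     "expected_ids": ["doc-1", "doc-7"]},
    {"query": "health", "filters": {}, "expected_ids": []},
]
```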

Scoring

The score function also needs to be configurable - some would give a higher score to results on the first search page than on the second, while others would prefer a smoother scoring mechanism.
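The two styles mentioned above could look roughly like this. Both functions take a list of booleans, one per returned result in rank order, saying whether that result was expected. The function names, the page size of 10 and the page weights are illustrative assumptions; the smooth variant uses a logarithmic discount in the style of discounted cumulative gain with binary relevance.

```python
import math
from typing import Callable, List, Sequence

# Any function from rank-ordered hit flags to a number can be plugged in.
ScoreFunction = Callable[[List[bool]], float]


def page_weighted_score(
    flags: List[bool],
    page_size: int = 10,
    page_weights: Sequence[float] = (1.0, 0.5),
) -> float:
    """Step-wise scoring: every hit on a page earns that page's weight."""
    total = 0.0
    for position, hit in enumerate(flags):
        page = position // page_size
        # Hits beyond the last weighted page contribute nothing.
        if hit and page < len(page_weights):
            total += page_weights[page]
    return total


def smooth_score(flags: List[bool]) -> float:
    """Smooth scoring: each hit at 1-based rank r earns 1 / log2(r + 1)."""
    return sum(
        1.0 / math.log2(rank + 1)
        for rank, hit in enumerate(flags, start=1)
        if hit
    )
```

With the page-based function, a hit at position 1 and a hit at position 10 score the same, while the smooth function rewards every step up the ranking; which behaviour is preferable depends on what the benchmark is meant to detect.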

MaaikeB commented 5 years ago

:+1: I like the issue, thanks