A few times I accidentally had an error in the search operation definition, for example I misspelled the name of the field or structured the query incorrectly. Both the contender and baseline were returning no results, but I simply thought my optimization wasn't working.
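For illustration, a hypothetical search operation like the following (the field nmae is a typo for name, and the query string is made up) runs without any error and simply matches nothing:

{
  "name": "phrase",
  "operation-type": "search",
  "index": "geonames",
  "body": {
    "query": {
      "match_phrase": {
        "nmae": "Sankt Georgen"
      }
    }
  }
}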
By default Rally will just continue on errors and report an error rate at the end of the benchmark, but I think you can already achieve what you're after by specifying --on-error=abort? It would abort the benchmark as soon as Elasticsearch returns an error and provide more context (i.e. the response body from Elasticsearch). Would that help you?
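For example, with the geonames track mentioned below (the exact invocation may differ slightly depending on your Rally version):

esrally --track=geonames --challenge=append-no-conflicts --on-error=abort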
I wonder if it'd be useful for Rally to return the number of hits from a search operation, or even the whole search response.
Rally already returns the number of hits in the metrics metadata but you need to configure a (dedicated) Elasticsearch metrics store for that. I think it would cover most of your needs. Here is an example from our nightly benchmarks:
{
  "environment": "nightly",
  "race-id": "6cac8c34-93b4-416e-a1ea-6c4e1b87f9fd",
  "race-timestamp": "20191005T200042Z",
  "@timestamp": 1570306375355,
  "relative-time": 41959,
  "track": "geonames",
  "challenge": "append-no-conflicts",
  "car": "defaults",
  "operation": "phrase",
  "task": "phrase",
  "name": "service_time",
  "operation-type": "Search",
  "sample-type": "warmup",
  "value": 3.7241727113723755,
  "unit": "ms",
  "meta": {
    "success": true,
    "distribution_flavor": "oss",
    "source_revision": "d598083051d46e7d44f577473dde52c4a32f6776",
    "distribution_version": "8.0.0-SNAPSHOT",
    "tag_license": "oss",
    "timed_out": false,
    "took": 1,
    "hits": 58,
    "hits_relation": "eq"
  }
}
In the meta.hits property we already track the number of hits, and the meta property probably contains a few more things you're interested in.
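To set up such a dedicated metrics store, you point the [reporting] section of ~/.rally/rally.ini at an Elasticsearch cluster, roughly like this (a sketch with placeholder host and credentials; see the Rally configuration docs for all options):

[reporting]
datastore.type = elasticsearch
datastore.host = metrics-store.example.org
datastore.port = 9200
datastore.secure = false
datastore.user =
datastore.password =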
By default Rally will just continue on errors and report an error rate at the end of the benchmark but I think you can already achieve what you're after by specifying --on-error=abort?
In my use case (benchmarking different search strategies) this wouldn't help, since we usually don't error on unmapped fields. For example, if you perform a term query on a non-existent field, we return a successful response with no hits. This is also very common behavior for aggregations. It also doesn't cover the case where I've structured the query slightly incorrectly and am not performing the operation I intended. This can happen surprisingly often given the complexity of the search DSL.
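For instance, this term query targets a misspelled field (feature_clas instead of feature_class from the geonames track) and still gets a 200 OK response with zero hits (abridged):

GET /geonames/_search
{"query": {"term": {"feature_clas": "T"}}}

{"took": 0, "timed_out": false, "hits": {"total": {"value": 0, "relation": "eq"}, "max_score": null, "hits": []}}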
Rally already returns the number of hits in the metrics metadata but you need to configure a (dedicated) Elasticsearch metrics store for that
Thanks, I wasn't aware we already tracked that information in the metadata. The next time I'm benchmarking, I will try out my workflow using a metrics store and loop back if I have further suggestions.
I, too, would like to have additional information about the response. When benchmarking new aggregations, I've sometimes accidentally benchmarked buggy implementations that returned incorrect results, or no results at all, and showed really fast performance. Since these runs are black boxes, it is tough to know that I am benchmarking what I intended; the mistakes may be due to bugs in the implementation or bugs in the Rally track.
What if this search-response information came in the form of simply outputting a sample response from a single iteration, maybe with a flag like --print-sample-search-responses? This way it would be straightforward to verify that the benchmark results actually measure what you intended.
p.s. I had no idea about --on-error=abort! Should this be the default?
Thanks for the feedback both. You can log full requests and responses with Rally already today: set the log level to "DEBUG" for the logger "elasticsearch" in ~/.rally/logging.json (but beware that this will very likely cause a client-side bottleneck, so double-check that you reset this again when you're interested in the performance metrics).
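The relevant excerpt of ~/.rally/logging.json would look roughly like this (a sketch; the handler name and the surrounding structure may differ between Rally versions):

"loggers": {
  "elasticsearch": {
    "handlers": ["rally_log_handler"],
    "level": "DEBUG",
    "propagate": false
  }
}

If you then run your benchmark you'll see output like the following: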
2019-12-16 14:56:08,389 -not-actor-/PID:9790 elasticsearch DEBUG > {"sort":[{"geonameid":"asc"}],"query":{"match_all":{}}}
2019-12-16 14:56:08,389 -not-actor-/PID:9790 elasticsearch DEBUG < {"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1000,"relation":"eq"},"max_score":null,"hits":[{"_index":"geonames","_type":"_doc","_id":"Lb81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 2986043, "name": "Pic de Font Blanca", "asciiname": "Pic de Font Blanca", "alternatenames": "Pic de Font Blanca,Pic du Port", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "admin1_code": "00", "population": 0, "dem": "2860", "timezone": "Europe/Andorra", "location": [1.53335, 42.64991]},"sort":[2986043]},{"_index":"geonames","_type":"_doc","_id":"Lr81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 2993838, "name": "Pic de Mil-Menut", "asciiname": "Pic de Mil-Menut", "alternatenames": "Pic de Mil-Menut", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "cc2": "AD,FR", "admin1_code": "B3", "admin2_code": "09", "admin3_code": "091", "admin4_code": "09024", "population": 0, "dem": "2138", "timezone": "Europe/Andorra", "location": [1.65, 42.63333]},"sort":[2993838]},{"_index":"geonames","_type":"_doc","_id":"L781D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 2994701, "name": "Roc Mélé", "asciiname": "Roc Mele", "alternatenames": "Roc Mele,Roc Meler,Roc Mélé", "feature_class": "T", "feature_code": "MT", "country_code": "AD", "cc2": "AD,FR", "admin1_code": "00", "population": 0, "dem": "2803", "timezone": "Europe/Andorra", "location": [1.74028, 42.58765]},"sort":[2994701]},{"_index":"geonames","_type":"_doc","_id":"ML81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3007683, "name": "Pic des Langounelles", "asciiname": "Pic des Langounelles", "alternatenames": "Pic des Langounelles", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "cc2": "AD,FR", "admin1_code": "00", "population": 0, "dem": "2685", "timezone": "Europe/Andorra", "location": [1.47364, 42.61203]},"sort":[3007683]},{"_index":"geonames","_type":"_doc","_id":"Mb81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3017832, "name": "Pic de les Abelletes", "asciiname": "Pic de les Abelletes", "alternatenames": "Pic de la Font-Negre,Pic de la Font-Nègre,Pic de les Abelletes", "feature_class": "T", "feature_code": "PK", "country_code": "AD", "cc2": "FR", "admin1_code": "A9", "admin2_code": "66", "admin3_code": "663", "admin4_code": "66146", "population": 0, "dem": "2411", "timezone": "Europe/Andorra", "location": [1.73343, 42.52535]},"sort":[3017832]},{"_index":"geonames","_type":"_doc","_id":"Mr81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3017833, "name": "Estany de les Abelletes", "asciiname": "Estany de les Abelletes", "alternatenames": "Estany de les Abelletes,Etang de Font-Negre,Étang de Font-Nègre", "feature_class": "H", "feature_code": "LK", "country_code": "AD", "cc2": "FR", "admin1_code": "A9", "population": 0, "dem": "2260", "timezone": "Europe/Andorra", "location": [1.73362, 42.52915]},"sort":[3017833]},{"_index":"geonames","_type":"_doc","_id":"M781D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3023203, "name": "Port Vieux de la Coume d’Ose", "asciiname": "Port Vieux de la Coume d'Ose", "alternatenames": "Port Vieux de Coume d'Ose,Port Vieux de Coume d’Ose,Port Vieux de la Coume d'Ose,Port Vieux de la Coume d’Ose", "feature_class": "T", "feature_code": "PASS", "country_code": "AD", "admin1_code": "00", "population": 0, "dem": "2687", "timezone": "Europe/Andorra", "location": [1.61823, 42.62568]},"sort":[3023203]},{"_index":"geonames","_type":"_doc","_id":"NL81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3029315, "name": "Port de la Cabanette", "asciiname": "Port de la Cabanette", "alternatenames": "Port de la Cabanette,Porteille de la Cabanette", "feature_class": "T", "feature_code": "PASS", "country_code": "AD", "cc2": "AD,FR", "admin1_code": "B3", "admin2_code": "09", "admin3_code": "091", "admin4_code": "09139", "population": 0, "dem": "2379", "timezone": "Europe/Andorra", "location": [1.73333, 42.6]},"sort":[3029315]},{"_index":"geonames","_type":"_doc","_id":"Nb81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3034945, "name": "Port Dret", "asciiname": "Port Dret", "alternatenames": "Port Dret,Port de Bareites,Port de las Bareytes,Port des Bareytes", "feature_class": "T", "feature_code": "PASS", "country_code": "AD", "admin1_code": "00", "population": 0, "dem": "2660", "timezone": "Europe/Andorra", "location": [1.45562, 42.60172]},"sort":[3034945]},{"_index":"geonames","_type":"_doc","_id":"Nr81D28Bu7VEEZ3mXFGw","_score":null,"_source":{"geonameid": 3038814, "name": "Costa de Xurius", "asciiname": "Costa de Xurius", "feature_class": "T", "feature_code": "SLP", "country_code": "AD", "admin1_code": "07", "population": 0, "dem": "1839", "timezone": "Europe/Andorra", "location": [1.47569, 42.50692]},"sort":[3038814]}]}}
p.s. I had no idea about --on-error=abort! Should this be the default?
It can happen that we intentionally push Elasticsearch to provoke e.g. bulk index rejections, and we just want to record that we went too far but not necessarily abort benchmarking.
In any case I sense we should improve the docs to include examples for such scenarios?
In any case I sense we should improve the docs to include examples for such scenarios?
Knowing about this from the docs would have prevented my usage mistakes. +1 to examples in the docs. Thanks for the info!
It can happen that we intentionally push Elasticsearch to provoke e.g. bulk index rejections
Is this the norm rather than the exception?
It can happen that we intentionally push Elasticsearch to provoke e.g. bulk index rejections. Is this the norm rather than the exception?
Honestly, I am hesitant to change the default behavior for this flag. It is common among load testing tools to continue on errors (JMeter does this as well by default), and in our experiments we usually want to continue a benchmark even when an error is indicated (by an error rate > 0 in the report). For example, if one of many benchmarked queries fails, e.g. due to changed syntax or tighter restrictions, all of the other benchmark results are usually still fine and we would not want to lose them. (Btw, this is the reason we run Rally's integration tests in CI with the stricter --on-error=abort: to spot such issues before the problem occurs in the nightly benchmarks.) However, we then also investigate and try to eliminate / understand the cause.
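For reference, the error rate is reported per task in the summary report, in a row roughly like this (illustrative excerpt; the exact layout varies by Rally version):

| error rate | phrase | 0.00 | % |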
"Implementation" notes: We should add a new page to our documentation that describes tips & tricks when benchmarking with specific examples (in a "recipe" style). The purpose of this example is to explain how to get request and response when running a benchmark with Rally (see my comment above how it's done). We should also mention the caveats (debug logging should not be enabled in a "real" benchmark).
Propose adding the following to recipes:
As part of developing a new search feature or optimization, I often run benchmarks by adding a new search operation to an existing Rally track. I've found that it can be easy to make mistakes in setting up the benchmark: for example, misspelling the name of a field or structuring the query incorrectly, so that both the baseline and contender return no results.
For this use case, I wonder if it'd be useful for Rally to return the number of hits from a search operation, or even the whole search response. Seeing that the baseline and contender return the same number of results (and return more than 0 results) would give confidence that I've set up the benchmark correctly and help avoid some manual testing.
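A companion snippet for such a recipe could show how to read the recorded hit counts back from the metrics store, e.g. (a sketch assuming Rally's default rally-metrics-* index naming; the race-id and operation values are taken from the example document above):

GET /rally-metrics-*/_search
{
  "size": 1,
  "query": {
    "bool": {
      "filter": [
        { "term": { "race-id": "6cac8c34-93b4-416e-a1ea-6c4e1b87f9fd" } },
        { "term": { "operation": "phrase" } }
      ]
    }
  },
  "_source": ["operation", "meta.hits", "meta.hits_relation", "meta.timed_out"]
}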