Open carsonip opened 3 months ago
The first thing I looked at was what was getting reported by the benchmark failures. Here are 2 links to the benchmark run:
Both of these show 500 internal error, however, the logs for 0 events/sec additionally show data validation errors due to unexpected EOF. These errors seemed to be logged from here. This could be an issue with our sender, however, the most intriguing thing is why only a subset of delta metrics are reported as 0. For example: in the above link, the txn/sec and metrics/sec are reported correctly whereas other delta metrics are reported as zero.
I have tried reproducing the errors locally but haven't succeeded. Note that the expvar metrics collection is designed for benchtimes in minutes, so when testing locally make sure the benchtime is long enough for the expvar metrics to work correctly. I did see some special handling in the expvar metric collection, but nothing that explains this bug.
I have also created a PR to log errors from the expvar endpoint, which was not done before. I am not sure how helpful it will be, though.
Is this still happening?
@simitt I had this happen to me in a run on GH Actions last week, see Slack Thread: https://elastic.slack.com/archives/C95SB62AG/p1729263104854879
Nightly benchmarks occasionally report 0 events/s. Investigate the root cause of it.