elastic / hey-apm

Basic load generation for apm-server built on hey
Apache License 2.0

Queue full and master branch failing #196

Open cachedout opened 3 years ago

cachedout commented 3 years ago

Hi @axw

We appear to be seeing continual failures to complete the Hey APM Benchmark test suite within the allotted time period of one hour.

The log containing the errors can be found here.

It appears that the server is returning a "queue is full" message.

Example: [2021-08-02T05:10:38.947Z] hey-apm_1 | 2021/08/02 05:10:38 logger.go:10: [debug] request failed: request failed with 503 Service Unavailable: {"accepted":0,"errors":[{"message":"queue is full"}]} (next request in ~0s)
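For context, the 503 above comes from APM Server's intake v2 endpoint. The following is a minimal sketch (not hey-apm code) of a client sending one ndjson request to a local apm-server and backing off on a 503 "queue is full" response; the host, service name, and event fields are illustrative placeholders only.

```go
// Minimal sketch: POST an intake v2 request and retry with backoff on 503.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// Illustrative ndjson body: a metadata line followed by one transaction.
	// Field values are placeholders, not what hey-apm actually generates.
	body := []byte(`{"metadata":{"service":{"name":"hey-apm-demo","agent":{"name":"go","version":"0.0.1"}}}}
{"transaction":{"id":"0123456789abcdef","trace_id":"0123456789abcdef0123456789abcdef","name":"demo","type":"request","duration":1.0,"span_count":{"started":0}}}
`)

	for attempt := 1; attempt <= 5; attempt++ {
		resp, err := http.Post("http://localhost:8200/intake/v2/events",
			"application/x-ndjson", bytes.NewReader(body))
		if err != nil {
			fmt.Println("request error:", err)
			return
		}
		msg, _ := io.ReadAll(resp.Body)
		resp.Body.Close()

		if resp.StatusCode == http.StatusServiceUnavailable {
			// apm-server's internal queue is full; wait and retry.
			fmt.Printf("attempt %d: 503 from server: %s\n", attempt, msg)
			time.Sleep(time.Duration(attempt) * time.Second)
			continue
		}
		fmt.Println("status:", resp.Status)
		return
	}
}
```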

axw commented 3 years ago

@cachedout do you know if we capture the apm-server log? That log is just for the hey-apm process, and doesn't provide any clues as to why the queue is full. Looks like something is permanently failing and preventing events from getting indexed.

cachedout commented 3 years ago

@axw Absolutely! The collective logs for all the dockerized services for that job can be found here. The APM Server logs are prefixed with apm-server_1.

axw commented 3 years ago

Thanks @cachedout. Looks like we're hitting https://github.com/elastic/apm-server/issues/5807

apm-server_1 | {"log.level":"error","@timestamp":"2021-08-03T05:03:13.732Z","log.logger":"pipelines","log.origin":{"file.name":"pipeline/register.go","file.line":50},"message":"Pipeline registration failed for apm_convert_destination_address.","service.name":"apm-server","event.dataset":"apm-server","ecs.version":"1.6.0"}

I was already planning to look at that soon; now I'll look at it a bit sooner ;)
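As a quick way to confirm the pipeline registration failure in the log above, here is a minimal sketch (assuming an unauthenticated Elasticsearch on localhost:9200, which is an assumption about this CI setup) that uses the standard Get Pipeline API to check whether the named ingest pipeline exists.

```go
// Minimal sketch: check whether the ingest pipeline from the error log is registered.
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	const pipeline = "apm_convert_destination_address"
	// Elasticsearch Get Pipeline API: GET /_ingest/pipeline/<id>.
	resp, err := http.Get("http://localhost:9200/_ingest/pipeline/" + pipeline)
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusOK:
		body, _ := io.ReadAll(resp.Body)
		fmt.Printf("pipeline %q is registered:\n%s\n", pipeline, body)
	case http.StatusNotFound:
		// Registration failed, matching the apm-server log line above.
		fmt.Printf("pipeline %q is missing\n", pipeline)
	default:
		fmt.Println("unexpected status:", resp.Status)
	}
}
```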

simitt commented 2 years ago

It sounds like this should have been fixed with https://github.com/elastic/apm-server/issues/5807. Can this be closed?

cachedout commented 2 years ago

@simitt I think the specific issue that @axw is referring to has been resolved, but we are still seeing tests occasionally fail with queue full messages during their run. I am not sure whether the cause is the same as the one Andrew mentioned or something different.

Here are three recent examples of failures which may need to be investigated (a rough triage sketch follows the list):

https://apm-ci.elastic.co/job/apm-server/job/apm-hey-test-benchmark/830/
https://apm-ci.elastic.co/job/apm-server/job/apm-hey-test-benchmark/831/
https://apm-ci.elastic.co/job/apm-server/job/apm-hey-test-benchmark/832/
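One way to triage runs like these is to count "queue is full" lines per dockerized service in the saved CI log. The sketch below is a hypothetical helper, not part of hey-apm, and the input file name is assumed.

```go
// Rough triage helper: count "queue is full" lines per docker-compose service prefix.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("benchmark-run.log") // hypothetical saved CI log
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	defer f.Close()

	counts := map[string]int{}
	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long log lines
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.Contains(line, "queue is full") {
			continue
		}
		// docker-compose prefixes each line with "<service>_1 |".
		service := line
		if i := strings.Index(line, "|"); i > 0 {
			service = strings.TrimSpace(line[:i])
		}
		counts[service]++
	}
	for service, n := range counts {
		fmt.Printf("%-20s %d queue-full lines\n", service, n)
	}
}
```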