Open cachedout opened 3 years ago
@cachedout do you know if we capture the apm-server log? That log is just for the hey-apm process, and doesn't provide any clues as to why the queue is full. Looks like something is permanently failing and preventing events from getting indexed.
@axw Absolutely! The collective logs for all the dockerized services for that job can be found here. The APM Server logs are prefixed with apm-server_1.
Thanks @cachedout. Looks like we're hitting https://github.com/elastic/apm-server/issues/5807
apm-server_1 | {"log.level":"error","@timestamp":"2021-08-03T05:03:13.732Z","log.logger":"pipelines","log.origin":{"file.name":"pipeline/register.go","file.line":50},"message":"Pipeline registration failed for apm_convert_destination_address.","service.name":"apm-server","event.dataset":"apm-server","ecs.version":"1.6.0"}
I was already planning to look at that soon, now I'll look at it a bit sooner ;)
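For anyone else digging through the combined docker log, here is a rough sketch of how the apm-server_1 error lines could be pulled out. It assumes the "service | JSON" layout of the line quoted above; the function name and the ANSI-stripping regex are my own and are not part of any apm-server tooling.

```python
import json
import re

# Strips ANSI colour codes, with or without the leading ESC byte
# (the ESC byte is often lost when logs are copied out of a CI console).
ANSI_RE = re.compile(r"\x1b?\[\d+(?:;\d+)*m")


def apm_server_errors(lines, prefix="apm-server_1"):
    """Yield parsed JSON records for error-level lines of one service.

    Hypothetical helper: assumes each line looks like
    '<optional timestamp> <service> | <JSON payload>'.
    """
    for raw in lines:
        line = ANSI_RE.sub("", raw).strip()
        service, sep, payload = line.partition("|")
        # The service name is the last token before the first '|'.
        if not sep or service.split()[-1:] != [prefix]:
            continue
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not a structured log line
        if isinstance(record, dict) and record.get("log.level") == "error":
            yield record
```

Feeding it the lines of the downloaded log and printing each record's "message" field should surface failures like the pipeline registration one without scrolling past the hey-apm output.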
It sounds like this should have been fixed with https://github.com/elastic/apm-server/issues/5807. Can this be closed?
@simitt I think that the specific issue that @axw is referring to has been resolved, but at the same time, we are still seeing tests occasionally fail with queue full messages during their run. I am not sure if the cause is the same as the one Andrew mentioned or if it is something different.
Here are three recent examples of failures which may need to be investigated:
https://apm-ci.elastic.co/job/apm-server/job/apm-hey-test-benchmark/830/
https://apm-ci.elastic.co/job/apm-server/job/apm-hey-test-benchmark/831/
https://apm-ci.elastic.co/job/apm-server/job/apm-hey-test-benchmark/832/
Hi @axw
We appear to be seeing a continual failure to complete the Hey APM Benchmark test suite within the allotted time period of 1hr.
The log containing the errors can be found here.
It appears as if the server is returning a queue full message. Example:
[2021-08-02T05:10:38.947Z] hey-apm_1 | 2021/08/02 05:10:38 logger.go:10: [debug] request failed: request failed with 503 Service Unavailable: {"accepted":0,"errors":[{"message":"queue is full"}]} (next request in ~0s)
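To get a feel for how often this happens in a given run, something like the sketch below could flag the queue full responses in the hey-apm output. The regex is based only on the single example line above, so treat the format as an assumption; the helper name is made up for illustration.

```python
import json
import re

# Captures the HTTP status and the JSON body from a hey-apm
# "request failed with <status> <reason>: <body>" debug line.
FAILURE_RE = re.compile(r"request failed with (\d{3}) [^:]*: (\{.*\})")


def is_queue_full(line):
    """Return True if the line reports a 503 whose body says the queue is full."""
    match = FAILURE_RE.search(line)
    if not match:
        return False
    status = int(match.group(1))
    try:
        body = json.loads(match.group(2))
    except json.JSONDecodeError:
        return False
    messages = [
        err.get("message", "")
        for err in body.get("errors", [])
        if isinstance(err, dict)
    ]
    return status == 503 and any("queue is full" in msg for msg in messages)
```

Counting the matching lines, for example with sum(is_queue_full(l) for l in open(path)), would show whether the 503s are a brief burst or persist for the whole run.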