Open axw opened 2 years ago
Hmm, I just reconfigured the integration with expvar enabled, and now it's working. Maybe there's a race condition?
Happened again after upgrading from 8.2.3 to 8.3.0-BC4. Initially the output was zero; after reconfiguring the integration (this time changing the event rate limit), the output went non-zero.
This is apparently still an issue, at least in system tests, as seen here:
I haven't been able to reproduce this exact error. However, due to the way our instrumentation works, it is possible that after a reload event the old modelindexer is still receiving data while the instrumentation has already moved to the new modelindexer. This is because we wait for the old modelindexer to shut down gracefully, but we switch the monitoring to the new modelindexer before the old one exits.
As a result, the instrumentation data will report 0 until the old indexer shuts down.
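The ordering described above can be sketched in a few lines of Go. This is only an illustration of the suspected sequence, not the actual APM Server code: the `indexer` and `monitor` types and the `simulateReload` function are hypothetical stand-ins, and the reload is modelled sequentially rather than with real goroutines.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// indexer is a hypothetical stand-in for a modelindexer; it only counts
// the events it has been asked to index.
type indexer struct {
	added atomic.Int64
}

func (i *indexer) Add(n int64) { i.added.Add(n) }

// monitor is a hypothetical stand-in for the instrumentation: it reports
// stats from whichever indexer it currently points at.
type monitor struct {
	current atomic.Pointer[indexer]
}

func (m *monitor) EventsAdded() int64 {
	return m.current.Load().added.Load()
}

// simulateReload models a reload in which monitoring is switched to the
// new indexer before the old one has finished draining. It returns the
// value the monitor reports and the number of events the old indexer
// handled after the switch.
func simulateReload() (reported, handledByOld int64) {
	oldIdx, newIdx := &indexer{}, &indexer{}
	m := &monitor{}
	m.current.Store(oldIdx)

	// Reload: monitoring moves to the new indexer first...
	m.current.Store(newIdx)
	// ...while in-flight events are still delivered to the old indexer.
	oldIdx.Add(100)

	return m.EventsAdded(), oldIdx.added.Load()
}

func main() {
	reported, handled := simulateReload()
	fmt.Printf("reported=%d handledByOld=%d\n", reported, handled)
}
```

In this sketch the monitor reports 0 even though 100 events were handled, because those events were counted on an indexer that is no longer being observed.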
Moving this to the backlog since we haven't spent more time recently tracking this down.
It appears that this bug led to an incident (https://github.com/elastic/cloud/issues/110723) and should be prioritized.
Moved it into the 8.7 milestone again to be picked up and to verify whether this is still a bug in current versions.
I don't recall if this has already been ruled out, but I realise now that I never wrote down on this issue a possible contributing factor: every time we reconfigure the server, we create a new libbeat monitoring registry: https://github.com/elastic/apm-server/blob/32a167b81356e19e9e173bb58a0503eea5e80e3d/internal/beater/beater.go#L628
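One way a fresh registry per reload could contribute, sketched under assumptions: if any reporter captured a reference to the registry before the reload, it would keep reading counters that nothing updates any more. The `registry` and `reporter` types, the `staleReporterAfterReload` function, and the metric name below are all hypothetical and do not use the libbeat API.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for a monitoring registry:
// a named set of counters.
type registry struct {
	counters map[string]int64
}

func newRegistry() *registry {
	return &registry{counters: map[string]int64{}}
}

// reporter captures a registry once, at construction time
// (an assumption made for this sketch).
type reporter struct {
	reg *registry
}

func (r *reporter) OutputEvents() int64 {
	// Illustrative metric name, not taken from the source.
	return r.reg.counters["output.events.total"]
}

// staleReporterAfterReload creates a reporter, simulates a reload that
// replaces the registry, then records output events in the new registry.
// It returns what the reporter sees and what was actually recorded.
func staleReporterAfterReload() (seen, actual int64) {
	reg := newRegistry()
	rep := &reporter{reg: reg}

	// Reconfigure: a new registry is created; the reporter is not updated.
	reg = newRegistry()
	reg.counters["output.events.total"] += 42

	return rep.OutputEvents(), reg.counters["output.events.total"]
}

func main() {
	seen, actual := staleReporterAfterReload()
	fmt.Printf("seen=%d actual=%d\n", seen, actual)
}
```

Whether anything in APM Server actually holds such a stale reference is not established in this thread; the sketch only shows why it would be worth checking.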
Hmm, nice catch. I don't remember any conversation around this so I think this hasn't been ruled out.
I was looking at this today and I have 2 questions:
1. my first hint at this would be to try reusing the libbeatMonitoringRegistry instead of creating it anew like it is done for the output registry https://github.com/elastic/apm-server/blob/32a167b81356e19e9e173bb58a0503eea5e80e3d/internal/beater/beater.go#L634-L639 What do you think?
2. how can I send some test data?
You could use https://github.com/elastic/apm-server/tree/main/systemtest/cmd/sendotlp to send test data to APM Server
my first hint at this would be to try reusing the libbeatMonitoringRegistry instead of creating it anew like it is done for the output registry
You could try, but I don't think that will work. There are assumptions about there being a 1:1 relationship between metrics and outputs, e.g. here: https://github.com/elastic/apm-server/blob/98806224092aa9646d2cf8466517b0955e8476b6/internal/beater/beater.go#L688-L696
APM Server version (apm-server version): 8.3.0-BC4
Description of the problem including expected versus actual behavior:
"Output Events Rate" in stack monitoring is always zero.
Steps to reproduce: