Open nightmarebug opened 1 year ago
I'm investigating a similar situation with my servers (PHP 7.4, CentOS 7).
I noticed that when elastic apm agent php is installed and enabled, several connections keep open on MySQL with the Sleep state. Sometimes, the connections ends with 0 (zero) time. But, sometimes, the connections keeps open until wait_timeout is reached.
If I uninstall the elastic apm agent, none Sleeping connections will appear, as usual.
Of course, we can tune "wait_timeout" to workaround this, but I'm curious if is expected to have such Sleeping connections on MySQL when apm agent is enabled, or not.
Also experiencing on php 8.1 and alpine base image.
CPU seems to be from an increase in spans from wordpress, but it might be something else. Spans have gone way up, but limiting spans and stack traces hasn't resolved the issue.
Adding an image from using phpspy and a minimalist wordpress setup, you can see under load testing APM is responsible for almost all CPU usage
When using a comparison between 1.9.1 and 1.8.4, Elastic\Apm\Impl\AutoInstrument\PhpPartFacade::callAndSwallowThrowable showed a massive increase in CPU usage. 1.9.1 used 92% /1.88 CPU hours in the same amount of time that 1.8.4 used 11.11% of the CPU/.22 CPU hours.
Means 1.8.4 have less cpu usage
My team has reported this issue as well when moving to 1.9.0 (php 8.1 and alpine base image). They have regressed to 1.8.4 until this is resolved.
@nightmarebug @6fears7 @bck01215
Could you please try the latest release v1.12.0 and if the agent still cause CPU increase please try setting ast_process_enabled
configuration option (ELASTIC_APM_AST_PROCESS_ENABLED
environment variable or elastic_apm.ast_process_enabled
php.ini option) to false
.
Setting the ELASTIC_APM_AST_PROCESS_ENABLED
to false
did reduce the CPU usage notably.
Before: Total of 61% usage in a CPU frame
After: Total of 34% usage in a CPU frame.
Hi @6fears7 Thank you testing the last suggestion. Regarding the remaining CPU overhead it seems to be related to spans created by MySQLi instrumentation. Could you please check the effect on CPU if:
span_stack_trace_min_duration
option is set to something like 100ms
disable_instrumentations
option is set to MySQLi
@SergeyKleyman
Roughly 20% CPU utilization in the frame.
@6fears7 Thank you for the feedback.
Could you please provide for comparison the flame graph without the agent (for example with elastic_apm
extension commented out in php.ini)?
Also is 20% CPU utilization you mentioned above inline with your expectations?
With agent disabled.
After seeing 20% utilization, I decided to regress back to 1.8.4 with all the same configs set that we've discussed above. 10% utilization.
Then I went back and did 1.12.0 again and extended my test duration by 5x the amount just to smooth the data out. The test is just curling the wordpress endpoint every two seconds until the duration is up.
There were frames that hit 30%, but overall the average remained the same as 1.8.4: 10%.
So in summary and to answer your question @SergeyKleyman , these options brought the CPU usage down to the 1.8.4 levels that I'd expected the agent to utilize:
elastic_apm.ast_process_enabled=false
elastic_apm.span_stack_trace_min_duration=100ms
elastic_apm.disable_instrumentations=MySQLi
References: span_stack_trace_min_duration option disable_instrumentations option
I don't have an upper bound in mind of acceptable thresholds, but if the average is 10% and the upper is 30%, that works for us.
I installed version 1.9.0 of the elastic APM agent into production server which runs php-7.4 on ubuntu 20.04 . The application itself is a large mysql backed Zend(1.1) application.
After installing APM withing few hours CPU is usage over 95% and application got crashed
After disabling the APM it returned to very low CPU usage and everything works fine
Congifuration:-
elastic_apm.enabled = true elastic_apm.environment = "production" elastic_apm.server_timeout = "30s" elastic_apm.async_backend_command = true elastic_apm.transaction_max_spans = 500 elastic_apm.transaction_sample_rate = 1.0