apache / jmeter

Apache JMeter open-source load testing tool for analyzing and measuring the performance of a variety of services
https://jmeter.apache.org/
Apache License 2.0

JMeter test not stopping after duration ends in distribution mode #5455

Open asfimport opened 3 years ago

asfimport commented 3 years ago

Madhuri Jain (Bug 65000): I'm running a very simple JMeter test with Basic Thread Group and HTTP Sampler. The duration set for execution is 10 min (600 sec).

The test ran and stopped by itself (after 10 min) successfully on my local machine, in both JMeter GUI and CLI mode.

However, when I run the same test in distributed mode, the test does not stop by itself and hangs. I've been observing this issue mostly with a higher thread count (200) or a higher worker count (~10).

Some of the JMeter properties I have overridden:

"server.rmi.ssl.disable": "true",
"jmeterengine.nongui.maxport": JMETER_EXEC_PORT,
"jmeterengine.nongui.port": JMETER_EXEC_PORT,
"client.tries": str(3),
"client.retries_delay": str(5000),
"client.rmi.localport": CLIENT_RMI_LOCALPORT,
"server.rmi.localport": SERVER_RMI_LOCALPORT,
"server_port": SERVER_PORT,
"server.exitaftertest": "true",
"jmeterengine.stopfail.system.exit": "true",
"jmeterengine.remote.system.exit": "true",
"jmeterengine.force.system.exit": "true",
"jmeter.save.saveservice.output_format": "csv",
"jmeter.save.saveservice.autoflush": "true",
"beanshell.server.file": "./extras/startup.bsh",
"jmeter.save.saveservice.connect_time": "true",
"jpgc.repo.sendstats": "false",

Here are the JMeter CLI commands I'm using for the JMeter client and server respectively:

// JMeter Client
jmeter.sh -n -f -t {testPlan} -j jmeter.log -l report.csv -LINFO -Lorg.apache.http=DEBUG -Lorg.apache.http.wire=ERROR -Ljmeter.engine=DEBUG -X -R {serverIPs}

// JMeter Server
jmeter.sh -s -Jbeanshell.server.port={beanshellServerPort}
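As a rough illustration of how a property mapping like the one above could be turned into command-line flags, here is a minimal Python sketch. The helper name `build_jmeter_args` and the three-entry `overrides` subset are hypothetical, not part of the reporter's setup. It relies only on JMeter's documented `-J` (local property) and `-G` (global property, sent to the servers) options.

```python
from typing import Dict, List


def build_jmeter_args(props: Dict[str, str], flag: str = "-J") -> List[str]:
    """Turn a property mapping into JMeter command-line property flags.

    -J defines a local JMeter property; -G defines a global property
    that is also sent to the remote servers in a distributed run.
    """
    if flag not in ("-J", "-G"):
        raise ValueError("flag must be -J or -G")
    # Sort keys so the generated command line is stable between runs.
    return [f"{flag}{name}={value}" for name, value in sorted(props.items())]


# Hypothetical subset of the overrides listed in the report.
overrides = {
    "server.rmi.ssl.disable": "true",
    "client.tries": str(3),
    "client.retries_delay": str(5000),
}

client_args = build_jmeter_args(overrides)
print(" ".join(client_args))
```

Whether a given property needs to be set on the client, on the servers, or on both depends on the property, so the choice of flag here is only an assumption.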

  1. Any help pointers to make sure the test ends after the specified duration?
  2. Can this be controlled/enforced by any JMeter setting/property?
  3. Is it something related to Basic thread group v/s thread group plugins like Concurrency/Ultimate thread group?

Example test run:

I tried to run this test plan with 10 workers, of which 6 finished successfully while 4 hung. Please find attached the logs and test plan.

Also, why does the Summariser show Active + Finished threads more than Started?

Tail of the logs from a worker pod that is stuck:

2020-12-16 08:19:48,933 INFO o.a.j.t.JMeterThread: Stopping because end time detected by thread: 10.244.11.4-Thread Group 1-86
2020-12-16 08:19:48,933 INFO o.a.j.t.JMeterThread: Thread finished: 10.244.11.4-Thread Group 1-86
2020-12-16 08:25:22,926 INFO o.a.j.e.RemoteJMeterEngineImpl: Shutting test ...
2020-12-16 08:25:22,927 INFO o.a.j.e.RemoteJMeterEngineImpl: ... stopped
2020-12-16 08:25:22,928 INFO o.a.j.t.JMeterThread: Stopping: 10.244.11.4-Thread Group 1-78
2020-12-16 08:25:22,928 INFO o.a.j.t.JMeterThread: Stopping: 10.244.11.4-Thread Group 1-190
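To narrow down which threads are holding a worker open, one option is to scan the worker's jmeter.log for threads that were told to stop but never logged "Thread finished". The following Python sketch assumes the log line format shown above; the function name `unfinished_threads` is hypothetical.

```python
import re
from typing import Iterable, List

# Patterns based on the JMeterThread log lines quoted in the report.
FINISHED = re.compile(r"JMeterThread: Thread finished: (?P<name>.+)$")
STOPPING = re.compile(r"JMeterThread: Stopping: (?P<name>.+)$")


def unfinished_threads(lines: Iterable[str]) -> List[str]:
    """Return threads that received a stop request but never finished."""
    finished = set()
    stopping: List[str] = []
    for line in lines:
        line = line.rstrip()
        match = FINISHED.search(line)
        if match:
            finished.add(match.group("name"))
            continue
        match = STOPPING.search(line)
        if match and match.group("name") not in stopping:
            stopping.append(match.group("name"))
    return [name for name in stopping if name not in finished]


log_tail = """\
2020-12-16 08:19:48,933 INFO o.a.j.t.JMeterThread: Stopping because end time detected by thread: 10.244.11.4-Thread Group 1-86
2020-12-16 08:19:48,933 INFO o.a.j.t.JMeterThread: Thread finished: 10.244.11.4-Thread Group 1-86
2020-12-16 08:25:22,928 INFO o.a.j.t.JMeterThread: Stopping: 10.244.11.4-Thread Group 1-78
2020-12-16 08:25:22,928 INFO o.a.j.t.JMeterThread: Stopping: 10.244.11.4-Thread Group 1-190
"""

stuck = unfinished_threads(log_tail.splitlines())
print(stuck)
```

The thread names it reports can then be looked up in a thread dump of the same worker.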

Created attachment jmeter_stuck_execution.zip: test plan

Severity: normal OS: Linux

asfimport commented 3 years ago

@pmouawad (migrated from Bugzilla): Hello, If it's an HTTP-based load test, have you set connect and read timeouts on the HTTP Requests?

If it's another type of test, check that you don't have hanging samplers.

If you identify the hanging node, then run a thread dump using jstack or jmeter threaddump.sh and attach output here.

Thanks

asfimport commented 3 years ago

Madhuri Jain (migrated from Bugzilla): Created attachment thread_dump_20201219_081100_174.log: Thread Dump on JMeter client

asfimport commented 3 years ago

@pmouawad (migrated from Bugzilla): Hello, Please also provide thread dumps of the server nodes, and indicate which one corresponds to the hanging node.

Thanks

asfimport commented 3 years ago

Madhuri Jain (migrated from Bugzilla): Hi, Thank you very much for your response.

It's an HTTP based web test plan but we have not set any timeout explicitly (must be defaults if any). For more details, the test plan is attached in the bug.

The JMeter distributed setup is running on jre-headless docker image so it lacks jdk utilities. I'm facing two issues while getting thread dump:

  1. The issue is occurring inconsistently and randomly.

  2. While trying to run jmeter threaddump.sh, this only works on JMeter client (attached thread dump) but not on JMeter servers. Is there a way to run it and get thread dump on JMeter server?

Thanks!


asfimport commented 3 years ago

@pmouawad (migrated from Bugzilla): (In reply to Madhuri Jain from comment 4)

> Hi, Thank you very much for your response.
>
> It's an HTTP based web test plan but we have not set any timeout explicitly (must be defaults if any).

By default, we wait indefinitely, which can be a cause of hanging. So please try setting connect (500 ms is an acceptable value) and read (30000 ms) timeouts in the Advanced tab and see if it still hangs.

> For more details, the test plan is attached in the bug.
>
> The JMeter distributed setup is running on jre-headless docker image so it lacks jdk utilities. I'm facing two issues while getting thread dump:
>
> 1. The issue is occurring inconsistently and randomly.

You can probably install a JDK instead and you'll have them.

> 2. While trying to run jmeter threaddump.sh, this only works on JMeter client (attached thread dump) but not on JMeter servers. Is there a way to run it and get thread dump on JMeter server?

Install a JDK on your Docker image, connect to it using bash, and run jstack from inside the image.

> Thanks!


asfimport commented 3 years ago

Madhuri Jain (migrated from Bugzilla): (In reply to Philippe Mouawad from comment 5) Hello, Thank you for the useful insight and apologies for delay.

I have been trying to reproduce the issue, but it does not seem to recur. I'll get back with the details as soon as it reproduces on the setup.

To dig further, can I please get some more details on the following:

  1. Timeouts - Is there a recommended way to set default timeouts and control this behavior for all test plans run on a JMeter setup (distributed), through properties like httpclient.timeout and would it suffice?

  2. Thread dump - I've changed the image to mcr.microsoft.com/java/jdk:15-zulu-alpine (Link: https://hub.docker.com/_/microsoft-java-jdk)

Here are the errors I'm getting with different commands:

jstack -l -e 71
71: Unable to open socket file /proc/71/root/tmp/.java_pid71: target process 71 doesn't respond within 10500ms or HotSpot VM not loaded

jstack -l -e -F 71
Error: -F option used
Cannot connect to core dump or remote debug server. Use jhsdb jstack instead

jcmd 69 Thread.print
69:
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file /proc/69/root/tmp/.java_pid69: target process 69 doesn't respond within 10500ms or HotSpot VM not loaded
    at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:103)
    at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
    at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
    at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
    at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)

Even kill -3 <pid> is not working.

Thanks again for all the help!


asfimport commented 3 years ago

Madhuri Jain (migrated from Bugzilla): (In reply to Madhuri Jain from comment 6) To add more details: I'm issuing the command as the root user and the process is also running as root.

whoami
root

ps
PID USER TIME COMMAND
54 root 0:00 {jmeter.sh} /bin/sh /jmeter/apache-jmeter-5.4/bin/jmeter.sh -s -Jbeanshell.server.port=9000
70 root 0:00 {jmeter} /bin/sh /jmeter/apache-jmeter-5.4/bin/jmeter -s -Jbeanshell.server.port=9000

(Though I'm a bit confused why it shows 2 jmeter processes)


asfimport commented 3 years ago

Madhuri Jain (migrated from Bugzilla): (In reply to Madhuri Jain from comment 7)

The issue of thread dump is resolved by using pid from jps.

jps -l
128 /jmeter/apache-jmeter-5.4/bin/ApacheJMeter.jar

jcmd 128 Thread.print
WORKED

I'm trying to reproduce the issue and will provide the required thread dump once it occurs.

Thanks!
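The `ps` output earlier in the thread lists `/bin/sh` wrapper scripts, whereas `jps -l` lists the JVM itself, which may be why the PID from `jps` is the one `jcmd` can attach to. A small Python sketch of picking that PID out of `jps -l` output follows. The function name `jmeter_pid_from_jps` is hypothetical, and the second line of the sample output is made up for illustration.

```python
from typing import Optional


def jmeter_pid_from_jps(jps_output: str, marker: str = "ApacheJMeter.jar") -> Optional[int]:
    """Return the PID of the first `jps -l` entry whose main class or jar ends with marker."""
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[0].isdigit() and parts[1].endswith(marker):
            return int(parts[0])
    return None


# First line is taken from the report; the second line is an invented example entry.
sample = (
    "128 /jmeter/apache-jmeter-5.4/bin/ApacheJMeter.jar\n"
    "245 jdk.jcmd/sun.tools.jps.Jps\n"
)

pid = jmeter_pid_from_jps(sample)
print(pid)
# The dump itself would then be taken with: jcmd <pid> Thread.print
```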


asfimport commented 3 years ago

Madhuri Jain (migrated from Bugzilla): Attached the required logs and thread dumps where controller corresponds to JMeter client and worker to JMeter server in distributed setup.

Thanks, Madhuri

Created attachment debug_data.zip: Contains test plan, jmeter logs and thread dumps

asfimport commented 3 years ago

@pmouawad (migrated from Bugzilla): Hello, Looking at the thread dump of the worker, I can see the following stack, which seems to confirm what I previously wrote, i.e. that there is a hanging HTTP request:

"10.244.32.13-Thread Group 1-14" #46 daemon prio=5 os_prio=0 cpu=30.54ms elapsed=1299.76s tid=0x00005606165e4cc0 nid=0x81 runnable [0x00007f7b64fbc000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.SocketDispatcher.read0(java.base@15.0.1/Native Method)
    at sun.nio.ch.SocketDispatcher.read(java.base@15.0.1/Unknown Source)
    at sun.nio.ch.NioSocketImpl.tryRead(java.base@15.0.1/Unknown Source)
    at sun.nio.ch.NioSocketImpl.implRead(java.base@15.0.1/Unknown Source)
    at sun.nio.ch.NioSocketImpl.read(java.base@15.0.1/Unknown Source)
    at sun.nio.ch.NioSocketImpl$1.read(java.base@15.0.1/Unknown Source)
    at java.net.Socket$SocketInputStream.read(java.base@15.0.1/Unknown Source)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
    at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.executeRequest(HTTPHC4Impl.java:930)
    at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.sample(HTTPHC4Impl.java:641)
    at org.apache.jmeter.protocol.http.sampler.HTTPSamplerProxy.sample(HTTPSamplerProxy.java:66)
    at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1281)
    at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1270)
    at org.apache.jmeter.threads.JMeterThread.doSampling(JMeterThread.java:630)
    at org.apache.jmeter.threads.JMeterThread.executeSamplePackage(JMeterThread.java:558)
    at org.apache.jmeter.threads.JMeterThread.processSampler(JMeterThread.java:489)
    at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:256)
    at java.lang.Thread.run(java.base@15.0.1/Unknown Source)
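When a dump contains hundreds of threads, the same pattern (a sampler thread whose top frame is a blocking socket read) can be searched for mechanically. The Python sketch below is one way to do that, assuming the usual HotSpot dump layout of a quoted thread name followed by `at ...` frames. The function name and the shortened sample dump (including its second, idle thread) are illustrative only.

```python
from typing import Dict, List

# Top-of-stack prefixes treated as "blocked in a socket read" (newer and older JDKs).
SOCKET_READ_PREFIXES = (
    "sun.nio.ch.SocketDispatcher.read",
    "java.net.SocketInputStream.socketRead",
)
SAMPLER_FRAME = "HTTPHC4Impl.sample"


def threads_blocked_in_http_read(dump: str) -> List[str]:
    """Return names of threads whose top frame is a socket read under the HTTP sampler."""
    stacks: Dict[str, List[str]] = {}
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        if line.startswith('"'):
            # Thread header: the name is the text between the first pair of quotes.
            current = line.split('"')[1]
            stacks[current] = []
        elif line.startswith("at ") and current is not None:
            stacks[current].append(line[3:])
    return [
        name
        for name, frames in stacks.items()
        if frames
        and frames[0].startswith(SOCKET_READ_PREFIXES)
        and any(SAMPLER_FRAME in frame for frame in frames)
    ]


# Shortened excerpt of the stack above plus an invented idle thread.
sample_dump = '''
"10.244.32.13-Thread Group 1-14" #46 daemon prio=5 runnable
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.SocketDispatcher.read0(java.base@15.0.1/Native Method)
    at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.sample(HTTPHC4Impl.java:641)
    at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:256)

"example-idle-thread" #12 daemon prio=5 waiting
   java.lang.Thread.State: WAITING
    at java.lang.Object.wait(java.base@15.0.1/Native Method)
'''

hanging = threads_blocked_in_http_read(sample_dump)
print(hanging)
```

A thread caught at that frame is not necessarily hung; it only becomes suspicious when its elapsed time is far beyond the expected response time, as in the dump above.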


Did you try setting Connect and Read timeouts by adding an HTTP Request Defaults element and setting those two values in its Advanced tab?

Thank you

asfimport commented 3 years ago

Madhuri Jain (migrated from Bugzilla): Hello, Thank you for the information.

Yes, I have tried with timeouts and the issue has not reproduced so far. I'll keep trying for a couple more days and will report back if it recurs.

In the last test run I shared, I noticed that there's a peak response time towards the end after which the test finally ended (still surpassing the specified test duration). I've added snapshots of the graphs to confirm.

I've also added details for another test run where the test NEVER ended on its own. There seem to be around 102 active users but no server interactions. Does the HTTP thread hang apply to this case as well?

Also, we're not the test authors, hence we cannot confirm whether the test plan contains timeout specs or not.

Could you please help with any platform recommendation to control this behavior (like some setting or property etc.)?

Thanks!

Created attachment debug_data_2.zip: Latest_run_data
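For a platform that runs test plans written by others, one possible approach (not something confirmed in this thread) is to pre-process each submitted .jmx and fill in connect and response timeouts wherever an HTTP sampler leaves them empty. The Python sketch below assumes the standard JMX property names `HTTPSampler.connect_timeout` and `HTTPSampler.response_timeout`. The function name and the tiny sample plan are hypothetical, and the default values are the ones suggested earlier in the thread.

```python
import xml.etree.ElementTree as ET

CONNECT = "HTTPSampler.connect_timeout"
RESPONSE = "HTTPSampler.response_timeout"


def apply_default_timeouts(jmx_xml: str, connect_ms: int = 500, read_ms: int = 30000) -> str:
    """Fill empty or missing timeouts on every HTTPSamplerProxy; keep explicit values."""
    root = ET.fromstring(jmx_xml)
    for sampler in root.iter("HTTPSamplerProxy"):
        for prop_name, value in ((CONNECT, connect_ms), (RESPONSE, read_ms)):
            prop = sampler.find(f"stringProp[@name='{prop_name}']")
            if prop is None:
                # Property not serialized at all: add it.
                prop = ET.SubElement(sampler, "stringProp", {"name": prop_name})
            if not (prop.text or "").strip():
                prop.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Minimal invented plan: one sampler with an empty connect timeout.
sample_plan = """<jmeterTestPlan><hashTree>
<HTTPSamplerProxy testname="HTTP Request">
<stringProp name="HTTPSampler.connect_timeout"></stringProp>
<stringProp name="HTTPSampler.response_timeout">1000</stringProp>
</HTTPSamplerProxy>
</hashTree></jmeterTestPlan>"""

patched = ET.fromstring(apply_default_timeouts(sample_plan))
sampler = patched.find(".//HTTPSamplerProxy")
print(sampler.find(f"stringProp[@name='{CONNECT}']").text)
print(sampler.find(f"stringProp[@name='{RESPONSE}']").text)
```

One caveat with this design: a value written on the sampler takes precedence over an HTTP Request Defaults element, so a plan that deliberately sets timeouts only in its defaults would have them overridden by the values filled in here.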