On my machine, the issue does not happen in 2201.4.3, but it does exist in 2201.5.0.
The JMX script uses a stepping thread group to identify the maximum number of concurrent threads the server can handle without a timeout.
There seem to be some interruption errors on the JMeter client side. Will get a thread dump for further analysis.
The issue here is the behaviour of the JMeter plugin's client. I saw that, regardless of the HTTP2Sampler.http1_upgrade property, the client sends a mix of plain HTTP/1.1 requests and HTTP/1.1 upgrade requests (there are no HTTP/2 requests; there is also a known issue in this plugin). This was verified with Wireshark.
In the above screenshot, the very first request comes in as an HTTP/1.1 upgrade, but the next request is a plain HTTP/1.1 request. There is also an unexpected GO_AWAY from the client to close the established HTTP/2 connection. This is when we set the HTTP2Sampler.http1_upgrade property to true. When we set it to false, the very first request is a plain HTTP/1.1 request and the next one is an HTTP/1.1 upgrade. (I used a single user to check this.)
Due to these factors, the number of established connections keeps increasing, since the Ballerina server side does not limit the connections made by the client. This results in an interruption exception in the client used by the JMeter plugin.
As core JMeter does not have an HTTP/2 client, it is not advisable to trust the results of these plugins without verifying their behaviour. We recommend using h2load as a benchmarking tool for HTTP/2. You can do a similar stress test with the following script:
#!/bin/bash
URL="http://localhost:8080/hello"
TOTAL_REQUESTS=100000
INITIAL_CONCURRENCY=1
MAX_CONCURRENCY=200
STEP=1
for ((c=$INITIAL_CONCURRENCY; c<=$MAX_CONCURRENCY; c+=$STEP))
do
echo "Running with concurrency: $c"
h2load -n $TOTAL_REQUESTS -c $c $URL
sleep 2 # Pause between steps, adjust as necessary
done
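Not part of the original discussion, but as an additional sanity check a plain Ballerina HTTP/2 client can be used to send a single h2c prior-knowledge request to the same endpoint, so no HTTP/1.1 upgrade is involved. This is only a sketch; the URL, path, and payload below are illustrative and assume the hello service described at the end of this thread.
import ballerina/http;
import ballerina/io;

public function main() returns error? {
    // HTTP/2 over cleartext with prior knowledge, so the client never sends an
    // HTTP/1.1 upgrade request (unlike the JMeter plugin behaviour described above).
    http:Client helloClient = check new ("http://localhost:8080",
        httpVersion = http:HTTP_2_0,
        http2Settings = {http2PriorKnowledge: true}
    );
    // Illustrative payload; the /hello resource is assumed to accept text/plain
    // and respond with a JSON body.
    json response = check helloClient->post("/hello", "hello world");
    io:println(response);
}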
Keeping this issue open to check the server's behaviour when we get an unexpected GO_AWAY from the client. @dilanSachi is checking on this.
From my observation, this GO_AWAY comes after 20 seconds. It looks like the client is just holding the connection and closing it after a set timeout.
At the moment, the Ballerina service does not act on the GO_AWAY from the client. The connection is kept open until the client initiates a connection close. This can be seen in the following server trace logs.
[2024-07-03 15:06:02,057] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, correlatedSource: n/a, host:/127.0.0.1:8080 - remote:/127.0.0.1:37470] ACTIVE
[2024-07-03 15:06:02,110] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] OUTBOUND SETTINGS: ack=false settings={INITIAL_WINDOW_SIZE=65535, MAX_HEADER_LIST_SIZE=8192}
[2024-07-03 15:06:02,113] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, correlatedSource: n/a, host:/127.0.0.1:8080 - remote:/127.0.0.1:37470] OUTBOUND: << Payload could not be decoded >>
[2024-07-03 15:06:02,120] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND SETTINGS: ack=false settings={HEADER_TABLE_SIZE=25700}
[2024-07-03 15:06:02,122] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] OUTBOUND SETTINGS: ack=true
[2024-07-03 15:06:02,928] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND SETTINGS: ack=true
[2024-07-03 15:06:02,934] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND HEADERS: streamId=3 headers=DefaultHttp2Headers[:path: /hello, :method: POST, content-type: text/plain, user-agent: wso2] padding=0 endStream=false
[2024-07-03 15:06:03,929] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND DATA: streamId=3 padding=0 endStream=true length=11 data=11B
hellosworld
Payload received hellosworld
1
[2024-07-03 15:06:03,974] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] OUTBOUND HEADERS: streamId=3 headers=DefaultHttp2Headers[:status: 201, content-type: application/json, server: ballerina, date: Wed, 3 Jul 2024 15:06:03 +0530] padding=0 endStream=false
[2024-07-03 15:06:03,980] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] OUTBOUND DATA: streamId=3 padding=0 endStream=true length=17 data=17B
{"hello":"world"}
[2024-07-03 15:06:05,930] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND GO_AWAY: lastStreamId=3 errorCode=11 length=0 bytes=
However, this can cause issues with misbehaving clients. @TharmiganK observed something similar with the above client: the client opens a connection, does some communication, and sends a GO_AWAY, but does not close the connection. Then the client opens another connection, does some communication, and sends a GO_AWAY again. If this happens repeatedly, there will be too many open connections that cannot do any communication, which is a waste of resources. So we discussed offline adding a feature to the Ballerina HTTP/2 server that would act upon receiving GO_AWAY frames. The proposed approach is to wait for some time until the client closes the connection, or until the existing streams complete their communication, and then close the connection from the server side.
Improved the behaviour by handling the GO_AWAY on the server side with a stale-connection eviction implementation.
Description: HTTP resources fail at a lower concurrency than expected. They fail at around 30 threads on my laptop, while a similar Java application can tolerate more than 200 (possibly even higher).
Steps to reproduce: Create a simple service (a hedged sketch is given below) and stress test it with JMeter: test-bal-threads.txt (rename .txt to .jmx).
Affected Versions: Ballerina 2201.9.1
Related Issues (optional): https://discord.com/channels/957996897782616114/1017493204326686860/threads/1255528309769506969
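Below is a minimal sketch of such a service, reconstructed from the trace logs above (a POST /hello resource on port 8080 that reads a text/plain payload and returns a JSON body). The resource used in the original test may differ, so treat this only as an approximation.
import ballerina/http;
import ballerina/io;

// Minimal sketch reconstructed from the trace logs; the default Ballerina HTTP
// listener already supports HTTP/2 over cleartext (h2c), so no extra configuration
// is assumed here.
service /hello on new http:Listener(8080) {
    resource function post .(@http:Payload string payload) returns json {
        // Matches the "Payload received ..." line seen in the server console output.
        io:println("Payload received " + payload);
        // A POST resource returning a payload responds with 201 Created by default,
        // which matches the ":status: 201" header in the trace logs.
        return {"hello": "world"};
    }
}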