Tried running the existing load tests with 200 concurrent requests using the workflow.
Workflow run: https://github.com/ballerina-platform/ballerina-performance-cloud/actions/runs/8794152427 Results: https://github.com/ballerina-platform/module-ballerina-http/pull/1964/files
There were some significant differences between the code used in the https_passthrough load test and the code used in the h1_h1_passthrough load test:
- ballerina-performance-cloud uses an h2-h2 approach, whereas the one in the http module uses h1-h1.
- ballerina-performance-cloud uses http:Caller to respond, whereas the one in the http module just returns the http:Response.
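To make the second difference concrete, here is a minimal sketch of the two response styles side by side. It is not the actual load-test code: the resource names viaCaller and viaReturn are hypothetical, and the port and backend URL are borrowed from the reduced reproducer later in this thread.

```ballerina
import ballerina/http;

// Port and backend URL are assumptions taken from the reduced reproducer.
listener http:Listener passthroughEP = new (9090);
final http:Client backendEP = check new ("http://localhost:8688");

service /passthrough on passthroughEP {

    // Style 1 (hypothetical name): respond explicitly through http:Caller.
    resource function post viaCaller(http:Caller caller, http:Request clientRequest) returns error? {
        http:Response response = check backendEP->/'service/EchoService.post(clientRequest);
        check caller->respond(response);
    }

    // Style 2 (hypothetical name): return the http:Response and let the
    // listener send it back to the client.
    resource function post viaReturn(http:Request clientRequest) returns http:Response|error {
        return backendEP->/'service/EchoService.post(clientRequest);
    }
}
```

Both resources forward the incoming request to the same backend; only the way the response is handed back to the client differs.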
So I tried adding two more load tests, h2_h2_passthrough and h2_transformation, but I am still getting a 0% error rate:
Workflow run: https://github.com/ballerina-platform/ballerina-performance-cloud/actions/runs/8797283914 Results: https://github.com/ballerina-platform/module-ballerina-http/pull/1963/files
I tried to reproduce this issue locally using the code in https_passthrough, running the load test with 200 concurrent users for 5 minutes, but I could not reproduce it.
@xlight05 Can we check the configurations used to run these load tests?
Had an offline chat on this. We were able to get a strand dump when this issue was reproduced.
Strand dump - https://gist.github.com/xlight05/9ef16bbe1ea7f733d43a398429920a32
I was able to reproduce this issue with the help of @xlight05 in a constrained environment. Please find the steps below:
bal build
docker-compose up
Please note that this issue is only reproducible when you make multiple requests at short intervals. Strangely, if we make only one request at first and wait for the response, the subsequent requests pass.
I have checked the following:
So it seems the issue comes from the lang changes in Update 9. Adding @HindujaB @gabilang to check on this.
I was able to reduce the reproducer code to the following (no need for Docker, just use bal run):
import ballerina/http;

listener http:Listener securedEP = new (9090);
final http:Client nettyEP = check new ("http://localhost:8688");

service /passthrough on securedEP {
    resource function post .(http:Request clientRequest) returns http:Response|error {
        return nettyEP->/'service/EchoService.post(clientRequest);
    }
}
But in order to reproduce the issue, I have to use 1000 users with a 5s ramp-up period. (I checked a similar configuration with the Update 8 service, and it worked without any hanging.)
If I remove the clientRequest parameter from the resource signature, the service works without hanging. So this might be related to the previous memory issue: https://github.com/ballerina-platform/ballerina-lang/issues/42566. The difference here is that there is no memory increase now, but some strands used to populate the default values seem to be in the runnable state.
Please note that I have removed SSL here, so I am not 100% sure that the two are related. (The service also hangs with SSL.) But I think the probability of this issue occurring is higher with SSL.
When the service hangs, most of the jbal threads are in the monitor state:
Strand dump: https://gist.github.com/TharmiganK/932d0274a391aa55f8fbe9e9da5135a1 Thread dump: https://drive.google.com/file/d/14Y4x7b5Vdm-8RCT_VyLTQSC8sSsqSfbT/view?usp=drive_link
Description: $Subject. This works fine for the 60-user case. It applies to both the passthrough and transformation use cases we have.
https://github.com/ballerina-platform/ballerina-performance-cloud/actions/runs/8782386068/job/24096556961 https://github.com/ballerina-platform/ballerina-performance-cloud/blob/1d735a28d62a06265a7ccad3c72bfa78764b476a/load-tests/https_passthrough/results/summary.csv#L2609