ballerina-platform / ballerina-library

The Ballerina Library
https://ballerina.io/learn/api-docs/ballerina/
Apache License 2.0

Http service hangs on SSE Client call intermittently #6956

Open xlight05 opened 2 months ago

xlight05 commented 2 months ago

Description: The HTTP service intermittently hangs on an SSE client call. This only happens with HTTP/2.

The service is not fully hung: each request is served up to the point of HTTP client creation. The service works for some time, and the hang occurs intermittently.

I have a trace log captured after the service hung; it can be shared on request.

TharmiganK commented 1 month ago

Client (Postman) <-----> Service (Ballerina service deployed in Choreo) <-----> Backend (2: OpenAI and Anthropic)

Summary of the trace logs:

Last event write:

[2024-09-03 06:07:18,686] TRACE {http.tracelog.downstream} - [id: 0x2d057e82, correlatedSource: n/a, host:/10.100.1.153:9094 - remote:/10.100.1.146:54862] OUTBOUND: DefaultHttpContent(data: UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeHeapByteBuf(ridx: 0, widx: 268, cap: 8192), decoderResult: success), 268B
time=2024-09-03T06:07:18.685Z level=INFO module=wso2/ballerina_copilot message="Time taken to get the functions: 2.378138905"
[2024-09-03 06:07:18,686] TRACE {http.tracelog.downstream} - [id: 0x2d057e82, correlatedSource: n/a, host:/10.100.1.153:9094 - remote:/10.100.1.146:54862] FLUSH  
time=2024-09-03T06:07:18.691Z level=INFO module=wso2/ballerina_copilot message="Sending a request to Claude to generate the code"

Connection close:

[2024-09-03 06:08:37,497] TRACE {http.tracelog.downstream} - [id: 0x2d057e82, correlatedSource: n/a, host:/10.100.1.153:9094 - remote:/10.100.1.146:54862] INACTIVE  
[2024-09-03 06:08:37,497] TRACE {http.tracelog.downstream} - [id: 0x2d057e82, correlatedSource: n/a, host:/10.100.1.153:9094 - remote:/10.100.1.146:54862] READ COMPLETE  
[2024-09-03 06:08:37,498] TRACE {http.tracelog.downstream} - [id: 0x2d057e82, correlatedSource: n/a, host:/10.100.1.153:9094 - remote:/10.100.1.146:54862] UNREGISTERED

MohamedSabthar commented 1 month ago

We attempted to reproduce this scenario in a Docker environment with the following configuration:

[cloud.deployment]
min_memory="100Mi" # Minimum memory required for the container.
max_memory="100Mi" # Maximum memory a single container can take.
min_cpu="500m"  # Minimum CPU required for the container.
max_cpu="500m" # Maximum CPU a single container can take.

We tested the following scenarios:

  1. Connect to the Anthropic backend using the Ballerina client and pass the response through a Ballerina service.
  2. Replace the Anthropic backend with a mock backend and pass the response through a Ballerina service.

In both scenarios, the connection between the client (Postman) and the service was maintained as HTTP/1.1, while the connection between the Ballerina client and the backends was HTTP/2.0.

However, the hang issue was not observed in either scenario.

Additionally, based on @xlight05's research, HTTP/2.0 is not fully supported by Anthropic, and it is recommended to use HTTP/1.1 instead. @xlight05 also noted that this hang issue occurs in the Choreo environment. As the next step, we could attempt to mirror the Choreo environment locally and try to reproduce the issue.