Open xlight05 opened 2 months ago
Client (Postman) <-----> Service (Ballerina service deployed in Choreo) <-----> Backend (2 - OpenAI and Anthropic)
Summary of the trace logs:
Last event write:
[2024-09-03 06:07:18,686] TRACE {http.tracelog.downstream} - [id: 0x2d057e82, correlatedSource: n/a, host:/10.100.1.153:9094 - remote:/10.100.1.146:54862] OUTBOUND: DefaultHttpContent(data: UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeHeapByteBuf(ridx: 0, widx: 268, cap: 8192), decoderResult: success), 268B
time=2024-09-03T06:07:18.685Z level=INFO module=wso2/ballerina_copilot message="Time taken to get the functions: 2.378138905"
[2024-09-03 06:07:18,686] TRACE {http.tracelog.downstream} - [id: 0x2d057e82, correlatedSource: n/a, host:/10.100.1.153:9094 - remote:/10.100.1.146:54862] FLUSH
time=2024-09-03T06:07:18.691Z level=INFO module=wso2/ballerina_copilot message="Sending a request to Claude to generate the code"
Connection close:
[2024-09-03 06:08:37,497] TRACE {http.tracelog.downstream} - [id: 0x2d057e82, correlatedSource: n/a, host:/10.100.1.153:9094 - remote:/10.100.1.146:54862] INACTIVE
[2024-09-03 06:08:37,497] TRACE {http.tracelog.downstream} - [id: 0x2d057e82, correlatedSource: n/a, host:/10.100.1.153:9094 - remote:/10.100.1.146:54862] READ COMPLETE
[2024-09-03 06:08:37,498] TRACE {http.tracelog.downstream} - [id: 0x2d057e82, correlatedSource: n/a, host:/10.100.1.153:9094 - remote:/10.100.1.146:54862] UNREGISTERED
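To make the gap between the last write and the connection close explicit, the timestamps can be pulled out of the trace log and subtracted. The following is a minimal sketch, not part of the original investigation: the helper names (`parse_events`, `idle_gap_seconds`) and the regular expression are assumptions made for illustration, and the two input lines are copied from the log above.

```python
import re
from datetime import datetime

# Two lines copied verbatim from the trace log above:
# the last FLUSH on the downstream channel and the channel going INACTIVE.
LOG = """\
[2024-09-03 06:07:18,686] TRACE {http.tracelog.downstream} - [id: 0x2d057e82, correlatedSource: n/a, host:/10.100.1.153:9094 - remote:/10.100.1.146:54862] FLUSH
[2024-09-03 06:08:37,497] TRACE {http.tracelog.downstream} - [id: 0x2d057e82, correlatedSource: n/a, host:/10.100.1.153:9094 - remote:/10.100.1.146:54862] INACTIVE
"""

# Hypothetical pattern for downstream trace lines: timestamp, channel id, event name.
LINE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:]+,\d{3})\] TRACE \{http\.tracelog\.downstream\} - "
    r"\[id: (?P<chan>0x[0-9a-f]+),[^\]]*\] (?P<event>[A-Z ]+)"
)


def parse_events(text):
    """Return (timestamp, channel id, event name) for each matching trace line."""
    events = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m["chan"], m["event"].strip()))
    return events


def idle_gap_seconds(events, chan):
    """Seconds between the last FLUSH and the first INACTIVE on one channel."""
    last_flush = max(ts for ts, c, e in events if c == chan and e == "FLUSH")
    inactive = min(ts for ts, c, e in events if c == chan and e == "INACTIVE")
    return (inactive - last_flush).total_seconds()


events = parse_events(LOG)
gap = idle_gap_seconds(events, "0x2d057e82")
print(f"{gap:.3f} s between last FLUSH and INACTIVE")
```

For the two lines shown, the downstream channel stayed open for roughly 78.8 seconds after the last flush before it went inactive, with no further writes to the client in between.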
We attempted to reproduce this scenario in a Docker environment with the following configuration:
[cloud.deployment]
min_memory="100Mi" # Minimum memory required for the container.
max_memory="100Mi" # Maximum memory a single container can take.
min_cpu="500m" # Minimum CPU required for the container.
max_cpu="500m" # Maximum CPU a single container can take.
We tested the following scenarios:
In both scenarios, the connection between the client (Postman) and the service was maintained as HTTP/1.1, while the connection between the Ballerina client and the backends was HTTP/2.0.
However, the hang issue was not observed in either scenario.
Additionally, based on @xlight05's research, HTTP/2.0 is not fully supported by Anthropic, and it is recommended to use HTTP/1.1 instead. @xlight05 also noted that this hang issue occurs in the Choreo environment. As the next step, we could attempt to mirror the Choreo environment locally and try to reproduce the issue.
Description: $Subject. This only happens over HTTP/2.
The service is not fully hung: each request is served up to the point of HTTP client creation. The service works for some time, and the hang occurs intermittently.
I have a trace log captured after the service hung, and it can be shared on request.
Steps to reproduce:
Affected Versions:
OS, DB, other environment details and versions: