awslabs / aws-crt-python

Python bindings for the AWS Common Runtime
Apache License 2.0

Random Crash with awsiotsdk 1.11.9 and awscrt 0.14.7 #400

Closed liugongqx closed 1 year ago

liugongqx commented 2 years ago

Hi,

This is pretty similar to this issue.

Error message: 2022-10-07T23:46:20.512Z [WARN] (Copier) com.amazon.ControlComponent: stderr. Fatal error condition occurred in /aws-crt-python/crt/aws-c-event-stream/source/event_stream_rpc_client.c:961: ref_count != 0 && "Continuation ref count has gone negative". {scriptName=services.com.amazon.ControlComponent.lifecycle.Run.Script, serviceName=com.amazon.ControlComponent, currentState=RUNNING}

In our use case, a control component keeps publishing messages to a topic at high frequency (every 100 ms) to trigger data capture in a capture component. The publishing thread then blocks while it listens for the response. Note that this is a task thread, not the main thread. The issue happens randomly. When the error shows up, the control component stops publishing messages to the topic, including "STOP", so the capture component keeps running until the disk is full.

A little more background: this error started showing up after I upgraded awsiotsdk. Before the upgrade, we got the following error, which had a similar result.

2022-10-01T00:11:22.715Z INFO:awsiot.eventstreamrpc:<Connection at 0x7fac4e438d00 /greengrass/v2/ipc.socket:0> disconnected, reason: EventStreamError(<MessageType.PROTOCOL_ERROR: 6>, [Header(':content-type', 'application/json', <HeaderType.STRING: 7>), Header(':message-type', 6, <HeaderType.INT32: 4>), Header(':message-flags', 0, <HeaderType.INT32: 4>), Header(':stream-id', 0, <HeaderType.INT32: 4>)], b'{ "message": "stream-id values must be monotonically incrementing. A stream-id arrived that was lower than the last seen stream-id."; }')

liugongqx commented 2 years ago

Any update here? Thank you!

sbSteveK commented 1 year ago

Can you check if this error still occurs with the latest version of awsiotsdk and awscrt? There have been a number of fixes to event stream since October 2022.

Please provide the surrounding logs, your OS, version, and environment, and any code snippets you feel may be relevant.

You mentioned that there are threads being blocked to listen for a response. Please elaborate on this if you can and be aware that blocking any callback can cause deadlocks and stop the crt/sdk from operating properly. You can read about it a bit here: https://awslabs.github.io/aws-crt-python/api/websocket.html#authoring-callbacks
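
As a rough illustration of the non-blocking callback advice above, here is a minimal sketch that uses only the Python standard library. It assumes the callback's only job is to hand the event to an application-owned thread. `on_stream_event`, `worker`, and the event payloads are hypothetical names for illustration, not part of awscrt or awsiotsdk.

```python
import queue
import threading

# Hypothetical names throughout: on_stream_event stands in for any callback
# that the SDK/CRT invokes on one of its own threads. It is not a real
# awscrt or awsiotsdk API.
events = queue.Queue()
handled = []


def on_stream_event(event):
    """Callback: enqueue the event and return immediately. Never block here."""
    events.put_nowait(event)


def worker(stop):
    """Application-owned thread: free to block or do slow work."""
    while not stop.is_set():
        try:
            # Bounded wait so the stop flag is checked regularly.
            event = events.get(timeout=0.1)
        except queue.Empty:
            continue
        handled.append(event)  # placeholder for the real (possibly slow) handling
        events.task_done()


stop = threading.Event()
thread = threading.Thread(target=worker, args=(stop,), daemon=True)
thread.start()

# Simulate the SDK invoking the callback three times.
for i in range(3):
    on_stream_event({"seq": i})

events.join()  # wait until every queued event has been handled
stop.set()
thread.join()
```

With this shape the callback returns right away, so the thread that invoked it is not held up, and any waiting happens on a thread the application owns. Whether this fits the components in this issue depends on code that has not been shared here.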