awslabs / amazon-kinesis-video-streams-producer-c

https://awslabs.github.io/amazon-kinesis-video-streams-producer-c/group__PublicMemberFunctions.html
Apache License 2.0

[BUG] Stream state machine exits when 0x15000011 #169

Closed Nomidia closed 3 years ago

Nomidia commented 3 years ago

Describe the bug: When something goes wrong in the state machine, continuousRetryStreamErrorReportHandler is called. It uses IS_RETRIABLE_ERROR to check whether the statusCode meets the reconnection conditions. If the check passes, it resets the stream. In some scenarios the state machine returns 0x15000011, and the reset does not happen.
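The handler behavior described above can be sketched roughly as follows. This is a minimal stand-alone model, not the library's code: every `Sketch`/`SKETCH_` name is hypothetical, the retriable code 0x52000099 is invented for illustration, and the assumption that the handler returns success without resetting on a non-retriable code is inferred from the CHK line quoted later in this issue.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t STATUS;

#define SKETCH_STATUS_SUCCESS 0x00000000u
/* The code seen in the logs of this issue. */
#define SKETCH_STATUS_IOT_FAILED 0x15000011u
/* Hypothetical retriable code; the real list is in the library's IS_RETRIABLE_ERROR. */
#define SKETCH_STATUS_RETRIABLE_EXAMPLE 0x52000099u

/* Stand-in for IS_RETRIABLE_ERROR: only the hypothetical code above qualifies. */
#define SKETCH_IS_RETRIABLE_ERROR(s) ((s) == SKETCH_STATUS_RETRIABLE_EXAMPLE)

/* Hypothetical stream object that only counts resets. */
typedef struct {
    int resetCount;
} SketchStream;

static STATUS sketchResetStream(SketchStream* pStream)
{
    pStream->resetCount++;
    return SKETCH_STATUS_SUCCESS;
}

/* Models the error report handler: reset only when the code is retriable. */
static STATUS sketchErrorReportHandler(SketchStream* pStream, STATUS statusCode)
{
    if (!SKETCH_IS_RETRIABLE_ERROR(statusCode)) {
        /* Assumed behavior: early exit, nothing restarts the stream. */
        return SKETCH_STATUS_SUCCESS;
    }
    return sketchResetStream(pStream);
}

/* Small demonstration of both paths. */
static void sketchHandlerDemo(void)
{
    SketchStream stream = {0};

    (void) sketchErrorReportHandler(&stream, SKETCH_STATUS_RETRIABLE_EXAMPLE);
    assert(stream.resetCount == 1); /* retriable: stream was reset */

    (void) sketchErrorReportHandler(&stream, SKETCH_STATUS_IOT_FAILED);
    assert(stream.resetCount == 1); /* 0x15000011: no reset happened */
}
```

Calling `sketchHandlerDemo()` from a `main` exercises both paths: the second call leaves the reset count unchanged, which is the situation the scenarios below run into.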

Scenario 1:

  1. describeStream returns 52000011.
  2. ResetStream is called.
  3. In the client state machine, the expiration is reached; the attempt to get a new expiration time times out and returns 0x15000011.
  4. The IS_RETRIABLE_ERROR check fails and the stream state machine exits. https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/blob/e1dd9364c3860f0196cc24a5b815c82076ab8149/src/source/ContinuousRetryStreamCallbacks.c#L296
16:35:43  2020-12-29 08:35:43 ERROR   blockingCurlCall(): Curl perform failed for url https://c3qp4tl980s52m.credentials.iot.ap-south-1.amazonaws.com/role-aliases/dev-kvs-access-role-alias/credentials with result Timeout was reached : Resolving timed out after 3587 milliseconds 
16:35:43  2020-12-29 08:35:43 WARN    stepStateMachine(): pStateMachine=d08644,end. retStatus=15000011.
16:35:43  2020-12-29 08:35:43 WARN    executeDescribeStreamState(): ---- Leave retStatus=15000011.
16:35:43  
16:35:44  2020-12-29 08:35:43 WARN    stepStateMachine(): pStateMachine=c8764c,end. retStatus=15000011.
16:35:44  2020-12-29 08:35:43 ERROR   describeStreamResultEvent(): operation returned status code: 0x15000011
16:35:44  2020-12-29 08:35:43 WARN    continuousRetryStreamErrorReportHandler(): Reporting stream error. Errored timecode: 0 Status: 0x15000011

Scenario 2:

  1. Curl perform fails for url https://s-703775db.kinesisvideo.ap-south-1.amazonaws.com/putMedia
  2. stepStateMachine is called in kinesisVideoStreamTerminated.
  3. It returns 0x15000011, but the caller does not receive this error code.
  4. The retry in continuousRetryStreamErrorReportHandler is skipped.

https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/blob/e1dd9364c3860f0196cc24a5b815c82076ab8149/src/source/CurlApiCallbacks.c#L2231-L2240

19:18:12  2021-01-06 11:18:13 WARN    curlCompleteSync(): curl perform failed for url https://s-703775db.kinesisvideo.ap-south-1.amazonaws.com/putMedia with result Timeout was reached: Resolving timed out after 5599 milliseconds
19:18:12  2021-01-06 11:18:13 WARN    curlCompleteSync(): pCurlResponse->callInfo.httpStatus=0
19:18:12  2021-01-06 11:18:13 WARN    curlCompleteSync(): HTTP Error 0 : Response: (null)

19:18:16  2021-01-06 11:18:17 ERROR   blockingCurlCall(): Curl perform failed for url https://c3qp4tl980s52m.credentials.iot.ap-south-1.amazonaws.com/role-aliases/dev-kvs-access-role-alias/credentials with result Timeout was reached : Resolving timed out after 3575 milliseconds 
19:18:16  2021-01-06 11:18:17 WARN    stepStateMachine(): pStateMachine=d88c64,end. retStatus=15000011.
19:18:16  2021-01-06 11:18:17 WARN    executePutStreamState(): ---- Leave retStatus=15000011.
19:18:16  
19:18:16  2021-01-06 11:18:17 WARN    stepStateMachine(): pStateMachine=d5931c,end. retStatus=15000011.
19:18:16  2021-01-06 11:18:17 WARN    executeReadyStreamState(): ---- Leave retStatus=15000011.
19:18:16  
19:18:16  2021-01-06 11:18:17 WARN    stepStateMachine(): pStateMachine=d5931c,end. retStatus=15000011.
19:18:16  2021-01-06 11:18:17 WARN    executeStoppedStreamState(): ---- Leave retStatus=15000011.
19:18:16  
19:18:16  2021-01-06 11:18:17 WARN    stepStateMachine(): pStateMachine=d5931c,end. retStatus=15000011.
19:18:16  2021-01-06 11:18:17 ERROR   kinesisVideoStreamTerminated(): operation returned status code: 0x15000011

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
All of the 0x15000011 errors above are caused by timeouts. In this case, I think the stream should retry. I modified my local code as follows.

https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/blob/7923ac0f939af86e60cbbba162d96282761ae7ff/src/include/com/amazonaws/kinesis/video/common/Include.h#L52

#define STATUS_CURL_OPERATION_TIMEDOUT                                              STATUS_COMMON_PRODUCER_BASE + 0x00000025

https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/blob/7923ac0f939af86e60cbbba162d96282761ae7ff/src/source/Common/Curl/CurlCall.c#L79-L83

    if (res != CURLE_OK) {
        curl_easy_getinfo(curl, CURLINFO_EFFECTIVE_URL, &url);
        DLOGE("Curl perform failed for url %s with result %s : %s ", url, curl_easy_strerror(res), errorBuffer);
        if (res == CURLE_OPERATION_TIMEDOUT) {
            CHK(FALSE, STATUS_CURL_OPERATION_TIMEDOUT);
        }
        CHK(FALSE, STATUS_IOT_FAILED);
    }

https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/blob/7923ac0f939af86e60cbbba162d96282761ae7ff/src/source/ContinuousRetryStreamCallbacks.c#L296

CHK(IS_RETRIABLE_ERROR(statusCode) || (statusCode == STATUS_CURL_OPERATION_TIMEDOUT), retStatus);

https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/blob/7923ac0f939af86e60cbbba162d96282761ae7ff/src/source/CurlApiCallbacks.c#L2235

            retStatus = kinesisVideoStreamTerminated(streamHandle, uploadHandle, callResult);
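Taken together, the local changes above amount to giving curl timeouts their own status code and letting that code through the retry check. The following is a stand-alone sketch of that idea, not the merged fix: the `SKETCH_` names are hypothetical, the base value 0x15000000 is inferred from 0x15000011 being STATUS_IOT_FAILED, the enum values mirror libcurl's CURLE_OK (0) and CURLE_OPERATION_TIMEDOUT (28) without including libcurl, and the retriable code 0x52000099 is invented.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t STATUS;

#define SKETCH_STATUS_SUCCESS 0x00000000u
/* Assumed base, inferred from the 0x15000011 seen in the logs. */
#define SKETCH_STATUS_COMMON_PRODUCER_BASE 0x15000000u
#define SKETCH_STATUS_IOT_FAILED (SKETCH_STATUS_COMMON_PRODUCER_BASE + 0x00000011u)
/* The new code proposed in this issue. */
#define SKETCH_STATUS_CURL_OPERATION_TIMEDOUT (SKETCH_STATUS_COMMON_PRODUCER_BASE + 0x00000025u)

/* Hypothetical stand-in for the library's IS_RETRIABLE_ERROR list. */
#define SKETCH_IS_RETRIABLE_ERROR(s) ((s) == 0x52000099u)

/* Stand-ins for the libcurl result codes used in the patch. */
typedef enum {
    SKETCH_CURLE_OK = 0,
    SKETCH_CURLE_COULDNT_CONNECT = 7,
    SKETCH_CURLE_OPERATION_TIMEDOUT = 28
} SketchCurlCode;

/* Maps a curl result to a status code: timeouts get a distinct value. */
static STATUS sketchStatusFromCurl(SketchCurlCode res)
{
    if (res == SKETCH_CURLE_OK) {
        return SKETCH_STATUS_SUCCESS;
    }
    if (res == SKETCH_CURLE_OPERATION_TIMEDOUT) {
        return SKETCH_STATUS_CURL_OPERATION_TIMEDOUT;
    }
    return SKETCH_STATUS_IOT_FAILED;
}

/* Extended retry check: the original condition OR the timeout code. */
static int sketchShouldRetry(STATUS statusCode)
{
    return SKETCH_IS_RETRIABLE_ERROR(statusCode) || statusCode == SKETCH_STATUS_CURL_OPERATION_TIMEDOUT;
}

/* Small demonstration: only the timeout becomes retriable. */
static void sketchTimeoutDemo(void)
{
    STATUS timedOut = sketchStatusFromCurl(SKETCH_CURLE_OPERATION_TIMEDOUT);
    STATUS otherFailure = sketchStatusFromCurl(SKETCH_CURLE_COULDNT_CONNECT);

    assert(timedOut == 0x15000025u);
    assert(sketchShouldRetry(timedOut));

    assert(otherFailure == 0x15000011u);
    assert(!sketchShouldRetry(otherFailure));
}
```

One design note: keeping the timeout as a separate code means other credential failures still map to 0x15000011 and stay non-retriable in this sketch, so only the transient case triggers a stream reset.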

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
In addition, there are multiple upload handles at the same time. I'm not sure whether that is an issue. (image)

Also, the error codes are defined in two places: https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/blob/7923ac0f939af86e60cbbba162d96282761ae7ff/src/include/com/amazonaws/kinesis/video/common/Include.h#L47-L48

https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/blob/7923ac0f939af86e60cbbba162d96282761ae7ff/src/include/com/amazonaws/kinesis/video/cproducer/Include.h#L43-L44

Nomidia commented 3 years ago

Scenario 3:

  1. Disconnect the router from the network.
  2. kinesisVideoStreamTerminated returns 0x15000011.
  3. continuousRetryStreamErrorReportHandler is called.
  4. kinesisVideoStreamResetStream returns 0x15000011.
  5. Restore the network; the stream state machine does not recover.

MushMal commented 3 years ago

Oh wow. OK, status codes need to be fixed ASAP.

We need to make this error retriable. Will provide a fix as soon as possible.

disa6302 commented 3 years ago

Closing since PR is merged.