awslabs / amazon-kinesis-video-streams-pic

Apache License 2.0

[QUESTION] Stream state machine exits when 0x15000011 occurs #113

Closed Nomidia closed 3 years ago

Nomidia commented 3 years ago

When the state machine reports an error, continuousRetryStreamErrorReportHandler is called. It uses IS_RETRIABLE_ERROR to check whether statusCode qualifies for a reconnect. If the check passes, the handler resets the stream.

https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/blob/e1dd9364c3860f0196cc24a5b815c82076ab8149/src/source/ContinuousRetryStreamCallbacks.c#L284

STATUS continuousRetryStreamErrorReportHandler(UINT64 customData, STREAM_HANDLE streamHandle,
                                               UPLOAD_HANDLE uploadHandle, UINT64 erroredTimecode,
                                               STATUS statusCode)
{
    UNUSED_PARAM(uploadHandle);
    UNUSED_PARAM(customData);
    STATUS retStatus = STATUS_SUCCESS;
    TID threadId;
    DLOGW("Reporting stream error. Errored timecode: %" PRIu64 " Status: 0x%08x", erroredTimecode, statusCode);

    // return success if the sdk can recover from the error
    CHK(!IS_RECOVERABLE_ERROR(statusCode), retStatus);
    CHK(IS_RETRIABLE_ERROR(statusCode), retStatus);

    // Run the reset in a separate thread
    CHK_STATUS(THREAD_CREATE(&threadId, continuousRetryStreamRestartHandler, (PVOID) streamHandle));
    CHK_STATUS(THREAD_DETACH(threadId));

CleanUp:
    return retStatus;
}

https://github.com/awslabs/amazon-kinesis-video-streams-pic/blob/3f285bcc2d741f4c53171f7467b85358ca294a40/src/client/include/com/amazonaws/kinesis/video/client/Include.h#L210-L215

But when statusCode is 0x15000011, the state machine exits. I think this error code should also be allowed to trigger a reconnect.

  1. In this case, describeStream returns 0x52000011.
  2. ResetStream is called.
  3. In the client state machine, the expiration is reached. The attempt to get a new expiration time times out and returns 0x15000011.
  4. The IS_RETRIABLE_ERROR check fails, and the stream state machine exits.
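
The four steps above can be modeled in a few lines. This is only a sketch, not the SDK's code: `sketchIsRetriable` and `sketchResetsBeforeExit` are hypothetical names, `STATUS` is redefined locally so the example stands alone, and the allow-list holds just the one code that the logs show being followed by a reset. The real IS_RETRIABLE_ERROR macro accepts more codes than this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the SDK's status type so the sketch compiles on its own. */
typedef uint32_t STATUS;

/* Codes copied from the logs in this issue. */
#define SKETCH_DESCRIBE_FAILED 0x52000011u /* "a. Normal": resetStream() follows */
#define SKETCH_IOT_FAILED      0x15000011u /* "b. Abnormal": no reset follows */

/* Hypothetical allow-list: only the code the logs show being retried. */
static bool sketchIsRetriable(STATUS status)
{
    return status == SKETCH_DESCRIBE_FAILED;
}

/* Walks a sequence of reported errors and returns how many resets
 * are scheduled before the stream stops retrying. */
static size_t sketchResetsBeforeExit(const STATUS *errors, size_t count)
{
    size_t resets = 0;
    for (size_t i = 0; i < count; i++) {
        if (!sketchIsRetriable(errors[i])) {
            break; /* step 4: the check fails, nothing schedules a reset */
        }
        resets++; /* step 2: ResetStream is called */
    }
    return resets;
}
```

Feeding it the sequence 0x52000011 followed by 0x15000011 yields a single reset: the second error is never retried, which matches the behavior described in this report.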

a. Normal

16:35:30  2020-12-29 08:35:30 WARN    stepStateMachine(): pStateMachine=c8764c,start. current state=2,next state=2
16:35:30  2020-12-29 08:35:30 WARN    stepStateMachine(): pStateMachine=c8764c,current state=2,next state=2,pState->retry=5,retryCound=5
16:35:30  2020-12-29 08:35:30 WARN    stepStateMachine(): pStateMachine=c8764c,end. retStatus=52000011.
16:35:30  2020-12-29 08:35:30 ERROR   describeStreamResultEvent(): operation returned status code: 0x52000011
16:35:30  2020-12-29 08:35:30 WARN    continuousRetryStreamErrorReportHandler(): Reporting stream error. Errored timecode: 0 Status: 0x52000011
16:35:30  2020-12-29 08:35:30 WARN    resetStream(): resetStream:   call contentViewRemoveAll. 

b. Abnormal

16:35:43  2020-12-29 08:35:43 ERROR   blockingCurlCall(): Curl perform failed for url https://c3qp4tl980s52m.credentials.iot.ap-south-1.amazonaws.com/role-aliases/dev-kvs-access-role-alias/credentials with result Timeout was reached : Resolving timed out after 3587 milliseconds 
16:35:43  2020-12-29 08:35:43 WARN    stepStateMachine(): pStateMachine=d08644,end. retStatus=15000011.
16:35:43  2020-12-29 08:35:43 WARN    executeDescribeStreamState(): ---- Leave retStatus=15000011.
16:35:43  
16:35:44  2020-12-29 08:35:43 WARN    stepStateMachine(): pStateMachine=c8764c,end. retStatus=15000011.
16:35:44  2020-12-29 08:35:43 ERROR   describeStreamResultEvent(): operation returned status code: 0x15000011
16:35:44  2020-12-29 08:35:43 WARN    continuousRetryStreamErrorReportHandler(): Reporting stream error. Errored timecode: 0 Status: 0x15000011

MushMal commented 3 years ago

Need to think a little longer on this one.

0x15000011 is STATUS_IOT_FAILED, which is very generic: it is returned for issues like bad params as well as for network call failures.

Do you want to take a stab at it and send a PR?

Nomidia commented 3 years ago

I'm afraid I would miss something, so for now I have simply modified my copy of producer-c like this.

https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/blob/7923ac0f939af86e60cbbba162d96282761ae7ff/src/include/com/amazonaws/kinesis/video/common/Include.h#L52

#define STATUS_CURL_OPERATION_TIMEDOUT                                              STATUS_COMMON_PRODUCER_BASE + 0x00000025

https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/blob/7923ac0f939af86e60cbbba162d96282761ae7ff/src/source/Common/Curl/CurlCall.c#L79

    if (res != CURLE_OK) {
        curl_easy_getinfo(curl, CURLINFO_EFFECTIVE_URL, &url);
        DLOGE("Curl perform failed for url %s with result %s : %s ", url, curl_easy_strerror(res), errorBuffer);
        if (res == CURLE_OPERATION_TIMEDOUT) {
            CHK(FALSE, STATUS_CURL_OPERATION_TIMEDOUT);
        }
        CHK(FALSE, STATUS_IOT_FAILED);
    }

https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/blob/7923ac0f939af86e60cbbba162d96282761ae7ff/src/source/ContinuousRetryStreamCallbacks.c#L296

    CHK(IS_RETRIABLE_ERROR(statusCode) || (statusCode == STATUS_CURL_OPERATION_TIMEDOUT), retStatus);
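
Taken together, the three changes above behave roughly like the following sketch. It is self-contained and includes neither libcurl nor the SDK headers, so everything prefixed with `SKETCH_` or `sketch` is a hypothetical stand-in. Two assumptions are worth calling out: the producer base is taken to be 0x15000000 (inferred from STATUS_IOT_FAILED being 0x15000011), and the stand-in for IS_RETRIABLE_ERROR accepts only 0x52000011, whereas the real macro accepts more codes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the SDK's status type. */
typedef uint32_t STATUS;

#define SKETCH_STATUS_SUCCESS 0x00000000u
#define SKETCH_PRODUCER_BASE  0x15000000u /* assumption, see lead-in */
#define SKETCH_STATUS_IOT_FAILED              (SKETCH_PRODUCER_BASE + 0x00000011u)
#define SKETCH_STATUS_CURL_OPERATION_TIMEDOUT (SKETCH_PRODUCER_BASE + 0x00000025u)

/* Same numeric values as CURLE_OK and CURLE_OPERATION_TIMEDOUT in libcurl. */
enum { SKETCH_CURLE_OK = 0, SKETCH_CURLE_OPERATION_TIMEDOUT = 28 };

/* Mirrors the modified CurlCall.c branch: a timeout gets its own status,
 * every other curl failure keeps the generic one. */
static STATUS sketchMapCurlResult(int res)
{
    if (res == SKETCH_CURLE_OK) {
        return SKETCH_STATUS_SUCCESS;
    }
    if (res == SKETCH_CURLE_OPERATION_TIMEDOUT) {
        return SKETCH_STATUS_CURL_OPERATION_TIMEDOUT;
    }
    return SKETCH_STATUS_IOT_FAILED;
}

/* Hypothetical stand-in for IS_RETRIABLE_ERROR (the real list is longer). */
static bool sketchBaseIsRetriable(STATUS status)
{
    return status == 0x52000011u;
}

/* Mirrors the extended check in the error report handler. */
static bool sketchShouldReset(STATUS status)
{
    return sketchBaseIsRetriable(status) || status == SKETCH_STATUS_CURL_OPERATION_TIMEDOUT;
}
```

With this split, a curl timeout maps to 0x15000025 and is retried, while any other curl failure still maps to the generic 0x15000011 and is not. That keeps non-network causes of STATUS_IOT_FAILED, such as bad params, out of the retry loop.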

In addition, the error code is defined more than once (see the two attached screenshots).

Nomidia commented 3 years ago

I will open this issue in producer-c instead.