aws-samples / amazon-transcribe-live-call-analytics

Amazon Transcribe Live Call Analytics (LCA) Sample Solution
Other
81 stars 54 forks source link

Urgent issue- Timed out waiting for opened response #156

Open rajapradeep opened 1 month ago

rajapradeep commented 1 month ago

Hi LCA Team,

I have the same issue which was indicated in https://github.com/aws-samples/amazon-transcribe-live-call-analytics/issues/94 & https://github.com/aws-samples/amazon-transcribe-live-call-analytics/issues/95. Its not resolved in any of the tickets. With LCA team support or himself has made some changes to his LCA setup & it was fixed. Can anyone share the setup changes & how this can be fixed.

Error:

| 2024-07-20T21:15:33.379Z | [21:15:32.979] INFO: onErrorMessage - {   | 2024-07-20T21:15:33.379Z | "version": "2",   | 2024-07-20T21:15:33.379Z | "id": "xxx",   | 2024-07-20T21:15:33.379Z | "type": "error",   | 2024-07-20T21:15:33.379Z | "seq": 2,   | 2024-07-20T21:15:33.379Z | "position": "PT0.0S",   | 2024-07-20T21:15:33.379Z | "parameters": {   | 2024-07-20T21:15:33.379Z | "code": 408,   | 2024-07-20T21:15:33.379Z | "message": "Timed out waiting for opened response"   | 2024-07-20T21:15:33.379Z | },   | 2024-07-20T21:15:33.379Z | "serverseq": 0   | 2024-07-20T21:15:33.379Z | }

Code runs upto this line,

this.logger.info(`onOpenMessage - Selected media: ${JSON.stringify(this.selectedMedia)}`);
return this.runOpenHandlers(message);

On runOpenHandlers we don't get response within 5 seconds, but we get response in 9 seconds for this line to be executed - await Promise.allSettled(promises); Not sure why this takes time & usually this will be done in a second, but some times it takes upto 10 seconds & this causes the timeout error of 5 seconds. There is no error either in the above promise function, it gives status as "fulfilled" for those promises.

This is occurring more often now and couldn't find any way to fix it. Let me know if you need more details on logs or what not. Kindly please assist on this issue fix ASAP.

rstrahan commented 1 month ago

Thanks @rajapradeep ! @babu-srinivasan Can you investigate / resolve this one? Tx.

babu-srinivasan commented 1 month ago

Thanks @rajapradeep.

The audiohook server (fargate) returns 'opened' message after running all the open handlers. One of the key open handlers establishes streaming session with Transcribe. Could you please check the logs for those calls that failed (to ensure Transcribe session is established properly) 1/ check the logs to see if you have any of the following messages (this means the session with Transcribe was established successfully). "=== Received Initial response from TCA. Session Id:" or "=== Received Initial response from Transcribe. Session Id:"

2/ Check the logs for any Transcribe related errors. Once you identified the specific error messages if any, we can troubleshoot to find/fix the root cause.

Thanks Babu Srinivasan

rajapradeep commented 1 month ago

Thanks for your response @babu-srinivasan @rstrahan

I have verified the logs & found the mentioned log, "=== Received Initial response from TCA. Session Id:" (this means the session with Transcribe was established successfully).

We also created a support ticket with aws transcribe team to check if this is related to transcribe but no issues on transcribe service.

I have also attached the entire log for a single sessionid.

Below added runOpenHandlers function for log reference

async runOpenHandlers(message: OpenMessage): Promise<void> {
        // Run all open handlers. We allow registering of open handlers while other open handlers run.
        // So we just run through the list until the list is empty.
        while ((this.openHandlers.length !== 0) && (this.state === 'OPENING')) {
            this.logger.info(`Inside runOpenHandlers while for convid: ${message.parameters.conversationId}`);
            // Note: we initiate all handlers in parallel and then wait for the promises to settle
            const promises: Array<PromiseLike<CloseHandler | void>> = [];
            for (let handler = this.openHandlers.shift(); handler; handler = this.openHandlers.shift()) {
                this.logger.info(`Inside for after shift for convid: ${message.parameters.conversationId}`);
                try {
                    const result = handler(this, this.selectedMedia, message.parameters);
                    if (isPromise(result)) {
                        promises.push(result);
                    } else if (result) {
                        this.closeHandlers.push(result);
                    }
                } catch (err) {
                    this.logger.info('In OpenHandlers Error block: ', normalizeError(err));
                    promises.push(Promise.reject(err));
                }
            }
            const results = await Promise.allSettled(promises);
            this.logger.info(`runOpenHandlers promise results for convid: ${message.parameters.conversationId} ${JSON.stringify(results)}`);
            let err: unknown = null;
            results.forEach(result => {
                if (result.status === 'rejected') {
                    err = result.reason;
                    this.logger.warn('Ending OpenHandlers Error block: ', normalizeError(err));
                } else if (result.value) {
                    this.closeHandlers.push(result.value);
                }
            });
            if (err) {
                this.logger.error(`error in runOpenHandlers: ${err}`);
                throw err;  // Rethrow the last one 
            }
        }
    }

Usually from openhandlers to TCA response we get in few milliseconds like below log.

| 2024-07-25T07:47:35.015Z | [07:47:35.006] DEBUG: Inside for after shift for convid
| 2024-07-25T07:47:35.251Z | 2024-07-25T07:47:35.250Z [info]: tsParams
| 2024-07-25T07:47:35.339Z | 2024-07-25T07:47:35.339Z [info]: === Received Initial response from TCA. Session Id
| 2024-07-25T07:47:35.535Z | [07:47:35.402] DEBUG: runOpenHandlers promise results for convid

For error scenario in attached log it took around 4.5 seconds to get TCA response & runOpenHandlers promise results came after 5 seconds which is when timeout happens.

| 2024-07-24T19:00:58.111Z | [19:00:57.415] DEBUG: Inside for after shift for convid: xxx
| 2024-07-24T19:00:59.777Z | 2024-07-24T19:00:59.777Z [info]: tsParams
| 2024-07-24T19:01:02.955Z | 2024-07-24T19:01:02.955Z [info]: === Received Initial response from TCA. Session Id:
| 2024-07-24T19:01:03.303Z | [19:01:03.237] DEBUG: runOpenHandlers promise results for convid

Let me know if you need more details to investigate this. prod_log_lca.txt

Thanks Rajapradeep

rajapradeep commented 1 month ago

Hi @rstrahan @babu-srinivasan,

Any update on the above issue?

rajapradeep commented 1 month ago

Hi @rstrahan @babu-srinivasan

Any update on the above issue? Let me know if you need more details.

Thanks Rajapradeep

rstrahan commented 1 month ago

Hi @rajapradeep

@babu-srinivasan has left AWS.
I wish I could help you, but I'm afraid I will be of limited assistance..

Couple of thoughts.. In #94 tarungupta83 said:

This issue is broadly solved, We did changed the setup, while we also updated the code to latest version available. After that we are not facing issues w.r.t call drop.

  1. Can you also confirm you are running latest version?

This timeout comes from the Genesys side of the AudioHook connection, right? Again from #93 it seems so:

About 50% of the time, the Genesys Audiohook to LCA (FarGate) doesn't send the "opened" message in response to the "open" message from Genesys Audio Hook. In 20-30% of cases, Genesys attempts to reconnect to LCA by resending the "open" message. On these second attempts, LCA typically responds with the "opened" message. However, when Genesys Audio Hook only makes a single attempt, the call usually fails.

  1. Can you contact Genesys support to see if the 5 sec timeout can be extended, to, say, 10sec - since that would seem to provide sufficient buffer? What do they say?

  2. Another option you could try is to refactor the code to send the 'opened' message back to Genesys without waiting for the runOpenHandlers() method to complete.. i.e. refactor / reorder code here so the opened message is sent immediately without waiting for the handler promises to resolve.

Sorry I can't be of more help, but please do post back to let us know how you get on.

rajapradeep commented 2 weeks ago

Thanks for the details @rstrahan. We will see what we can do on fixing it with your suggestions.

rstrahan commented 1 week ago

Thanks @rajapradeep - please keep us posted here with your updates, and if you have any code changes you want to submit as a PR to defend against the issue.

rajapradeep commented 1 week ago

Hi @rstrahan. I have made changes to send opened response to genesys in runOpenHandlers function itself instead of waiting for promise to be completed like you mentioned above in step 3 - https://github.com/aws-samples/amazon-transcribe-live-call-analytics/issues/156#issuecomment-2272147451

Another option you could try is to refactor the code to send the 'opened' message back to Genesys without waiting for the runOpenHandlers() method to complete.. i.e. refactor / reorder code here so the opened message is sent immediately without waiting for the handler promises to resolve.

Below are the changes I have done in lca-genesys-audiohook-stack/source/app/src/sessionimpl.ts.

Moved line between 463 to 467 to 607(inside runOpenHandlers).

this.buildAndSendMessage('opened', {
                media: this.selectedMedia ? [this.selectedMedia] : [],
                startPaused: this.startPaused
            });
            this.setState('ACTIVE');

Genesys timeout issue was happening since we were waiting for promise to be completed. So now I have sent the opened response to genesys after starting the promise. Also I have set the state to 'ACTIVE' so we can start receiving audio.

I have attached file difference image below.

Screenshot 2024-08-27 at 7 09 33 PM
rstrahan commented 1 week ago

Brilliant - and this works well i assume? Thanks for sharing the code above.. Would you be in a position to send it in the form of a Pull Request so we can easily review and merge for the next release, confident that the change is exactly what you have tested. Tx!