byoungdale closed this issue 9 months ago.
The issue seems to be that the processData() function, which runs in the thread created to process responses from AWS for that call, does not exit. To troubleshoot, I suggest adding some more logging there.
Specifically, I wonder about this if condition.
If it is not true, that is, if switch_core_session_locate(m_sessionId.c_str()) returns null because the session is gone, I think we should add a log message so we can see that.
This log message:
stream got final response
marks the event that should cause the processData() function to exit, and therefore the thread to end.
So if that is happening (I'm not sure it is), we never reach this code inside that if block:
std::lock_guard<std::mutex> lk(m_mutex);
m_finished = true;
m_cond.notify_one();
switch_core_session_rwunlock(psession);
Setting m_finished and notifying the condition variable is what wakes up processData() here:
while (true) {
  std::unique_lock<std::mutex> lk(m_mutex);
  m_cond.wait(lk, [&, this] {
    return (!m_deqAudio.empty() && !m_finishing) || m_transcript.TranscriptHasBeenSet() || m_finished || (m_finishing && !shutdownInitiated);
  });
  // ... rest of the loop body elided ...
}
Therefore, that thread may block there forever, waiting for a condition that never becomes true.
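While debugging a hang like this, one option is to temporarily replace the unbounded m_cond.wait() with a wait_for() loop that logs each time it times out, so the log shows that the thread is still parked there. The sketch below is a standalone model, not the plugin's code: StreamState, waitForFinishedWithLogging() and signalFinished() are hypothetical names, and the predicate is reduced to m_finished alone.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>

// Hypothetical stand-in for the members processData() waits on.
// The real predicate has more terms; only m_finished is modelled here.
struct StreamState {
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_finished = false;
};

// Waits for m_finished like the real loop, but wakes every `interval`
// to log that it is still blocked. Gives up after `maxWaits` intervals
// and returns false, so the hang becomes visible instead of silent.
bool waitForFinishedWithLogging(StreamState& s,
                                std::chrono::milliseconds interval,
                                int maxWaits) {
    std::unique_lock<std::mutex> lk(s.m_mutex);
    for (int i = 0; i < maxWaits; ++i) {
        // wait_for with a predicate returns the predicate's value,
        // so false means "timed out and m_finished is still false".
        if (s.m_cond.wait_for(lk, interval, [&s] { return s.m_finished; })) {
            return true;
        }
        std::fprintf(stderr, "processData still waiting (%d)\n", i + 1);
    }
    return false;
}

// What the response callback is expected to do on the final response.
void signalFinished(StreamState& s) {
    {
        std::lock_guard<std::mutex> lk(s.m_mutex);
        s.m_finished = true;
    }
    // Notifying after releasing the lock is also valid.
    s.m_cond.notify_one();
}
```

Called on a state nobody signals, it logs three times and returns false; after signalFinished() it returns true at once, because wait_for() checks the predicate before blocking.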
Just an idea, but try adding that logging to get further.
So... to summarize this theory:
1. m_pStream->close() is called at line 272.
2. OnResponseCallback is invoked with the final response.
3. In OnResponseCallback, the call to switch_core_session_locate(m_sessionId.c_str()) fails because the session is gone.
4. processData() blocks forever in its while loop.
Please investigate when you have time @byoungdale and let me know what you think.
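The theory can be reproduced outside FreeSWITCH with a small model. In this sketch (all names hypothetical), lookupSession() returning nullptr stands in for switch_core_session_locate() failing, the callback signals only inside the if, as in the original code, and processData() uses a bounded wait_for() so the hang shows up as a false return instead of a stuck program.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a FreeSWITCH session.
struct Session {};

struct TranscribeStream {
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_finished = false;
    Session* session = nullptr;  // nullptr models a hung-up call

    // Models switch_core_session_locate(): null when the session is gone.
    Session* lookupSession() { return session; }

    // Mirrors the original callback: it only signals inside the if.
    void onResponseCallbackBuggy() {
        if (Session* psession = lookupSession()) {
            (void)psession;  // real code would use it, then rwunlock it
            std::lock_guard<std::mutex> lk(m_mutex);
            m_finished = true;
            m_cond.notify_one();
        }
        // No else: when the session is gone, nobody sets m_finished.
    }

    // Stand-in for processData(): true if woken, false if still blocked
    // after `timeout` (an unbounded wait() would hang here instead).
    bool processData(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(m_mutex);
        return m_cond.wait_for(lk, timeout, [this] { return m_finished; });
    }
};

// The stream closes, the callback fires on another thread, and
// processData() waits. Returns whether processData() was woken.
bool runScenario(bool sessionAlive) {
    Session s;
    TranscribeStream stream;
    stream.session = sessionAlive ? &s : nullptr;
    std::thread cb([&stream] { stream.onResponseCallbackBuggy(); });
    bool woke = stream.processData(std::chrono::milliseconds(500));
    cb.join();
    return woke;
}
```

runScenario(true) returns true, while runScenario(false) returns false: with no session, nothing ever sets m_finished.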
Yes, thank you, I just saw this. The psession is gone; I had checked that. So I added this else branch to the if (psession) block:
} else {
  // Session already hung up: still signal completion so processData() can exit.
  switch_log_printf(SWITCH_CHANNEL_LOG, SWITCH_LOG_DEBUG, "GStreamer %p session is closed/hungup. Need to unblock thread.\n", this);
  std::lock_guard<std::mutex> lk(m_mutex);
  m_finished = true;
  m_cond.notify_one();
}
and it unblocks the processData() loop. I see the await ep.destroy() finish, and I don't see the zombie call anymore.
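Applied to a small standalone model, the else branch means the waiter is released whether or not the session still exists. This is a sketch with hypothetical names (Session, lookupSession(), markFinished(), runScenario()); markFinished() just factors out the lock/set/notify sequence that both branches share.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

// Hypothetical stand-ins: Session models the FreeSWITCH session and
// lookupSession() models switch_core_session_locate().
struct Session {};

struct TranscribeStream {
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_finished = false;
    Session* session = nullptr;  // nullptr models a hung-up call

    Session* lookupSession() { return session; }

    // Set the flag under the lock, then wake processData().
    void markFinished() {
        std::lock_guard<std::mutex> lk(m_mutex);
        m_finished = true;
        m_cond.notify_one();
    }

    // Fixed callback: both branches signal, mirroring the added else.
    void onResponseCallback() {
        if (Session* psession = lookupSession()) {
            (void)psession;  // real code would use it, then rwunlock it
            markFinished();
        } else {
            std::fprintf(stderr, "session is closed/hung up, unblocking thread\n");
            markFinished();
        }
    }

    // Stand-in for processData(); bounded so a failure shows up as false.
    bool processData(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(m_mutex);
        return m_cond.wait_for(lk, timeout, [this] { return m_finished; });
    }
};

bool runScenario(bool sessionAlive) {
    Session s;
    TranscribeStream stream;
    stream.session = sessionAlive ? &s : nullptr;
    std::thread cb([&stream] { stream.onResponseCallback(); });
    bool woke = stream.processData(std::chrono::milliseconds(2000));
    cb.join();
    return woke;
}
```

Since both branches end in markFinished(), an equivalent design is to hoist that call below the if so no future branch can forget it; the real code would still keep the rwunlock inside the if.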
But does onResponseCallback run only when the transcription stream is finished? I'm having trouble finding AWS C++ SDK documentation that references onResponseCallback.
Ah, I see onResponseCallback is passed to StartStreamTranscriptionAsync.
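My understanding (worth confirming against the AWS SDK for C++ headers) is that StartStreamTranscriptionAsync() takes a stream-ready handler and a response-received handler, and the SDK calls the latter once, when the whole StartStreamTranscription operation completes (successfully or with an error), which normally happens only after the audio stream has been closed. The toy below models that ordering with std::thread; ToyStream and startStreamAsync() are hypothetical and are not SDK types. It shows why the callback can run on another thread well after the call's session has gone away.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>

// Toy stream: the caller writes to it (omitted) and eventually closes it.
class ToyStream {
public:
    void close() {
        {
            std::lock_guard<std::mutex> lk(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    void waitClosed() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return closed_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool closed_ = false;
};

// Hypothetical analogue of a streaming *Async call: onReady receives the
// stream, and onComplete runs exactly once, on the worker thread, only
// after the stream has been closed.
std::thread startStreamAsync(std::shared_ptr<ToyStream> stream,
                             std::function<void(ToyStream&)> onReady,
                             std::function<void()> onComplete) {
    return std::thread([stream, onReady, onComplete] {
        onReady(*stream);
        stream->waitClosed();  // the "final response" only arrives after close()
        onComplete();
    });
}
```

Nothing in onComplete can assume the caller's state (here, the FreeSWITCH session) still exists, because it runs whenever the worker gets there.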
This issue was reported here in the drachtio-fsmrf library, but I am creating an issue in this repo so it can get fixed.
When a channel is destroyed before the transcription thread has been properly closed, the channel hangs in switch_thread_join here.
Once onResponseCallback runs, there is no session available.
Example logs showing that we never see the read thread complete log because
switch_thread_join(&retval, cb->thread);
blocks waiting to join a thread that I think is already gone:
fs_cli zombie call (this call is hung up and gone, but the thread is still stuck):
I haven't gotten to a fix yet, but I think checking whether the channel for the media bug is still available before trying to join that thread might work.
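As a hedged alternative to checking the channel before joining, the cleanup path could always send the stop signal before it joins, so the join cannot wait on a worker that is parked in its condition wait. The model below uses std::thread::join() in place of switch_thread_join(); Transcriber, cleanup() and exitedCleanly() are hypothetical names, and this only covers a thread blocked on m_cond, not one blocked elsewhere.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the media-bug cleanup path: a worker waits for
// m_finished, and cleanup joins it (std::thread::join stands in for
// switch_thread_join).
class Transcriber {
public:
    Transcriber() : worker_([this] { processData(); }) {}
    ~Transcriber() { cleanup(); }  // never destroy a joinable std::thread

    // Signal first, then join: the worker can always leave its wait,
    // even if the response callback never ran or found no session.
    void cleanup() {
        {
            std::lock_guard<std::mutex> lk(m_mutex);
            m_finished = true;
        }
        m_cond.notify_one();
        if (worker_.joinable()) worker_.join();
    }

    // Only meaningful after cleanup(); join() orders the worker's write.
    bool exitedCleanly() const { return exited_; }

private:
    void processData() {
        std::unique_lock<std::mutex> lk(m_mutex);
        m_cond.wait(lk, [this] { return m_finished; });
        exited_ = true;
    }

    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_finished = false;
    bool exited_ = false;
    std::thread worker_;  // declared last so the members above exist first
};
```

Because the flag is set and notified before join(), cleanup() returns even if no response callback ever ran, and calling it twice is harmless.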