croissanne / portaudio_opensles

android opensles hostapi for portaudio
Other
48 stars 16 forks source link

Deadlock in Pa_StopStream() #15

Open darksylinc opened 6 months ago

darksylinc commented 6 months ago

Describe the bug

I got deadlocked here:

syscall 0x00000077b6eb9f90
pthread_join 0x00000077b6f23444
PaUnixThread_Terminate pa_unix_util.c:441
StopStream pa_opensles.c:1231
Pa_StopStream pa_front.c:1469
AudioMixerPortAudio::pause() AudioMixer.cpp:119

Portaudio's thread is stuck here:

syscall 0x00000077b6eb9f90
__futex_wait_ex(volatile void *, bool, int, bool, const timespec *) 0x00000077b6ebe860
sem_wait 0x00000077b6ecc038
StreamProcessingCallback pa_opensles.c:988
__pthread_start(void *) 0x00000077b6f22b18
__start_thread 0x00000077b6ebf360

Android's internal audio threads are stuck here (i.e. the one responsible for calling NotifyBufferFreeCallback and unstuck everything):

syscall 0x00000077b6eb9f90
android::ClientProxy::obtainBuffer(android::Proxy::Buffer *, const timespec *, timespec *) 0x00000077be369d78
android::AudioTrack::obtainBuffer(android::AudioTrack::Buffer *, const timespec *, timespec *, unsigned long *) 0x00000077be35ee30
android::AudioTrack::processAudioBuffer() 0x00000077be35e37c
android::AudioTrack::AudioTrackThread::threadLoop() 0x00000077be35d744
android::Thread::_threadLoop(void *) 0x00000077acbe2658
android::AndroidRuntime::javaThreadShell(void *) 0x00000077af0a2a3c
thread_data_t::trampoline(const thread_data_t *) 0x00000077acbe1dec
__pthread_start(void *) 0x00000077b6f22b18
__start_thread 0x00000077b6ebf360

To Reproduce

It is unclear the exact conditions but I got a repeatable way to repro on my phone (Android 12 MIUI Global 14.0.1 Xiaomi POCO F2 Pro):

Initialization code:

const double c_sampleRate = 48000.0;
static int patestCallback( const void *inputBuffer, void *outputBuffer, unsigned long framesPerBuffer,
                           const PaStreamCallbackTimeInfo *timeInfo, PaStreamCallbackFlags statusFlags,
                           void *userData )
{
    return paContinue;
}

    // Open an audio I/O stream.
    const PaError errorCode = Pa_OpenDefaultStream( &mStream, 0,   // no input channels
                                                    2,             // stereo output
                                                    paInt16,       //
                                                    c_sampleRate,  //
                                                    0,             // frames per buffer
                                                    patestCallback, this );
 Pa_StartStream( mStream );
  1. Run the app with Android Studio's debugger attached.
  2. Start playback with initialization code provided ago.
  3. Put a breakpoint somwhere inside StreamProcessingCallback.
  4. Wait a few seconds (vary the frequency in which you do this).
  5. Optionally, disable the breakpoint (follow this step randomly)
  6. Continue
  7. Repeat step 3 until audio disappears. When this happens sem_wait will no longer wake up.
  8. Calling Pa_StopStream() afterwards will deadlock because it wants the worker thread to end, but the sem_wait will never wake up.

I suspect running out of data is what's causing this (could possibly be an Android OS bug) as when doing steps 3 - 6 I get the following:

2024-03-08 22:24:25.843 28227-28569 AudioTrack              com.artofthestate.airwar             W  restartIfDisabled(105): releaseBuffer() track 0xb4000076a9f24000 disabled due to previous underrun, restarting

Googling online I can't find much but I did find this StackOverflow question:

(...) For some reason AudioTrack doesn't seem to like sitting around with an empty buffer. (...)

Workaround

I was able to "fix" the problem with the following snippet:

struct timespec abstime;
// Calculate the time needed to play all buffers, then add 500ms for spare room.
const int64_t millisecondsNeeded =
    numberOfBuffers * ( 10000 * (int64_t)stream->framesPerHostCallback ) /
        ( (int64_t)stream->streamRepresentation.streamInfo.sampleRate ) +
    500;

abstime.tv_nsec = timeSpec.tv_nsec + millisecondsNeeded * 1000000;
const int64_t remainder = abstime.tv_nsec % 1000000000;
abstime.tv_sec = timeSpec.tv_sec + ( abstime.tv_nsec - remainder ) / 1000000000;
abstime.tv_nsec = remainder;

int semaphoreResult = -1;
do
{
    semaphoreResult = sem_timedwait( &stream->outputSem, &abstime );
} while( semaphoreResult == -1 && errno == EINTR );  // Restart if interrupted by handler

if( semaphoreResult == -1 && errno == ETIMEDOUT)
{
    // Android may get into a deadlock if underruns occur in which stream->outputSem
    // will never wake up. Thus we use a timeout to restart the stream.
    // See https://github.com/croissanne/portaudio_opensles/issues/15
    // SLuint32 currState = 7777;
    //(*stream->playerItf)->GetPlayState( stream->playerItf, &currState );
    ( *stream->playerItf )->SetPlayState( stream->playerItf, SL_PLAYSTATE_STOPPED );
    ( *stream->outputBufferQueueItf )->Clear( stream->outputBufferQueueItf );
    ( *stream->playerItf )->SetPlayState( stream->playerItf, SL_PLAYSTATE_PLAYING );
}

In case you're wondering, uncommenting "currState" (and commenting out my workaround) returns that the state is always SL_PLAYSTATE_PLAYING.

My workaround has the following issue though:

Once the workaround hits at least once, the app will start spamming the following on logcat (level = debug):

PlayerBase::stop() from IPlayer
2024-03-08 22:40:29.067 29971-30200 AudioTrack              com.artofthestate.airwar             D  stop(sessionID=2057)
2024-03-08 22:40:29.067 29971-30200 AudioTrack              com.artofthestate.airwar             D  stop(106): prior state:STATE_ACTIVE
2024-03-08 22:40:29.067 29971-30200 AudioTrack              com.artofthestate.airwar             D  stop(106): called with 376512 frames delivered
2024-03-08 22:40:29.072 29971-30190 thestate.airwa          com.artofthestate.airwar             D  PlayerBase::stop() from IPlayer
2024-03-08 22:40:29.072 29971-30190 AudioTrack              com.artofthestate.airwar             D  stop(sessionID=2057)
2024-03-08 22:40:29.091 29971-30190 AudioTrack              com.artofthestate.airwar             D  stop(106): prior state:STATE_ACTIVE
2024-03-08 22:40:29.091 29971-30190 AudioTrack              com.artofthestate.airwar             D  stop(106): called with 0 frames delivered
2024-03-08 22:40:30.141 29971-30193 BpBinder                com.artofthestate.airwar             W  Slow Binder: BpBinder transact took 1046 ms, interface=android.media.IAudioPolicyService, code=45 oneway=false
2024-03-08 22:40:30.143 29971-30190 thestate.airwa          com.artofthestate.airwar             D  PlayerBase::stop() from IPlayer
2024-03-08 22:40:30.143 29971-30190 AudioTrack              com.artofthestate.airwar             D  stop(sessionID=2057)
2024-03-08 22:40:30.143 29971-30190 AudioTrack              com.artofthestate.airwar             D  stop(106): prior state:STATE_ACTIVE
2024-03-08 22:40:30.143 29971-30190 AudioTrack              com.artofthestate.airwar             D  stop(106): called with 768 frames delivered

But here's the fun part:

I can repro this behavior on Xiaomi POCO F2 Pro (stock ROM) and on a Samsung A54 5G (stock ROM Android 14).

But one of my Lenovo tablets is the only device to perform this log spam naturally (without any workarounds). And unsurprisingly, I can't repro the bug there! I already caught Lenovo hacking the GPU driver to workaround a Vulkan driver bug, so I wouldn't be surprised they're doing something similar to my workaround to fix an Android bug.

In a Redmi Note 4X (custom ROM Android 10, Havoc OS) I also can't reproduce, BUT when I force a buffer underrun I get the following:

2024-03-08 22:50:36.049  9039-9926  AudioTrack              com.artofthestate.airwar             W  restartIfDisabled(21): releaseBuffer() track 0x7481520000 disabled due to previous underrun, restarting
2024-03-08 22:50:36.052  9039-9926  <no-tag>                com.artofthestate.airwar             D  PlayerBase::stop() from IPlayer
2024-03-08 22:50:36.052  9039-9926  AudioTrack              com.artofthestate.airwar             D  stop(21): called with 818688 frames delivered
2024-03-08 22:50:36.056  9039-9926  <no-tag>                com.artofthestate.airwar             D  PlayerBase::stop() from IPlayer
2024-03-08 22:50:36.056  9039-9926  AudioTrack              com.artofthestate.airwar             D  stop(21): called with 768 frames delivered

There is no log spam; those are all the messages that appear after the underrun (until I force another underrun).

My workaround appears to work & fix the issue, but I would love some feedback.

Expected behavior

Pa_StopStream should not deadlock. Audio should not (randomly) disappear if a buffer underrun occurs.

Actual behavior

Deadlock. Audio randomly disappears if a buffer underrun occurs (it doesn't happen on all buffer underruns).

How frequent is this?

I am able to repro this very easily with the debugger attached; but on a normal run it is hard to trigger on fast phones (I think I got this bug twice in 1 year; but the debugger was not attached so I can't be certain).

But we're experiencing high ANRs (3%), particularly from slow phones (our game is very demanding, specially on loading screens) and I suspect this may be (in part?) the cause of them.

Update: Yes, this is the culprit of some of those ANRs. Inspecting all threads, they're stuck in:

android::ClientProxy::obtainBuffer (Android's internal audio thread) and sem_wait (pa_opensles.c:988).

Desktop (please complete the following information):

darksylinc commented 6 months ago

I've updated the workaround code in the ticket; since it now accounts for some edge cases and properly estimates a reasonable amount of timeout.