gavv / libASPL

C++17 library for creating macOS Audio Server plugins.
https://gavv.net/libASPL
MIT License
56 stars 4 forks source link

Deadlock when setting device sample rate and stream format #5

Closed gchilds closed 1 year ago

gchilds commented 1 year ago

I've found libASPL can deadlock if clients try to set device sample rate and stream physical format too close together.

Thread 1 (setting stream physical format) locks:

  1. Stream mutex
  2. Device mutex

Thread 2 (setting device sample rate) locks:

  1. Device mutex
  2. Stream mutex

and deadlock results.

I think any operations on streams and devices that call Stream::RequestConfigurationChange() and Device::RequestConfigurationChange() are affected. Actually I think you could say that all uses of device_ in Stream.cpp can lead to deadlock.

I'm not sure what the solution is. Maybe one of

Anyway, thanks very much for libASPL, it makes testing ASP ideas and probing CoreAudio so much easier.

gavv commented 1 year ago

Thanks for detailed report! Will look into this.

gavv commented 1 year ago

Hi, thanks again for report.

I was able to reproduce the problem locally. However, it was not deadlock, but instead something more ridiculous.

When Stream::SetPhysicalFormatAsync is called, it invokes Device::SetSampleRateAsync, which in turn invokes Stream::SetPhysicalSampleRateAsync.

Inside SetPhysicalSampleRateAsync there is a bug: it invokes Tracer::OperationEnd without invoking OperationBegin first. Which leads to overflowing call depth counter inside Tracer from 0 to 4294967296. After which, Tracer tries to allocate 4GB string and fill it with "-" padding symbols, and this process takes really long time :-)

I've pushed a series of changes to address all this (6eef024d98b3125743614281cc9cb0df85305b0f):

  1. First of all, avoid overflow in tracer. When overflow is detected, tracer reports a error instead of overflowing.
  2. Adjust all tests to catch these errors reported by tracer, to catch such bugs earlier.
  3. Add tests for your specific case, which reproduces the problem and triggers this new tracer error detection.
  4. Add missing OperationBegin calls to fix the bug.

Then I took another look at the sample rate setters... And I realized that the implemented behavior is actually weird. When you update physical format of one stream, sample rates of all other streams and implicitly updated too, as well as device nominal sample rate.

This behavior may be very surprising (if you didn't notice corresponding comment), given that all other libASPL setters are stupid and just update single value in single object. I think I blindly replicated behavior of Apple examples when I wrote this. It may be OK in some cases, but definitely is not a generally expected behavior.

I decided to remove the propagation logic at all, for consistency with the rest of the library. I also renamed device SampleRate to NominalSampleRate for clarity. See this commit: af501ee92670b07e3b5796e702dc7a943f1a8b14

BTW, I hope a deadlock is not possible here, because all mutexes you mentioned are recursive exactly to prevent this kind of deadlocks, and also RequestConfigurationChange designed to be reentrant.

Would be awesome if you could test the new code!

I'll wait a while and then will release a new major release with these changes, if no problems are reported.

gavv commented 1 year ago

I've released 3.0.0 which includes fixes above.