bastibe / SoundCard

A Pure-Python Real-Time Audio Library
https://soundcard.readthedocs.io
BSD 3-Clause "New" or "Revised" License
680 stars 69 forks source link

Stream creation sometimes fails with a segfault #63

Closed ArniDagur closed 4 years ago

ArniDagur commented 5 years ago

The error message is the same as: #20

Here is a script that can reproduce the crash 100% of the time:

import soundcard as sc
from time import sleep

speaker = sc.default_speaker()
while True:
    with speaker.player(48000, channels=1, blocksize=1024) as p:
        print(p)
        sleep(0.5)

When the following patch is applied (with a few print statements that are not included here)

diff --git a/soundcard/pulseaudio.py b/soundcard/pulseaudio.py
index fc91815..77f0849 100644
--- a/soundcard/pulseaudio.py
+++ b/soundcard/pulseaudio.py
@@ -86,6 +86,12 @@ class _PulseAudio:
         while self._pa_operation_get_state(operation) == _pa.PA_OPERATION_RUNNING:
             time.sleep(0.001)

+    def _last_error(self):
+        error_code = self._pa_context_errno(self.context)
+        error_as_char_array = _pa.pa_strerror(error_code)
+        error_as_string = _ffi.string(error_as_char_array).decode('utf-8')
+        return (error_code, error_as_string)
+
     @property
     def source_list(self):
         """Return a list of dicts of information about available sources."""
@@ -180,6 +186,7 @@ class _PulseAudio:
     _pa_context_get_sink_info_by_name = _lock_and_block(_pa.pa_context_get_sink_info_by_name)
     _pa_context_get_server_info = _lock_and_block(_pa.pa_context_get_server_info)
     _pa_context_get_state = _lock(_pa.pa_context_get_state)
+    _pa_context_errno = _lock(_pa.pa_context_errno)
     _pa_context_drain = _lock(_pa.pa_context_drain)
     _pa_context_disconnect = _lock(_pa.pa_context_disconnect)
     _pa_context_unref = _lock(_pa.pa_context_unref)
@@ -570,6 +577,9 @@ class _Stream:
             raise RuntimeError('invalid channel map')

         self.stream = _pulse._pa_stream_new(_pulse.context, self._name.encode(), samplespec, channelmap)
+        if self.stream == _ffi.NULL:
+            raise RuntimeError("Stream creation failed, returned NULL pointer."
+                               f" The most recent error code is: {_pulse._last_error()}")
         bufattr = _ffi.new("pa_buffer_attr*")
         bufattr.maxlength = 2**32-1 # max buffer length
         numchannels = self.channels if isinstance(self.channels, int) else len(self.channels)
diff --git a/soundcard/pulseaudio.py.h b/soundcard/pulseaudio.py.h
index 40ad766..75b1570 100644
--- a/soundcard/pulseaudio.py.h
+++ b/soundcard/pulseaudio.py.h
@@ -155,6 +155,7 @@ pa_mainloop_api *pa_threaded_mainloop_get_api(pa_threaded_mainloop *m);
 typedef struct pa_context pa_context;
 pa_context *pa_context_new(pa_mainloop_api *mainloop, const char *name);
 void pa_context_unref(pa_context *c);
+int pa_context_errno (pa_context *c);

the output of the script is the following:

[arnig][~/Development/soundcard-segf…]% python main.py
Creating stream object...
After _pulse._pa_stream_new(...), self.stream is: <cdata 'pa_stream *' 0x5586486a1500>
Inside _Player._connect_stream(), self.stream is: <cdata 'pa_stream *' 0x5586486a1500>
<soundcard.pulseaudio._Player object at 0x7f365bffc790>
Creating stream object...
After _pulse._pa_stream_new(...), self.stream is: <cdata 'pa_stream *' 0x5586486cd910>
Inside _Player._connect_stream(), self.stream is: <cdata 'pa_stream *' 0x5586486cd910>
<soundcard.pulseaudio._Player object at 0x7f365c00ee90>
Creating stream object...
After _pulse._pa_stream_new(...), self.stream is: <cdata 'pa_stream *' 0x5586486f96a0>
Inside _Player._connect_stream(), self.stream is: <cdata 'pa_stream *' 0x5586486f96a0>
<soundcard.pulseaudio._Player object at 0x7f366f63ea10>
Creating stream object...
After _pulse._pa_stream_new(...), self.stream is: <cdata 'pa_stream *' 0x5586486fcd00>
Inside _Player._connect_stream(), self.stream is: <cdata 'pa_stream *' 0x5586486fcd00>
<soundcard.pulseaudio._Player object at 0x7f365c02e190>
Creating stream object...
After _pulse._pa_stream_new(...), self.stream is: <cdata 'pa_stream *' 0x558648700830>
Inside _Player._connect_stream(), self.stream is: <cdata 'pa_stream *' 0x558648700830>
<soundcard.pulseaudio._Player object at 0x7f365c023110>
Creating stream object...
After _pulse._pa_stream_new(...), self.stream is: <cdata 'pa_stream *' NULL>
Traceback (most recent call last):
  File "main.py", line 10, in <module>
    with speaker.player(48000, channels=1, blocksize=1024) as p:
  File "/usr/lib/python3.7/site-packages/SoundCard-0.3.2-py3.7.egg/soundcard/pulseaudio.py", line 583, in __enter__
    raise RuntimeError("Stream creation failed, returned NULL pointer."
RuntimeError: Stream creation failed, returned NULL pointer. The most recent error code is: (3, 'Invalid argument')

@bastibe Any idea what's going on?

bastibe commented 5 years ago

Sorry for the late response, I was on vacation.

Could you try running against the latest Github version? It includes a tentative solution for this problem, or should at least make it happen far later. If you do that, I would be very interested if it actually fixes your issue.

On my machine, it seems to improve things a lot, but does not fix it entirely. As of yet, I don't know why. Any help would be greatly appreciated.

ArniDagur commented 5 years ago

The logs you see above were produced using the latest GitHub version at the time, and since then I don't see any significant changes in the commit history.

bastibe commented 5 years ago

If I understand the error message correctly, the stream creation fails due to an incorrect argument, but none of the arguments change in between invocations of pa_stream_new (right?).

To me, this indicates that the error has some other cause. The PulseAudio documentation states that Pulse can only deal with a limited number of concurrent streams. As far as I can tell, we are cleaning up nicely after every stream.

Do you have any idea what might cause this error?

ArniDagur commented 5 years ago

It's probably one of these assertions failing (where the error is PA_ERR_INVALID): src/pulse/stream.c#L216-223

    PA_CHECK_VALIDITY_RETURN_NULL(c, ss && pa_sample_spec_valid(ss), PA_ERR_INVALID);
    PA_CHECK_VALIDITY_RETURN_NULL(c, c->version >= 12 || (ss->format != PA_SAMPLE_S32LE && ss->format != PA_SAMPLE_S32BE), PA_ERR_NOTSUPPORTED);
    PA_CHECK_VALIDITY_RETURN_NULL(c, c->version >= 15 || (ss->format != PA_SAMPLE_S24LE && ss->format != PA_SAMPLE_S24BE), PA_ERR_NOTSUPPORTED);
    PA_CHECK_VALIDITY_RETURN_NULL(c, c->version >= 15 || (ss->format != PA_SAMPLE_S24_32LE && ss->format != PA_SAMPLE_S24_32BE), PA_ERR_NOTSUPPORTED);
    PA_CHECK_VALIDITY_RETURN_NULL(c, !map || (pa_channel_map_valid(map) && map->channels == ss->channels), PA_ERR_INVALID);

    if (!map)
        PA_CHECK_VALIDITY_RETURN_NULL(c, map = pa_channel_map_init_auto(&tmap, ss->channels, PA_CHANNEL_MAP_DEFAULT), PA_ERR_INVALID);

or this one right here: src/pulse/stream.c#L102

    PA_CHECK_VALIDITY_RETURN_NULL(c, name || (p && pa_proplist_contains(p, PA_PROP_MEDIA_NAME)), PA_ERR_INVALID);
bastibe commented 5 years ago

Interesting. I would assume that this is a channel map issue, then, as this is the only thing that could conceivably change between invocations. Can you check if the channel map is any different between invocations on your machine?

rikvanriel commented 4 years ago

I wonder if it would make sense for SoundCard to simply replicate the assertion checks the pulseaudio library does, and throw Python exceptions?

That way the program using the library could do things like re-opening the speaker and re-creating the stream, and then continue streaming the audio to or from the new stream object.

Maybe the library could even do some of that transparently in the background? Not sure how feasible that would be...

bastibe commented 4 years ago

I wonder if it would make sense for SoundCard to simply replicate the assertion checks the pulseaudio library does, and throw Python exceptions?

That's a good idea! Perhaps indeed not so much for catching the error itself, but for investigating its cause. If we could identify the offending assertion, we might actually be able to fix it before it's happening!

I would be grateful if you would like to do this investigation, as my spare time is limited these days. Would you like to try?

That way the program using the library could do things like re-opening the speaker and re-creating the stream, and then continue streaming the audio to or from the new stream object.

It seems that the error goes away if we simply re-try. But that doesn't make sense; there must be an actual cause to this error, either in SoundCard or in pulseaudio. I would much rather find that cause and fix it, instead of adding a workaround.

rikvanriel commented 4 years ago

I tried the retrying thing (limited retries for up to half a second) with my sound streaming program overnight, and it still failed. More interestingly, all three copies of the program tend to fail at the same time.

That really makes me think there is some temporary state in pulseaudio that causes this issue, but I'll add prints and will hopefully get more info over time...

bastibe commented 4 years ago

Thank you for looking into this! It has been a long-standing issue in SoundCard, but I've never had the time to truly delve into it.

rikvanriel commented 4 years ago

So far all I have (the issue happens only a few times a day, and I have no idea how to trigger it quickly) is that self.stream = _pulse._pa_stream_new(_pulse.context, self._name .encode(), samplespec, channelmap) returns null.

I'm looking to see how often that happens, whether a retry with sleep helps, and what error pulseaudio is returning.

ipelupessy commented 4 years ago

I have put up a PR for testing; for me the above examplel crashes after 5-8 iterations, while with the PR it doesn't..(at least not within ~100 times)

rikvanriel commented 4 years ago

I put up a PR that returns the pulseaudio errno as an exception to the program using SoundCard when stream creation fails. That can allow for retry loops at higher levels, or simply debugging of what a program is doing wrong (things like specifying an illegal channel map, etc)

rikvanriel commented 4 years ago

It looks like, in the two failures I have seen over the past few days, _pulse._pa_stream_new returns 3 (PA_ERR_INVALID), which suggests either the channel map or sample spec was set up wrong, despite both having been verified above.

ipelupessy commented 4 years ago

@rikvanriel did #103 anything for you?

rikvanriel commented 4 years ago

I have not tried 103# myself, but I suspect it points us at a root cause of SoundCard asking FFI to pass uninitialized memory to pulseaudio, which happens to work most of the time but fails sometimes.

I need to dig into which fields provoke the pulseaudio issues.

rikvanriel commented 4 years ago

Reading the pulseaudio code, and seeing which data structure members it uses, and contrasting that with the ones that get filled in by SoundCard is interesting...

bastibe commented 4 years ago

@rikvanriel, since you are diving into the source code at the moment: does pa_stream_new make a copy of the channel map and sample spec, or simply stores a reference to it? Because if it stores a reference, we would be responsible for keeping them alive. At the moment, both the channel map and the sample spec are free'd when __enter__ returns.

rikvanriel commented 4 years ago

@bastibe it looks like pulseaudio, in pa_stream_new_with_proplist_internal, copies the contents of samplespec and channelmap into the pa_stream structure.

Also, the error happens inside pa_stream_new, which returns NULL with pa_errno = PA_ERR_INVALID, so this happens well before Python is allowed to reclaim the memory.

There are several checks that can return with pa_errno' set toPA_ERR_INVALID`:

PA_CHECK_VALIDITY_RETURN_NULL(c, ss && pa_sample_spec_valid(ss), PA_ERR_INVALID); PA_CHECK_VALIDITY_RETURN_NULL(c, !map || (pa_channel_map_valid(map) && map->channels == ss->channels), PA_ERR_INVALID); if (!map) PA_CHECK_VALIDITY_RETURN_NULL(c, map = pa_channel_map_init_auto(&tmap, ss->channels, PA_CHANNEL_MAP_DEFAULT), PA_ERR_INVALID); PA_CHECK_VALIDITY_RETURN_NULL(c, name || (p && pa_proplist_contains(p, PA_PROP_MEDIA_NAME)), PA_ERR_INVALID);

The latter two do not apply, since SoundCard is providing a channelmap (map), and is not providing a proplist (p).

The first one sets pa_errno to PA_ERR_INVALID only if pa_sample_spec_valid(ss) fails. This is something SoundCard already tests for.

The second one tests pa_channel_map_valid(map), which SoundCard already tests for too, and it tests whether map->channels == ss->channels, which I don't see how it could fail since SoundCard uses the value of samplespec.channels as a parameter to pa_channel_map_init_auto.

I am willing to run any debug code you can think of, but I'm kind of out of ideas myself...

bastibe commented 4 years ago

Thank you for your analysis. What a mess. One of our assumptions must be faulty.

the error happens inside pa_stream_new, which returns NULL with pa_errno = PA_ERR_INVALID, so this happens well before Python is allowed to reclaim the memory.

This might actually not be true: In line 648, we call

channelmap = _pa.pa_channel_map_init_auto(channelmap, ...)

Technically, this dereferences the original channelmap, and puts a new cffi-pointer into channelmap, even though it points to the same chunk of memory. If Python's garbage collector were to come around between this line and _pa.pa_stream_new, it might conceivably reclaim the original variable, and with it the memory held in channelmap.

Could you try changing the variable name of channelmap, so the original variable persists until after _pa.pa_stream_new, and see if that helps?

Alternatively, we could try ruling out all of these kinds of errors by putting an explicit garbage collection (import gc; gc.collect()) just before _pa.pa_stream_new, and see if that triggers the error or not.

rikvanriel commented 4 years ago

Adding an implicit gc.collect() call before __pa.pa_stream_new does not cause any crashes.

I am now running with the first channelmap renamed, in addition to the explicit GC, and have not seen issues yet. I'll just let it run for a few hours to see if anything happens.

bastibe commented 4 years ago

I'll keep my fingers crossed that we found the culprit!

rikvanriel commented 4 years ago

No errors after 10 hours, but sometimes it took several days, so we'll know "for sure" by Friday :)

It is looking promising, though.

rikvanriel commented 4 years ago

It's been 24 hours now, with no sound card issues. Giving the channelmap variable a different name before and after the call to _pa.pa_channel_map_init_auto might have solved the issue!

bastibe commented 4 years ago

Fantastic news! What a thorny little issue. Quite a subtle error I made there.

Thank you for your help on this issue. It has bugged me for months, and the solution only came about thanks to your analysis.

Would you like to contribute the fix as a pull request?

rikvanriel commented 4 years ago

OK, pull request created! Glad we could figure this one out together.