Closed ArniDagur closed 4 years ago
Sorry for the late response, I was on vacation.
Could you try running against the latest Github version? It includes a tentative solution for this problem, or should at least make it happen far later. If you do that, I would be very interested if it actually fixes your issue.
On my machine, it seems to improve things a lot, but does not fix it entirely. As of yet, I don't know why. Any help would be greatly appreciated.
The logs you see above were produced using the latest GitHub version at the time, and since then I don't see any significant changes in the commit history.
If I understand the error message correctly, the stream creation fails due to an incorrect argument, but none of the arguments change in between invocations of pa_stream_new
(right?).
To me, this indicates that the error has some other cause. The PulseAudio documentation states that Pulse can only deal with a limited number of concurrent streams. As far as I can tell, we are cleaning up nicely after every stream.
Do you have any idea what might cause this error?
It's probably one of these assertions failing (where the error is PA_ERR_INVALID): src/pulse/stream.c#L216-223
PA_CHECK_VALIDITY_RETURN_NULL(c, ss && pa_sample_spec_valid(ss), PA_ERR_INVALID);
PA_CHECK_VALIDITY_RETURN_NULL(c, c->version >= 12 || (ss->format != PA_SAMPLE_S32LE && ss->format != PA_SAMPLE_S32BE), PA_ERR_NOTSUPPORTED);
PA_CHECK_VALIDITY_RETURN_NULL(c, c->version >= 15 || (ss->format != PA_SAMPLE_S24LE && ss->format != PA_SAMPLE_S24BE), PA_ERR_NOTSUPPORTED);
PA_CHECK_VALIDITY_RETURN_NULL(c, c->version >= 15 || (ss->format != PA_SAMPLE_S24_32LE && ss->format != PA_SAMPLE_S24_32BE), PA_ERR_NOTSUPPORTED);
PA_CHECK_VALIDITY_RETURN_NULL(c, !map || (pa_channel_map_valid(map) && map->channels == ss->channels), PA_ERR_INVALID);
if (!map)
PA_CHECK_VALIDITY_RETURN_NULL(c, map = pa_channel_map_init_auto(&tmap, ss->channels, PA_CHANNEL_MAP_DEFAULT), PA_ERR_INVALID);
or this one right here: src/pulse/stream.c#L102
PA_CHECK_VALIDITY_RETURN_NULL(c, name || (p && pa_proplist_contains(p, PA_PROP_MEDIA_NAME)), PA_ERR_INVALID);
Interesting. I would assume that this is a channel map issue, then, as this is the only thing that could conceivably change between invocations. Can you check if the channel map is any different between invocations on your machine?
I wonder if it would make sense for SoundCard to simply replicate the assertion checks the pulseaudio library does, and throw Python exceptions?
That way the program using the library could do things like re-opening the speaker and re-creating the stream, and then continue streaming the audio to or from the new stream object.
Maybe the library could even do some of that transparently in the background? Not sure how feasible that would be...
I wonder if it would make sense for SoundCard to simply replicate the assertion checks the pulseaudio library does, and throw Python exceptions?
That's a good idea! Perhaps indeed not so much for catching the error itself, but for investigating its cause. If we could identify the offending assertion, we might actually be able to fix it before it's happening!
I would be grateful if you would like to do this investigation, as my spare time is limited these days. Would you like to try?
That way the program using the library could do things like re-opening the speaker and re-creating the stream, and then continue streaming the audio to or from the new stream object.
It seems that the error goes away if we simply re-try. But that doesn't make sense; there must be an actual cause to this error, either in SoundCard or in pulseaudio. I would much rather find that cause and fix it, instead of adding a workaround.
I tried the retrying thing (limited retries for up to half a second) with my sound streaming program overnight, and it still failed. More interestingly, all three copies of the program tend to fail at the same time.
That really makes me think there is some temporary state in pulseaudio that causes this issue, but I'll add prints and will hopefully get more info over time...
Thank you for looking into this! It has been a long-standing issue in SoundCard, but I've never had the time to truly delve into it.
So far all I have (the issue happens only a few times a day, and I have no idea how to trigger it quickly) is that
self.stream = _pulse._pa_stream_new(_pulse.context, self._name .encode(), samplespec, channelmap)
returns null.
I'm looking to see how often that happens, whether a retry with sleep helps, and what error pulseaudio is returning.
I have put up a PR for testing; for me the above examplel crashes after 5-8 iterations, while with the PR it doesn't..(at least not within ~100 times)
I put up a PR that returns the pulseaudio errno as an exception to the program using SoundCard when stream creation fails. That can allow for retry loops at higher levels, or simply debugging of what a program is doing wrong (things like specifying an illegal channel map, etc)
It looks like, in the two failures I have seen over the past few days, _pulse._pa_stream_new returns 3 (PA_ERR_INVALID), which suggests either the channel map or sample spec was set up wrong, despite both having been verified above.
@rikvanriel did #103 anything for you?
I have not tried 103# myself, but I suspect it points us at a root cause of SoundCard asking FFI to pass uninitialized memory to pulseaudio, which happens to work most of the time but fails sometimes.
I need to dig into which fields provoke the pulseaudio issues.
Reading the pulseaudio code, and seeing which data structure members it uses, and contrasting that with the ones that get filled in by SoundCard is interesting...
@rikvanriel, since you are diving into the source code at the moment: does pa_stream_new
make a copy of the channel map and sample spec, or simply stores a reference to it? Because if it stores a reference, we would be responsible for keeping them alive. At the moment, both the channel map and the sample spec are free'd when __enter__
returns.
@bastibe it looks like pulseaudio, in pa_stream_new_with_proplist_internal
, copies the contents of samplespec
and channelmap
into the pa_stream
structure.
Also, the error happens inside pa_stream_new
, which returns NULL with pa_errno = PA_ERR_INVALID
, so this happens well before Python is allowed to reclaim the memory.
There are several checks that can return with pa_errno' set to
PA_ERR_INVALID`:
PA_CHECK_VALIDITY_RETURN_NULL(c, ss && pa_sample_spec_valid(ss), PA_ERR_INVALID);
PA_CHECK_VALIDITY_RETURN_NULL(c, !map || (pa_channel_map_valid(map) && map->channels == ss->channels), PA_ERR_INVALID);
if (!map) PA_CHECK_VALIDITY_RETURN_NULL(c, map = pa_channel_map_init_auto(&tmap, ss->channels, PA_CHANNEL_MAP_DEFAULT), PA_ERR_INVALID);
PA_CHECK_VALIDITY_RETURN_NULL(c, name || (p && pa_proplist_contains(p, PA_PROP_MEDIA_NAME)), PA_ERR_INVALID);
The latter two do not apply, since SoundCard is providing a channelmap (map), and is not providing a proplist (p).
The first one sets pa_errno to PA_ERR_INVALID only if pa_sample_spec_valid(ss)
fails. This is something SoundCard already tests for.
The second one tests pa_channel_map_valid(map)
, which SoundCard already tests for too, and it tests whether map->channels == ss->channels
, which I don't see how it could fail since SoundCard uses the value of samplespec.channels
as a parameter to pa_channel_map_init_auto
.
I am willing to run any debug code you can think of, but I'm kind of out of ideas myself...
Thank you for your analysis. What a mess. One of our assumptions must be faulty.
the error happens inside
pa_stream_new
, which returns NULL withpa_errno = PA_ERR_INVALID
, so this happens well before Python is allowed to reclaim the memory.
This might actually not be true: In line 648, we call
channelmap = _pa.pa_channel_map_init_auto(channelmap, ...)
Technically, this dereferences the original channelmap
, and puts a new cffi-pointer into channelmap
, even though it points to the same chunk of memory. If Python's garbage collector were to come around between this line and _pa.pa_stream_new
, it might conceivably reclaim the original variable, and with it the memory held in channelmap
.
Could you try changing the variable name of channelmap
, so the original variable persists until after _pa.pa_stream_new
, and see if that helps?
Alternatively, we could try ruling out all of these kinds of errors by putting an explicit garbage collection (import gc; gc.collect()
) just before _pa.pa_stream_new
, and see if that triggers the error or not.
Adding an implicit gc.collect()
call before __pa.pa_stream_new
does not cause any crashes.
I am now running with the first channelmap
renamed, in addition to the explicit GC, and have not seen issues yet. I'll just let it run for a few hours to see if anything happens.
I'll keep my fingers crossed that we found the culprit!
No errors after 10 hours, but sometimes it took several days, so we'll know "for sure" by Friday :)
It is looking promising, though.
It's been 24 hours now, with no sound card issues. Giving the channelmap variable a different name before and after the call to _pa.pa_channel_map_init_auto
might have solved the issue!
Fantastic news! What a thorny little issue. Quite a subtle error I made there.
Thank you for your help on this issue. It has bugged me for months, and the solution only came about thanks to your analysis.
Would you like to contribute the fix as a pull request?
OK, pull request created! Glad we could figure this one out together.
The error message is the same as: #20
Here is a script that can reproduce the crash 100% of the time:
When the following patch is applied (with a few print statements that are not included here)
the output of the script is the following:
@bastibe Any idea what's going on?