CentOS 6 0.16.x client and 0.17.x server causes scratchy audio

totaam commented 8 years ago

Issue migrated from trac ticket # 1178

component: server | priority: major | resolution: fixed

2016-04-20 02:48:43: smo created the issue

When using a centos 0.16.x client from the winswitch repo and a 0.17.x server on fedora 23 built from svn.

What happens is audio is scratchy and can lead to it degrading and eventually not working any more.

My test was always forcing --speaker-codec=mp3 on the client.

I will attach logs from the client and server with -d sound.

totaam commented 8 years ago

2016-04-20 22:34:25: smo commented

It seems I have spoke too soon about this and I've tested it as well with a server I build from 0.16.x tags directory.

I seem to be able to reproduce the same issue here if I attach a client when there is already audio playing.

Start server with child xterm

Connect from CentOS 6 client version 0.16.4

Start test video with vlc (sound is okay)

Disconnect client

Reconnect client (sound should be scratchy)

This eventually leads to the loss of sound completely. I can reconnect the client to regain sound.

I've tried the server both with gstreamer 0.10 and 1.0 using XPRA_GSTREAMER1= environment variable

My client is in virtualbox which means it is not using opengl.

totaam commented 8 years ago

2016-04-21 13:05:23: antoine changed owner from antoine to smo

totaam commented 8 years ago

2016-04-21 13:05:23: antoine commented

I don't see the logs...

does this happen with the other codecs? what is the difference?

when running with -d sound, are there underruns or overruns when the sound goes scratchy?

what does session info show for the buffer levels?

what does xpra info show for the same?

is av-sync on? does turning it off help?

totaam commented 8 years ago

2016-04-21 18:50:26: smo uploaded file `0.16.3-2.el6_client_scratchy_sound.txt` (394.3 KiB)

totaam commented 8 years ago

2016-04-21 18:55:33: smo commented

My bad. I forgot to attach the log. Here is one from the client the second one I have is 4mb in size so I will trim that down and look at some of the other things you have wrote down here.

totaam commented 8 years ago

2016-04-21 22:51:32: smo uploaded file `bad-sound-session_info.png` (299.8 KiB)

totaam commented 8 years ago

2016-04-21 22:58:10: smo commented

Mp3 - Scratchy audio

Wav - Scratchy audio

Wavpack - repeats first 50ms then is OK

Speex - Scratchy audio

totaam commented 8 years ago

2016-04-21 23:14:13: smo commented

Here is some stuff from the server with xpra info

client.speaker.actual-buffer-time=11880000
client.speaker.actual-latency-time=10000
client.speaker.buffer_count=15852
client.speaker.bytes=55918888
client.speaker.codec=wav
client.speaker.codec_description=wav
client.speaker.pid=5545
client.speaker.pipeline=pulsesrc device=Xpra-Speaker.monitor name=src ! queue name=queue min-threshold-time=0
max-size-buffers=0 max-size-bytes=0 max-size-time=50000000000000 leaky=2 ! volume name=volume volume=1.0 ! wav
enc ! appsink name=sink emit-signals=true max-buffers=10 drop=true sync=false async=false qos=false
client.speaker.queue.cur=0
client.speaker.state=active
client.speaker.time=1461276776

totaam commented 8 years ago

2016-04-22 03:04:49: antoine commented

The fact that wavpack is OK makes me think that the sound queue is getting starved and going into an underrun state. (wavpack sends very large chunks at a time). This would also explain the scratchy sound. Are you seeing underrun messages in the log output?

If so, r12454 + r12455 may help: you can adjust XPRA_SOUND_UNDERRUN_MIN_LEVEL up from the default of 50 milliseconds, which will allow us to force the buffer to fill up more before the rest of the pipeline is allowed to pull data from it and empty it again. Note: it is still capped by other values: the range of the measured levels (which may be part of the problem if it's empty - bumped to 150ms r12456), the maximum queue level (can't have min>max).

Another thing worth trying is changing the default sound sink on the client using XPRA_SOUND_SINK (try autoaudiosink, pulseaudiosink and alsasink, maybe even osssink, oss4sink), it probably uses pulseaudiosink by default.

totaam commented 8 years ago

2016-04-26 20:10:50: smo changed owner from smo to antoine

totaam commented 8 years ago

2016-04-26 20:10:50: smo commented

I've tried changing XPRA_SOUND_UNDERRUN_MIN_LEVEL up in increments of 10 on my server and it almost seems like the problem gets worse but I can't tell for sure.

XPRA_SOUND_SINK i've tried all these different settings and it uses pulsesink by default.

This also doesn't seem to change anything. Sound does however seem to work fine the first time I connect and start video so I'm attaching a client log from that and also one after I disconnect the client and reconnect and sound seems to be bad at this point. Maybe this can help narrow it down?

totaam commented 8 years ago

2016-04-26 20:11:22: smo uploaded file `sound-good.txt` (361.5 KiB)

totaam commented 8 years ago

2016-04-26 20:11:33: smo uploaded file `sound-bad.txt` (309.9 KiB)

totaam commented 8 years ago

2016-04-27 02:30:18: antoine changed owner from antoine to smo

totaam commented 8 years ago

2016-04-27 02:30:18: antoine commented

XPRA_SOUND_UNDERRUN_MIN_LEVEL needs to be adjusted client side, and is capped by the max queue level, try larger increments too.

totaam commented 8 years ago

2016-04-27 19:20:45: smo changed owner from smo to antoine

totaam commented 8 years ago

2016-04-27 19:20:45: smo commented

Started server on Fedora 23 VM with
xpra --no-daemon --bind-tcp=0.0.0.0:15000 --start-child=terminator --start-child='vlc ~/test.mp4' start :15
When connecting the client it does the same behaviour I have been describing here. Queue seems to be constantly in underrun state.

Adjusted on the client XPRA_SOUND_UNDERRUN_MIN_LEVEL by increments of 25 from 50 -> 150 with no noticeable difference and same queue state.

Also tried large values like 1024 or 2048 just out of curiosity. Still no change.

I can attach more client logs if you like but they are very similar to the sound-bad.txt attachment.

totaam commented 8 years ago

2016-04-27 21:31:25: smo uploaded file `underrun_min_150.txt` (438.2 KiB)

totaam commented 8 years ago

2016-04-27 21:34:04: smo uploaded file `underrun_min_150_wavpack.txt` (95.4 KiB)

totaam commented 8 years ago

2016-04-27 21:35:38: smo commented

Added a couple more logs both with underrun_min_level set to 150.

First one is with problematic mp3 audio and the second one is wavpack which while sounds fine is very unsynced with the video.

totaam commented 8 years ago

2016-04-29 01:20:32: afarr uploaded file `ticket1178-pulsesink-connect.txt` (157.8 KiB)

logs connecting with: XPRA_SOUND_UNDERRUN_MIN_LEVEL=30 XPRA_SOUND_SINK=pulsesink XPRA_SOUND_SOURCE_BUFFER_TIME=64 XPRA_SOUND_SOURCE_LATENCY_TIME=32 xpra attach tcp: -d sound

totaam commented 8 years ago

2016-04-29 01:21:36: afarr uploaded file `ticket1178-autoaudiosink-connect.txt` (359.1 KiB)

logs connecting with: XPRA_SOUND_UNDERRUN_MIN_LEVEL=30 XPRA_SOUND_SINK=autoaudiosink XPRA_SOUND_SOURCE_BUFFER_TIME=64 XPRA_SOUND_SOURCE_LATENCY_TIME=32 xpra attach tcp: -d sound

totaam commented 8 years ago

2016-04-29 01:22:23: afarr uploaded file `ticket1178-alsasink-connect.txt` (287.8 KiB)

logs connecting with: XPRA_SOUND_UNDERRUN_MIN_LEVEL=30 XPRA_SOUND_SINK=alsasink XPRA_SOUND_SOURCE_BUFFER_TIME=64 XPRA_SOUND_SOURCE_LATENCY_TIME=32 xpra attach tcp: -d sound

totaam commented 8 years ago

2016-04-29 01:39:32: afarr commented

Tested a bit with 0.16.4 12444 CentOS client with r12454,5,6 patches added against a 0.17.1 r12453 fedora 23 server.

Adjusting the XPRA_SOUND_UNDERRUN_MIN_LEVEL from 30 up to 250 (I also took the liberty of customizing the patchworks to try some borderline absurd values)... didn't seem to make any difference. Adjusting XPRA_SOUND_SOURCE_BUFFER_TIME & XPRA_SOUND_SOURCE_LATENCY_TIME values didn't seem to have any real impact either (from 128/64 to 64/32 to 32/20). Smo figuered out how to disable av-sync, but that didn't help either.

When I tried alsasink though, I was able to connect/disconnect (repeat ad nauseum) without anything more than an ocasional bit of static as the sound is starting upon connection to a session with sound already playing. (I was, as usual, just using a browser and video to generate sound... connecting & disconnecting the client to induce the sound issues.)

Looking at the Session Info, though - it actually looked like the min buffer levels were adjusting to keep higher than the actual sound output levels (which, needless to say, keeps the client in a constant state of underrun).

I'll attach a screenshot of this with the XPRA_SOUND_UNDERRUN_MIN_LEVEL=250, though I was seeing about the same with most levels (unless the scale changed, at which point it looked closer to the shot Smo attached earlier).

I did also notice (maybe useful info) that, if I paused the videos for more than a minute or so (to get up to get a beverage, for instance)... when restarting the sound, the Sound Buffer chart would usually return to a more normal looking graph, with min buffer levels no longer adjusting to stay "above" the sound levels (could something be triggering the min buffer to use the max buffer heuristics with the CentOS 6.4 using python2.6, or something like that?).

I'll attach a slideshow (3 or 4 frames, not getting carried away) to try to make that description make sense.

totaam commented 8 years ago

2016-04-29 01:43:27: afarr uploaded file `ticket1178_min_underrun-50_reconnect-initial-values_alsasink` (68.0 KiB)

"Re-connect" to a session whose sound is now running, the graph

totaam commented 8 years ago

2016-04-29 01:44:10: afarr uploaded file `ticket1178_min-underrun-50_reconnect-after-30-seconds_alsasink` (70.2 KiB)

after about 30-ish seconds, the min buffer levels start to behave oddly

totaam commented 8 years ago

2016-04-29 01:45:14: afarr uploaded file `ticket1178_min-underrun-50_reconnect-after-a-minute-and-change_alsasink` (75.4 KiB)

After a full minute(ish) the min buffer levels seem to now be "intentionally" exceeding the actual sound levels

totaam commented 8 years ago

2016-04-29 01:56:50: afarr commented

Initial connection (with alsasink) to session with sound running - chart looks relatively as expected (also generally looks like this about 30 seconds to a minute after pausing a sound source).

[[Image( ticket1178_min_underrun-50_reconnect-initial-values_alsasink)]]

[[BR]]

-*

Let the sound run for a vague 30 seconds and the graphs show the min buffer level continuing to be "aggressive".

[[Image( ticket1178_min-underrun-50_reconnect-after-30-seconds_alsasink)]]

[[BR]]

-*

Let the sound run for another long 30 seconds, and the min buffer levels seem to very efficiently exceed the sound levels, adjusting to do so.

[[Image(ticket1178_min-underrun-50_reconnect-after-a-minute-and-change_alsasink)]]

[[BR]]

Pausing for a minute or two will also return the min level buffer to a less weird behavior for some little or longer time (not sure what deciding factor might be), though the min level buffers will almost always (usually after sound pauses for a second again, then sometimes returning to expected behavior and sometimes remaining in constant overrun state) return to the "aggressive underrun" state.

totaam commented 8 years ago

2016-05-04 02:39:33: afarr commented

Maxmylyn says he's done some tests with XPRA_SOUND_UNDERRUN_MIN_LEVEL=150 & XPRA_SOUND_MARGIN=300 and had good results.

Meanwhile, I tried with just the XPRA_SOUND_MARGIN, set at 300, 150, & 50. At 300 sound seemed pretty good for me (without setting underrun min level)... at 150 I saw a bit of scratchiness (heard). At 50 though I saw the Sound Buffer behaving as in the bottom shot of the previous comment... and had sound go scratchy and then cut out after about 10 minutes.

Not sure that's anything consistent though.

Will do some more testing to see how consistent that is.

totaam commented 8 years ago

2016-05-04 13:39:46: antoine uploaded file `centos-latency-via-drifting.patch` (2.2 KiB)

change pulsesink settings to try to make it work on centos6

totaam commented 8 years ago

2016-05-04 13:44:46: antoine changed owner from antoine to afarr

totaam commented 8 years ago

2016-05-04 13:44:46: antoine commented

Please confirm:

I believe that the problem is the centos 6.x clients work only? and which release of 6.x? (comment:11 mentions 6.4 so I assume that's the one - though the problem is likely to affect all centos versions in the same way)

does the xpra version matter? I seem to be able to reproduce this issue with trunk (0.18) as well as older versions, but the ticket title mentions 0.17

how did you get the required mp3 plugins installed? rpmfusion?

comment:13 states that raising the margin and min-level gives "good results" - isn't that a viable solution? works absolutely fine here with centos 6.7 (and I doubt 6.4 will be very different), it would be good to know which environment value needs tweaking (could well be both) and whether those values need to change depending on network conditions, especially jitter

I'm also getting good results with the speex codec

another potential workaround is using alsa as the sound output plugin - I thought this plugin was not affected by the scratchy sound but comment:7 is unclear on that - there may be other problems with alsa, but there are also workarounds for those, ie: [https://wiki.archlinux.org/index.php/PulseAudio#ALSA]

failing that, can't you use wavpack, at least temporarily? it will suck a lot of bandwidth (~6MB/s) but that's a stop-gap measure of sorts and also a good reference point for debugging the working queue level values

comment:11 states that Smo figuered out how to disable av-sync, but that didn't help either - isn't it just a case of adding av-sync=no to the command line?

Meanwhile, I've looked at the code and fixed some minor issues (r12526) and improved the debug output (r12527). I am attaching a quick and dirty patch which seems to improve things for me, the big problem with this patch is that it breaks #849 and win32 and osx, etc.. It seems to help here, but not in all cases, so it may need to be combined with the env vars or even something else. Warning: this is just a proof of concept at this point. Does it help?

totaam commented 8 years ago

2016-05-05 03:24:00: afarr commented

I tried applying the patch to the 0.16.4 12444 centOS client, but I'm sure you realize that didn't work.

Tried also with the 0.17.1 and 0.18.0 clients from your repo - but the sound wouldn't work at all with those clients against a 0.17.1 r12453 fedora 23 server. The clients connected, but the speaker was set to off (using the dock to try to re-start triggered a lot of -d sound client output, but failed to start speakers at all).

Quick log output client-side (I'll attach more complete logs of output):
2016-05-04 12:19:31,224 sound output pipeline error: GStreamer encountered a general resource error.
2016-05-04 12:19:31,225 sound output  pulsesink.c(833)
2016-05-04 12:19:31,225 sound output  gst_pulseringbuffer_acquire ()
2016-05-04 12:19:31,225 sound output  /GstPipeline:pipeline0/GstPulseSink:pulsesink0
2016-05-04 12:19:31,225 stopping speaker because of error: GStreamer encountered a general resource error.
I was able to get sound to work with the 0.17.1 12513 client, by using the XPRA_SOUND_SINK=alsasink though. For the most part it seemed stable, and didn't seem to cause any issues with sound for local applications (a concern I know smo had).

With the 0.17 and 0.18 clients though, especially if I included either the patch you just posted or the r12454, r12455, r12456 patches.. it seemed like av-sync was disabled (again, I suspect you realized this, but not sure if I am the only one who didn't realize how many of these options were going to disable av-sync).

Interestingly, going back to testing with the 0.16.4 12444 client (as mentioned in comment:11), but without bothering with any of the patches, instead just using variations of XPRA_SOUND_MARGIN, I found that the av-sync still worked, at least mostly.

With XPRA_SOUND_MARGIN=50 I found that the av-sync was about 4-6 frames (of 30 fps) audio late (so, ~ 130-200 ms audio late), but that was testing with a centos hardware machine without proper drivers & no opengl... and, unfortunately I still saw sound often scratchy and sometimes crashing entirely.

With XPRA_SOUND_MARGIN=75 and XPRA_SOUND_MARGIN=100 I still found av-sync to be about 4-6 frames audio late, saw less and less scratchiness and less and less sound crashes... but still saw them semi-regularly (something like 1 in 4 and 1 in 5 connections before sound starts... +/- twice the rate of problems when connecting to a session with sound already running?).

With XPRA_SOUND_MARGIN=150 the av-sync seemed to slide to about 5-7 frames audio-late (maybe 150-210 ms?), but it wasn't lost entirely (with the patches I was seeing about 13-15 frames late... 500 ms-ish, and trying ham-fisted av-sync-delta values, up to and including 750ms, was no real help because the heuristics, as I'm sure you realize, were apparently disabled). At 150 though, I was very rarely seeing any issue with sound started after the session had been initialized, and also very rarely when disconnecting/reconnecting (maybe 1 in 12 times?... and usually only after a long period of playing after the re-connection).

Didn't get a chance to test any further along, but the fact that the sound margin seems to improve matters so much, while preserving at least the basics of av-sync, seems like it might be the way to go.

Speaking of which, unless I was missing something, it also looked like the alsasink, while it was willing to share sound with other applications, didn't seem to support av-sync?

A moment to address your questions...

I've been testing with a CentOS 6.4, but I think smo has tested with 6.6 or 6.7... as far as we can tell there isn't much difference.

I addressed the version a little above... but more concisely (all against a 0.17.1 r12453 server)- with the 0.16.4 12444 I seem to have mp3 support without having to set alternate sinks, with the 0.17.1 and 0.18.0 from your /beta repo I don't seem to have any functional sound except by specifying alsasink (which doesn't seem to support av-sync).

I've been testing using your clients, with occasionally spliced in patches. Maybe I should use rpmfusion to get mp3 plugins for the 0.17.1 & 0.18.0?

As I mentioned above, the XPRA_SOUND_MARGIN seems to be a potentially viable solution... especially since it doesn't seem to require patches to work (I'm assuming I wasn't just getting dumb-luck results that seemed to make sense when testing without a patch), and it doesn't impact (mortally) the av-sync. I haven't tested yet with jitter, drops, etc. ... that promises to be a good time though.

Haven't tried the speex codec yet. I'll see about twisting maxmylyn's arm perhaps.

As I mentioned above, the alsasink results were pretty good, and seemed to work with 0.17.1 and 0.18.0 ... but the av-sync seemed to be disabled (the sync seemed about as far off as it could be... so far it was hard to tell if the audio was late, the video late... or maybe the audio was so late that it looked like the video was late but it was the video late for the wrong second of audio...)

With the XPRA_SOUND_MARGIN prospects looking as promising as they are, wavpack seems like it wouldn't be worth the effort... I think.

Yes... av-sync=no on command line was sufficient, but I didn't see that as an option when searching wiki and xpra --help, so my guesses on syntax proved, faulty. I'll check with smo and see how he found the option... might've been searching the actual code base, if so I'll go ahead and add to wiki if you'd like, or would you like me to see if someone can add to the xpra --help?

Ok, this is long enough now. I'm going to end the comments before i forget to attach some logs that may, possibly, be of some use.

totaam commented 8 years ago

2016-05-05 03:25:27: afarr uploaded file `ticket1178_0-18-0-r12496-client_0-17-1-r12453-fedora23-server_d-sound-start-speaker-output.txt` (26.8 KiB)

0.18.0 CentOS client -d sound error outputs connecting to 0.17.1 server

totaam commented 8 years ago

2016-05-05 03:26:48: afarr uploaded file `ticket1178_0-17-1-r12513-client_0-17-1-r12453-server_sound-fails-like-0-18.txt` (26.2 KiB)

0.17.1 CentOS client -d sound error outputs connecting to 0.17.1 server - very similar to 0.18.0, but maybe I missed something

totaam commented 8 years ago

2016-05-06 02:21:52: afarr commented

Some more testing with my favorite 0.16.4 12444 centos client (from winswitch beta repo), tweaking the XPRA_SOUND_MARGIN a bit more...

XPRA_SOUND_MARGIN=175 seems to produce a sound channel that's pretty reliable (maybe 1 in 20/30 connections the sound will become scratchy/crash?) even with connections/disconnections. At 200 it just superstitiously feels a little more stable. At each of these settings the av-sync (again, with opengl=off because I seem to have cobbled together a pile of hardware which insists on trying to use blacklisted swrasterizer for its opengl) is generally about 6-8 frames audio late on a 1080p display (actually reading as 1600x900 because of those driver/video card issues)... if I'm remembering the lack of opengl effects correctly (and assuming that the effects with an osx using intel vs. centos using... whatever correlate) - that would translate to about 8-10 frames audio late with a reasonable system with those settings (about 166 - 233 ms?).

Just to indulge my OCD, I double checked the 0.17.1 and 0.18.0 clients with XPRA_GSTREAMER!=NO, but those clients still give any sound without also using XPRA_SOUND_SINK=alsasink. Using alsasink, though, definitely doesn't seem to support av-sync.

And that's about all that I can glean. It looks to me like the XPRA_SOUND_MARGIN=200 value makes the centos client (0.16.4 using gstreamer 0.10.29) sound stable, though it throws the av-sync a bit (further) off.

totaam commented 8 years ago

2016-05-06 14:26:50: antoine uploaded file `centos-noscratchy-sound.patch` (1.6 KiB)

minimal patch to try to get rid of the scratchy sound

totaam commented 8 years ago

2016-05-06 14:41:09: antoine commented

First, let's go through the previous comments to try to clarify things:

Tried also with the 0.17.1 and 0.18.0 clients from your repo - but the sound wouldn't work at all with those clients against a 0.17.1 [[BR]] I'm not seeing this, how do I reproduce this? (commands, etc) It works no better and no worse than previous versions for me, which makes sense considering the pipeline is unchanged. (you can view the pipeline used with "-d gstreamer", near the beginning)

Are you sure that pulseaudio is running properly at that point? (ie: run pavucontrol)

[[BR]]

With the 0.17 and 0.18 clients though, especially if I included either the patch you just posted or the r12454, r12455, r12456 patches.. it seemed like av-sync was disabled (again, I suspect you realized this... [[BR]] No. Only the patch can introduce extra latency in the sink which we will not be taking into account when calculating the video sync delay. The other 3 changesets do not have any direct impact on av-sync: they may just allow the queue levels to go higher, but the queue level is what we use for calculating the av-sync delay so this should not really matter. You should verify the av-sync delay that the server calculated to make sure that this is the case, and also verify that there is a video region on screen as non-video areas are never synchronized.

[[BR]]

I was able to get sound to work with the 0.17.1 12513 client, by using the XPRA_SOUND_SINK=alsasink though. For the most part it seemed stable, and didn't seem to cause any issues with sound for local applications (a concern I know smo had). [[BR]] I believe that the alsasink will still actually end up using pulseaudio underneath.

[[BR]]

With XPRA_SOUND_MARGIN=50 [[BR]] That's the default value already.

[[BR]]

With XPRA_SOUND_MARGIN=150 the av-sync seemed to slide to about 5-7 frames [[BR]] Are you sure? The margin is only used to adjust the maximum level of the buffer. We still take the buffer level into account, so even if the buffer level ends up higher, we should be adjusting accordingly.

[[BR]]

Speaking of which, unless I was missing something, it also looked like the alsasink, while it was willing to share sound with other applications, didn't seem to support av-sync? [[BR]] It is possible that alsasink buffers more internally. How much was the delta? etc..

[[BR]]

Maybe I should use rpmfusion to get mp3 plugins for the 0.17.1 & 0.18.0? [[BR]] That doesn't make sense: with centos 6, either you have the gstreamer mp3 plugins or you don't. The xpra version is irrelevant.

[[BR]]

I didn't see that as an option when searching wiki and xpra --help [[BR]] These things belong in the man page - use man xpra, which is missing an entry for this option, I will fix.

[[BR]]

clients with XPRA_GSTREAMER!=NO [[BR]] The name is XPRA_GSTREAMER1, the possible values are 0 and 1, and this should have no effect on centos6 because centos6 does not have gstreamer 1.x

Now for some reading:

gstreamer produces "scratchy" sound, leads to these pulseaudio bugs: pulseaudio produce scratchy sound, game trine2/steam voice chat/steam big picture mode produce scratchy sound with 5.1 usb audio card - but the resolution seems different from the original bug report...

alsasink does not synchronise properly with a live streaming source, yet (timing skew) seems like what we're seeing and is full of pipeline suggestions (231 comments!)

Now, everything points towards a problem in either the gstreamer sink code (especially since centos6 is based on a very old version of gstreamer 0.10...) or our configuration of the pipeline when it comes to syncing and clocks. Or both.

So I've listened to a lot of scratchy music today (butchered some great songs), whilst tweaking:

do-timestamp: I believe we don't want this enabled, the timestamps of the buffers will have network jitter

drift-tolerance + slave-method: try to let the sink figure out how best to deal with things, as with the clock:

provide-clock + qos: the sink should drive things

sync: probably should be on, but as discussed in the ticket above, I've had to turn it off before... so I've left it off in the patch

async: not sure

With the patch attached, we no longer try to prevent underruns at all which may be a bad thing since this code is there for a reason (remove the new "return 1" to restore it), this removes a lot of the scratchiness but not all. I'm not 100% sure which exact changes are required to get things working properly, which changes should be merged ("do-timestamp" probably) and which should be applied as workarounds for older versions of gstreamer. And as I mentioned before, these changes should be applied in the right place, cleanly, to avoid breaking all the other platforms. If that doesn't resolve it, as a last resort, it may be needed to patch the gstreamer packages to provide a more up to date pulsesink. (and hope that the problem is there rather than in pulseaudio or the kernel!)

totaam commented 8 years ago

2016-05-07 02:02:08: afarr commented

I think clarity will help, unfortunately I'm starting to get different results from the same tests... so I'm now becoming very confused. I'll try to clarify some of my comments for you, and hope that in the process I might stumble across some explanation for what's going on.

How do I get 0.17.1 & 0.18.0 clients to produce those errors and fail sound? yum install xpra-0.17.1-1.el6_7 xpra-common-0.17.1-1.el6_7. Build a 0.17.x tag server (12534). Launch server (dbus-launch xpra --no-daemon --bind-tcp=0.0.0.0:1203 --start-child=xterm --start-child=xterm --mdns=no start :13 --start-new-commands=yes). Launch client (xpra attach tcp:(IP):(port)).

0.17.1 12513 centos client, 0.17.1 r12453 fedora 23 server.
2016-05-06 15:05:25,118 sound output using audio codec mp3
2016-05-06 15:05:26,913 sound output pipeline error: GStreamer encountered a general resource error.
2016-05-06 15:05:26,913 sound output  pulsesink.c(833)
2016-05-06 15:05:26,913 sound output  gst_pulseringbuffer_acquire ()
2016-05-06 15:05:26,913 sound output  /GstPipeline:pipeline0/GstPulseSink:pulsesink0
2016-05-06 15:05:26,913 stopping speaker because of error: GStreamer encountered a general resource error.
2016-05-06 15:05:26,914 sound output stopping
... repeat with yum install xpra-0.18.0-0.20160430.el6_7 xpra-common-0.18.0-0.20160430.el6_7; build trunk server (12534); same launch commands server & client. 0.18.0 r12496 centos client; 0.18.0 12534 fedora 23 server. Same errors.

Checked with pavucontrol, all looked as close to expected as my best guess could approximate what to expect.

Running with -d gstreamer I noticed the list of sound devices and the indication to try XPRA_PULSEAUDIO_DEVICE_NAME which, when I used to specify the headset I'm using, fixed the sound immediately! (for some reason, it was apparently not picking up the sound device I was using for other applications, though I may just not have set that to the satisfaction of centos).

The 0-.17 and 0.18 clients' sync issues were stemming from the fact that, poorly explained in retrospect, I could only get sound from them when using alsasink... so that's just a duplicate of the alsasink sync issue.

The effects of the XPRA_SOUND_MARGIN changes on the av-sync, am I sure? No. Not even a little bit. I saw the effects, but now I not only can't reproduce them, but I can't even reproduce the increased sound channel stability with the XPRA_SOUND_MARGIN. I am at a complete loss as to what is going on with this... but that at least puts me in agreement with smo, who says he was seeing no effect from them at any point. I suspect I was doing something wrong (well, technically, something right), but I can't say what that something might be.

The alsasink sync-delta seemed to be about 13-15 frames audio late (about ~ 500ms), but it was so out of sync that it was unclear if it might've been more. Trying to adjust with av-sync-delta (via control channel), however, I had no real success. At one point I'd set av-sync=delta=700, which had dropped the delta to about 2-4 frames audio late (66-133 ms?)... only to watch it, over the course of a minute or so, stretch back out to about 6-8 frames audio late (200-266ms). At that point, obviously, the av-sync-delta starts to become more comical than useful (set at 950 ms... is it in sync or just pushed to a full second late?, or early? and, if it keeps sliding, what's the point?).

I either have the mp3 plugins or I don't?... Well, yes - I guess I was assuming I didn't and the 0.16.4 was working because of something in the packaging. Now that I've found that explicit setting of the sound source makes the mp3 work with 0.17.1 & 0.18.0, I'll assume that the mp3 plugins are in place, and rpmfusion has been a favorite source for maxmylyn and smo in the past. Since they set up the machine that I'm testing with, I'll go with yes, rpmfusion.

Errr... XPRA_GSTREAMER! = typo. Ooops. And yes, I was expecting that it would have no effect, and I was satisfied in that expectation... but maxmylyn had mentioned to me using the variable in conjunction with others, so I tried it to be sure. I won't bother to re-test with 0 or 1 instead of off or on, since we don't need the extraneous confusion.

With that bit of clarification, I'm now utterly confused. Nothing that seemed to be working previously seems to be working, and my efforts to go back and find a place where I'd gone awry have been a failure.

I will read the reading and reboot the machines and ... see if I can make any sense out of it another day (best two out of three?).

totaam commented 8 years ago

2016-05-11 02:06:55: afarr commented

I seem to have narrowed down where my mistake came from. It took a little while because it only vaguely makes any sense.

It would appear that the XPRA_SOUND_MARGIN effects on av-sync were an odd coincidence... or the result of something else which I haven't bothered to try to track down. Those should be ignored for now.

The issue with the 0.17.1 and 0.18.0 clients I haven't re-visited, let me know if you'd like me to do so. (Maybe something to make detection of default sound outputs needs to be done, or maybe I'd scrambled the defaults on the machines I was testing on and was inducing my own problems.)

And here we come to the truly odd revelation.

Testing the 0.16.4 client against 0.17.1 server (centos 6.4 against fedora 23... with a centos client that supports opengl now) I got a bit overzealous and merged the r12527 patch. Since I'm sure you realize that gstlog calls in a 0.16.4 client are not liable to be of very much help... which is why it took me so long to get my head around the fact that, somehow, that patch and the neverending tracebacks it outputs (NameError: global name 'gstlog' is not defined) somehow makes the sound stable.

Out of curiosity, we tried deleting one, or two, or any number of permutations, of those three 'gstlog' patch bits with slightly varying results, but they all basically left the client sound to crash sooner or later. Likewise, removing that patch entirely caused the sound to sputter and crash (though, with the previous patches and with XPRA_SOUND_MARGIN=200, or 300, the sound is somewhat stable for a little while as long as it is never running when a connection is made, but only started after initial connection (and stopped before any disconnect/reconnect).

Aside from some deranged theories of sink.py gnomes, the only thing I can think is that the code that handles the error output is somehow either triggering an alternate codepath, or is perhaps slowing things down enough so that the gstreamer 0.10.29 is able to keep up?

Any ideas?

totaam commented 8 years ago

2016-05-11 06:49:27: antoine commented

Very interesting. What happened was that those logging errors were cutting out all the queue self tuning code. I think we have at least a workaround: r12544 + r12545 + r12546. This has been running for hours here on centos6 without any problems. But we may still need to increase XPRA_SOUND_QUEUE_TIME by default so it never hits overruns. It disables all sound queue events (or at least on centos6, we ignore them since we can't turn them off). r12547 makes this the default on all outdated versions of gstreamer (0.10.x branch) like the one found on centos6. XPRA_QUEUE_SILENT can be used to toggle this (-1 is "auto" which is the default, 1 disables the tuning code, 0 enables it) r12548 catches one place where we were still adjusting min/max, causing scratchy sound to occur (just harder to hit).

The problem with this fix is that it disables the queue tuning code added for #849, so the queue may grow large as jitter accumulates and we will never do anything to bring it back down again. This can make av-sync more difficult and cost more server side RAM.

Side note: we should have tested 0.15.x and earlier versions. 0.14.x is still supported and probably immune to this problem since it doesn't have any of the #849 code. Another interesting thing: I can now reproduce the sound issues with Fedora clients just by running with XPRA_GSTREAMER1=0... not sure how I missed that before. (or maybe something else made things worse since the flag was added in 0.16.x - this could be useful for finding another way to fix this)

The better solution may be to trigger the queue tuning code from a timer instead of the queue events directly, and / or try to limit how many times we tune the min and max levels as this is what seems to trigger the scratchy sound. On a hunch, I had cooked up a patch that did exactly this - I will try to resurrect it and make it work. But if the fixes above work, then this is less of a priority: the better fix is just to use gstreamer 1.x wherever possible.

totaam commented 8 years ago

2016-05-11 12:54:08: antoine commented

Scratch that (;), we have a problem: it still occasionally goes into an underrun as soon as we start the sound (just as before, this is more likely when sound is playing already before connecting) and it doesn't recover unless you pause the sound.

That's because the tuning code would forcibly raise the min level to try to prevent further underruns, but the changes above disabled it. And trying to re-instate this code, even in a milder form, just breaks things again.

Every time I introduce jitter (ie: as per #999#comment:2) the sound goes scratchy, it recovers when I remove it. Which means that the sink is just draining the buffer, never allowing it to fill up again.

I've tried all sorts of things, and they all led me astray: you think it works, go down that route, only later to find out that you were just lucky when testing and that it still breaks at random..

So, the best I could come up with is the file attached (drop in replacement for the one in trunk). It deals with the underruns by pausing the pipeline, which allows the queue to fill up again. It does deal with jitter well. It does NOT deal with overruns yet and those will still cause scratchy sound, which I will try to fix. (and unfortunately, pausing the pipeline makes it more likely we'll hit overruns...)

totaam commented 8 years ago

2016-05-11 12:54:55: antoine uploaded file `sink.py` (17.3 KiB)

work in progress: use pipeline pause to allow the buffer to grow

totaam commented 8 years ago

2016-05-12 02:35:08: afarr commented

Did some testing with 0.18.0 r12542 centos 6 client against 0.18.0 r12549 fedora 23 server, with that work in progress sink.py swapped in place of the centos repo sink.py.

Didn't get to trying with any induced jitter, wasn't necessary to trigger scratchiness.

Tried with XPRA_SOUND_QUEUE_TIME = 450 (Smo's best guess of the default value), 550, 600, 700.

450 seemed to behave similarly to the client without the variable, the sound generally starts overrunning and sputtering (though not dying) within 5-10 minutes (even quicker if using a sound source that requires new sound starts more often, like using youtube & loading one video at a time, second or third sound-channel loading will often have scratchiness).

550 seemed to last longer before the overruns and scratchiness started (not surprisingly).

600 & 700 seemed to last even longer, again not surprisingly, but once the sound queue got this large (anecdotally, didn't have enough time to test thoroughly enough to be sure) it seemed like teh av-sync-delta values were no longer persistent enough to compensate and sync A & V (similarly to what I was seeing with XPRA_SOUND_SINK=alsasink).

It looked like the Sound Buffer graphs were clear predictors of when the sound would become scratchy - once the levels begin to overrun the max buffer, the sound becomes as scratchy as it was previously becoming with the underruns (I'll attach a shot, in case you didn't already see).

Likewise, when I stop/start the speakers, the sound recovers and the graph of the sound levels returns to the expected real estate between the buffer levels... and begins climbing again. (I'll attach for comparison, again on the off chance you haven't already seen.)

Interestingly, it seems like the sound channel is considerably more stable when it is just running, than when it has to stop and then restart for whatever reason. (Though, I suppose that's just the behavior we've been seeing all along.)

totaam commented 8 years ago

2016-05-12 02:36:12: afarr uploaded file `ticket1178_sink-py-patch_sound-graph-as-scratchy-audio-starts` (45.8 KiB)

SOund Buffer graph as the sound becomes scratchy, and the levels begin overrunning the max buffer

totaam commented 8 years ago

2016-05-12 02:37:06: afarr uploaded file `ticket1178_sink-py-patch_sound-graph-after-speaker-off-on-toggle_sound-fix` (51.3 KiB)

Sound buffer graph after a off/on toggling of the speakers (after an overrun state & scratchiness)

totaam commented 8 years ago

2016-05-12 09:25:18: antoine commented

It looked like the Sound Buffer graphs were clear predictors of when the sound would become scratchy - once the levels begin to overrun the max buffer [[BR]] In comment:21, I specifically stated, and emphasized with uppercase writing: It does NOT deal with overruns yet and those will still cause scratchy sound. The fix in that file was for underruns only and that part seemed to work ok when I tested it. Overruns should have been ignored.

But never mind, I believe I now have a better fix for both:

r12550 undoes most of the silent queue changes which were not really helping anyway

r12551 adds utility functions for fade-in / fade-out via the volume element (+ fixes in r12554 + r12556)

r12552 just improves logging

r12553 the big one (see commit message for details): re-injects the last packet until we get out of the underrun state (much better than the pause approach from comment:21), and drops new packets until we get out of the overrun state

r12557 reduces some of the noise caused by the change above using fadeout+fadein, fixups in r12559 + r12560

This large custom handling code was needed because the overrun / underrun handling seems to be broken with older versions of gstreamer: temporarily changing the min or max level just causes scratchy sound and does not adjust the queue level as we would expect. It would be nice if we could re-use the fadein+fadeout to smoothe out the scratchy sound with newer gstreamer versions but the volume element is placed before the queue. Maybe in a future release.

Testing: this has only been tested with mp3. It might not work well at all with the other codecs, tough. Only tested in a virtual machine. Real hardware may well behave differently, especially when it comes to clock synchronization (which is what causes the queue levels to drift up or down).

To trigger the underrun state: usually just connecting will trigger it but we now start with the sound muted for the first second or so, so this isn't audible. To trigger underruns reliably, just connect then add to the server network interface:
tc qdisc add dev eth0 root netem delay 450ms 100ms 25%
To trigger the overrun, just remove the limit that was added for triggering the underrun, after connecting:
tc qdisc del dev eth0 root netem delay 450ms 100ms 25%
If this works OK on centos6 with trunk, I'll work on a patch for older branches. Note: this form of queue tuning is crude compared to #849, so it will make av-sync harder to get right.

totaam commented 8 years ago

2016-05-13 01:56:05: afarr commented

Err... just meant to include info about the overrun behavior in case it would be useful, guess I wasn't clear that I wasn't actually expecting the overruns to be 'handled'.

Boy have you done a wonderful job handling them now though.

Centos 6 0.18.0 r12565 agianst fedora 23 0.18.0 r12565 server... the sound is at least comparable in stability to the newer OS's running gstreamer 1.0.

Running sound is stable. Disconnecting/reconnecting is stable. tc qdisc add dev ens3 root netem delay 400ms 100ms 25% has a moment of psychedilic sound but recovers promptly and is stable thereafter (surprisingly so). Deleting that jitter also has the moment od sound distortion, before returning to stability. I almost think it might be better with loss than the gstreamer 1.0 code - I ran it up to tc qdisc add dev ens3 root netem loss 4% and it was still mostly stable.

This works significantly better than 'OK'.

It looked like the av-sync was impacted a bit, but surprisingly little. With a 1080p display (and working opengl) the sound seemed to be about 5-7 frames audio-late, where I would normally expect maybe 1-3 frames (or more often about 0 - 2) with opengl and a 1080p display. It seems like an impact, but certainly seems to be something that can be handled (will test with some av-sync-delta that seems to compensate for a while, see if the offset is at least consistent).

I definitely recommend it for a back port (though I suppose some testing of this sound code will be in order in addition to that needed for the gstreamer 1.0 clients when trying to find those last bits of fine tuning by display &/or whatever else seems to add or subrtact a couple of frames here and there).

totaam commented 8 years ago

2016-05-13 05:19:24: antoine uploaded file `centos-sound-full.patch` (16.1 KiB)

patch to apply the change in full to the v0.17.x branch

totaam commented 8 years ago

2016-05-13 05:20:09: antoine commented

Backported to v0.17.x in 12566, and v0.16.x since you need it in 12567 (was harder because some of the code has been moved and renamed since).

Please close if this works for you.

Xpra-org / xpra