Design Limitations - Githubissues

gimesketvirtadieni commented 5 years ago

Hi.

I have just noticed your project with nice objectives. To be honest, SnapCast inspired me as well to contribute to it first, but then I started a project on my one due to some design choices which are impossible to solve:

SnapCast is not (and will never be) bit-perfect: this is due to the way PCM capture and delivery is done (pipes) - there is no way PCM originator can pass variable sampling rate without a protocol, so sampling rate has to be predefined in configuration
Time drift between 'playback' time of the source and OS time; the thing is that playback is controlled by a different crystal opposed to OS time so an explicit synchronization is required

There are more minor issues with SnapCast (like weird client synchronization algorithm, etc.), which can be solved, but the first two require OS specific solution, which makes them unattractive.

Do not get me wrong, SnapCast author did a fantastic job, succeeded in making it so popular and suitable for 99% cases. Whereas for 1% of cases one might need something like... SlimStreamer. I would be glad to get your feedback / comments. Also I am very open to any sort of contribution (please not all the work around SlimStreamer is unlicensed - Public Domain ;)

christf commented 5 years ago

Slimstreamer seems to be yet another very interesting approach. I wonder whether you get audio dropouts when using alsa and the server machine is heavily loaded.

There are a few things that come to my mind.

snapcastc is able to play back synchronously over the network in a synchronous way already. Obviously, since these are two projects, their implementation status is different. Given that snapcastc already implements this feature, it is already usable for synchronous audio playback.
Due to the nature of the pipe setup, snapcastc does not drop data from the inputstream as snapcast is doing. In comparison with slimstreamer, snapcastc can stream faster than playback speed. This is extremely useful because the user can use large buffers to compensate for network issues and still get low latency when hitting start / stop. This is not possible with neither slimstream nor snapcast.
At the moment snapcastc hard-codes opus as a requirement. However I would like this to be configurable in the future. It is a small change. In fact, I did test playback with PCM first.
A few weeks ago I shared your assertion: That time drift between source and OS is causing the clients and server to go out of sync. This is in fact not correct. Dropping data by the snapcast server is causing this. snapcastc does not display the same issue whose root cause is in the pipe configuration of the player.
At the very moment, snapcastc is untested with different sample rates and it does not support float format. This is a little bit of work but not too horrible. For the encoders this is interesting because for example opus is using a different set of functions to work with float formatted audio data than with uint16_t

I do appreciate the thought of joining forces. I can see that slimstreamer is implemented using 20K LOC. snapcastc has ~4K LOC, snapcast 10K. Slimstreamer shares the in my opinion not ideal choice for TCP as transport protocol with snapcast. Slimstreamer also is using threads - something that I specifically wanted to avoid in this project.

At this point I am not entirely sure which project is actually further along - for snapcastc the few major things that are missing are:

implementation of the api
improvements in the time stretcher
support for streaming multiple concurrent streams using one server

The design - implementing slimstreamer as a set of alsa plugins is interesting. What were your reasons for this choice?

Edit: Your title for this issue was "Design Limitations". Which design limitations do you see in snapcastc?

Edit2: I am extremely happy that I have not been able to create a scenario where snapcastc is dropping audio even after many hours of listening time. The only exception is: dropping so many packets that the retry-mechanism fails. That I guess is ok because snapcastc allows using very large buffers without large perceived latency.

gimesketvirtadieni commented 5 years ago

Hi

I only briefly skimmed through the code of snapcastc so please correct me if I wrong:

As a PCM source snapcastc (just like SnapCast) uses a pipe, so if you want to stream arbitrary Linux app then you need to use ALSA pipe plugin
ALSA submits ONLY PCM data to that pipe (no meta data is sent), so neither snapcastc nor SnapCast are able to identify what sampling rate is used in the original audio source
To overcome this problem, sampling rate (as well as other meta data) is predefined in ALSA pipe plugin configuration and snapcastc/SnapCast configuration by setting specific values (like for example 4800/2 channels/16 bit)
This setup requires resampling of the original audio stream in case it uses a different sampling rate
While submitting PCM stream to a pipe, ALSA will do 'throttling', it means PCM receiver will not get data 'ahead of playback', instead it will receive stream according to the timing and sampling rate ALSA is using, which may be different from OS provided timing
Moreover, using lossy codec (Opus) on top of resampled PCM stream degrades audio quality even more

These are the design limitations that I see. Such solution will provide suitable sound quality for casual usage, however there are already few solutions out there that provide that level of quality (SnapCast, Shareplay, ...) so added value that snapcastc brings is not that big. I did experience SnapCast drop-outs, did quite some debugging and figured out that sync implementation is buggy causing drop-outs in my case. However I believe that with such interest, SnapCast defects will be corrected.

That is why I took a different path: I am up to the best possible audio quality streaming can provide: no resampling, no lossy compressing, high performance code! In open source domain there is only one solution (to my knowledge) with these properties - LMS. However its main problem is the server: heavyweight, python monolith. It works well as long as you manage audio from LMS, but it's not suited well if you want to use arbitrary apps for various purposes. This is the problem SlimStreamer solves: it opens up LMS level of streaming quality for ANY app which is able to use ALSA!

I wanted to do as least as possible to get first results, so I have decided to use LMS streaming protocol - SlimProto. It instantly enables usage of ANY LMS compatible client. Normally I use squeezelite which was ported to plenty of flavors of Linux including OpenWRT. With a custom UDP protocol I would need to write a client, test/support/port it on different Linux versions, which does not bring any benefits considering that there is already a mature, wildly used, open-source product.

Regarding using threads, in a nutshell SlimStreamer logic is implemented using Event-Loop pattern based on asio implementation, which is normally referred as 'single-threaded'. I wrote a separate library (a'la node.js) called conwrap2 (link provides a description of conwrap v1). But it is way more complicated than that. For best possible quality PCM data capturing has to be soft-Real-Type-safe, which means no syscalls (no memory allocations, no printf, ...). That is why this part of SlimStreamer is separated from the rest of the app by using dedicated threads and communication is done via lock-free queue. Currently SlimStreamer uses one PCM capture thread per sampling rate, but it can be optimized to just one PCM capture thread by using Multiplexor pattern.

As you can see, it is much more than just avoiding packets being dropped, although I agree with you that it is extremely important ;)

So please let me know if you want to combine our forces!

gimesketvirtadieni commented 5 years ago

Almost forgot your question about using ALSA loopback plugin plus my custom written plugin to capture PCM stream - it is the only universal way I could find to capture PCM stream of any arbitrary Linux app WITHOUT resampling! This way I get all the meta data that ALSA knows about the original PCM stream (like start/end points, sampling rate, format, channels) AND I get the stream in a bit-perfect way AND the stream is provided at the right playback speed!

Please let me know if you know any other possibility to achieve that ;)

christf commented 5 years ago

It seems to me, resampling is the main problem you are hoping to avoid. Does that mean that the sampling rate of each individual song is used and passed along in slimstreamer?

You are spot on with regard to sampling rate for snapcastc. A fixed sampling rate must be set by the program feeding the pipe and snapcastc will require this very information as input as part of its stream configuration. That means, if the original audio data is not in that format, the player will resample once. Depending on the quality of the resampler, this will lead to audible quality degradation. In any case if resampling is to be done it happens before snapcastc is reading it. sox/soxr can be used for that which display pretty good quality (depending on the settings).

Over all we do not have to use the alsa pipe plugin for snapcastc. We can feed directly from mpd/mpv/whatever other program can write to a pipe.

UDP behaves better for media streaming in desperate network conditions and it allows application control over the amount of retries and their interval which is difficult (impossible?) to achieve with tcp. snapcastc already implements such a udp protocol.

With regard to dropouts in snapcast: I actually set out to fix some of the issues but found quickly that there are other underlying issues (the way the input pipe is configured, the way the input pipe is read, the usage of TCP and plain incorrect handling and its threading model). I likely would still not have created an own application in january, if it had an active maintainer that would have commented on my PR from december. In any case, now that we have it I like the fact that snapcastc can read from its input pipe at a different speed than playback speed as this allows to use large buffers and many streams.

One thing I could think of is: instead of putting PCM data on the pipe, we could put a protocol on the pipe that embeds PCM in a packet format providing meta data like sample rate.

I need to give it some thought. It would be nice to be able to use the native sampling rate of the existing audio file instead of requiring a fixed pre-defined sample-rate in a deterministic way instead of leaving it to chance (if the audio data already happens to be at the rate the pipe is requesting then no resampling happens).

gimesketvirtadieni commented 5 years ago

I am not hoping to avoid resampling, SlimStreamer is ALREADY streaming without resampling (the only major missing feature is drift compensation). Every individual song is streamed using original sampling rate. It is only drift compensation will affect the original PCM stream (without drift compensation it is bit-perfect all the way from an arbitrary app to a client's DAC). It is the best audio quality you can get from a streaming solution! Even Apple's Airplay does not have that (it uses fixed 48000 sampling rate).

I agree with you that TCP sucks. For example I faced problems with TCP when implemented round-trip measurements: I needed to implement 'a warm-up' of TCP connection before I get a reliable round-trip results. However, as I wrote, with a custom protocol chances are VERY slim people will use it. Whereas with TCP/SlimProto I can hit 'sudo apt install squeezelite' on any Ubuntu box within the same network and the next moment a new client will connect and stream from SlimStreamer.

Looking forward for your thoughts ;)

christf commented 5 years ago

What type of latency are you using? When throttling the pace I guess it should not be very long. When looking at my wifi, I can see that sometimes for a few seconds there is a disruption. That would have to be compensated and it would directly translate into latency....

gimesketvirtadieni commented 5 years ago

Hi I did not measure exact latency, but it is within 2-3 sec range. Latency was not the primary target for now, the main goal for me is to complete sync playback. Currently synchronous start works fine, but drift compensation is still in progress. By the way, have you tried to measure synchronization between players with oscilloscope?

christf commented 5 years ago

I did not measure exact latency, but it is within 1-3 sec range. Latency was not the primary target for now, the main goal for me is to complete sync playback. Currently synchronous start works fine, but drift compensation is still in progress.

By the way, have you tried to measure synchronization between players with oscilloscope? No I have not. Also I do not see much benefit in that because sound travelling at 0.343 meter / millisecond means that when the listener is slightly away from the ideal listening position, they are already experiencing a latency between the speakers that is in the ballpark of what my synchronisation mechanism achieves when targeting sub- millisecond accuracy.

-- () ascii ribbon campaign - against html e-mail /\ against proprietary attachments

christf commented 5 years ago

So I guess there are multiple approaches to this:

use slimstreamer
use pulseaudios multicast streaming capability
throw CPU at the problem on the server and use a high quality resampler
implement something that wraps the pcm data on the input pipe to ctonain metadata including the sample rate and supply a program that feeds this pipe. At the end of the day this would mean to implement an output plugin for mpd

gimesketvirtadieni commented 5 years ago

Hi

Regarding pulseaudio multicast - it does not work on wireless networks due to UDP multicast flooding network (I suppose Sonos did something on HW level to get around this problem)
With 'high-quality' resampling, I do not want to discourage you, but there is already a solution for that: forked-daapd + shairport-sync (btw written in C); as a bonus you get support for MPD, Apple Remote, Chromecast, Roku, Spotify... You are welcome to take your chances in that league, although I do not see added value with snapcastc
Regarding your last point... There is an opportunity for our solution to meet in the halfway...

Please take a look at SlimPlexor - it is an independent component that SlimStreamer uses to receive PCM stream (including metadata). The only way SlimPlexor and SlimStream communicate is through ALSA loopback devices (one per sampling rate). Potentially you could consider the same mechanism (SlimPlexor + ALSA loopback devices) to capture PCM stream instead of using a pipe... Basically it is exactly the same idea you have expressed in your last point: SlimPlexor adds one extra channel (3 in total) to transfer metadata for every single frame.

Just give it a thought and let me know what you think ;)

christf commented 5 years ago

On Sat, Feb 23, 2019 at 12:34:45PM -0800, gimesketvirtadieni wrote:

Hi

Regarding pulseaudio multicast - it does not work on wireless networks due to UDP multicast flooding network (I suppose Sonos did something on HW level to get around this problem) The default multicast rate on wifi is 1Mbit. When this is raised to something above 2Mbit, then multicast streaming PCM over wifi should work. Only remaining problem would be dropouts.

With 'high-quality' resampling, I do not want to discourage you, but there is already a solution for that: forked-daapd + shairport-sync (btw written in C); as a bonus you get support for MPD, Apple Remote, Chromecast, Roku, Spotify... You are welcome to take your chances in that league, although I do not see added value with snapcastc Shairport-sync does seem to be able to handle multiple streams. Also I do not see an API. Nothing that cannot be fixed. The last time I tried shairport was a few years back

Regarding your last point... There is an opportunity for our solution to meet in the halfway... Indeed.

Please take a look at SlimPlexor - it is an independent component that SlimStreamer uses to receive PCM stream (including metadata). The only way SlimPlexor and SlimStream communicate is through ALSA loopback devices (one per sampling rate).
Potentially you could consider the same mechanism (SlimPlexor + ALSA loopback devices) to capture PCM stream instead of using a pipe...
Basically it is exactly the same idea you have expressed in your last point: SlimPlexor adds one extra channel (3 in total) to transfer metadata for every single frame. I really like the idea. The Slimplexor approach nicely solves the problem that the meta data for the stream has to be in sync with the stream itself. At the same time, I find it very interesting that playback speed at the speakers does not necessarily have to be the same speed at which the audio data is retrieved from disk. I wonder if we can retain this property somehow.

gimesketvirtadieni commented 5 years ago

Hi

There is no way to 'influence' ALSA on how 'quickly/slowly' it consumes/supplies PCM. This is related to the fact that ALSA will sync to a particular DAC (I am not sure how this is done in case of lookback devices though). So the only way to make a particular player go faster/slower is to stretch/squeeze the stream itself. This 'cruise control' is required to compensate particular player's drift (it is still missing in SlimStreamer). Other than compensating the drift, I do not see any other 'useful' case for that. Please note that different playback speed of server and players may not be kept infinitely, cause it would require infinite buffer on the server ;)

P.S. This is the problem with PulseAudio UDP Multicast, I suppose it has not been fixed :( https://bugs.freedesktop.org/show_bug.cgi?id=44777

christf commented 5 years ago

for the stretching part, I have just created #24

For the referenced PA issue: packet load of 200 packets per second is the price you pay for using PCM on the network. You can reduce the packet load by compressing to around 50% but it still is a lot and it still requires to:

use routers that correctly handle multicast (which most don't)
tune the wifi to allow more than the default 1Mbit multicast bandwidth. 6Mbit probably is a good point to start for a single pcm stream.

christf commented 5 years ago

I think an alsa input could be interesting for snapcastc. That would avoid resampling. It would also mean that the audio device will have to be re-opened with the correct sample rate if it changes in the stream.

gimesketvirtadieni commented 5 years ago

Alright! Nice to hear that you are in favor of capturing ALSA stream directly. Regarding reopening ALSA devices when sampling rate changes... My initial thought was the same - somehow SlimStreamer has to reopen loopback device when sampling rate changes. Reopening a device with a new sampling rate is a simple step, the main problem is when to reopen. There are two Linux processes involved and one of then changes sampling rate, then the other (SlimStreamer) must capture that moment. It is doable but VERY complicated. To implement that these two processes must have two channels of communications: one for signaling events (like sampling rate change) and other for PCM. So if you have an idea how to implement it in a simple way - I am all ears :) So I took a different path: SlimPlexor captures that moment when PCM originator changes the sampling rate (in fact SlimPlexor runs in the context on PCM originator) and simply forwards that PCM to a predefined-by-sampling-rate loopback device. SlimStreamer listens to ALL loopback devices using the same rate as SlimPlexor (Multi/Demultiplexor pattern). As a result there is no need for reopening devices by SlimStreamer when sampling rate changes - a different device is used instead ;)

P.S. I am sure you realize that the whole chain (including protocol and clients) must support switching sampling rates to avoid resampling ;)

christf commented 5 years ago

actually capturing the moment should not be too difficult at least when the sending process closes the fd on sample rate changes. How should the receiving process know how the data should be interpreted? At the moment this is hard-coded on the command line and I fear this is what the second communication channel is needed for.

Also there are some gotchas around using float format or 16/32 Bit audio because both are handled differently in codecs and processors.

gimesketvirtadieni commented 5 years ago

I am not totally sure what you mean by 'closes the fd'. In ALSA land I have never come across file descriptors, instead it uses snd_pcm_t. Anyway I agree that there must be the second communication channel for meta data. You are welcome to fork SlimPlexor and experiment with 'second' channel of communication. For example you can try things like detecting sampling rate changes and propagating it to a streamer via some IPC mechanism. I am curious with your findings ;)

christf commented 5 years ago

The problem with the second channel approach is synchonizing it with the audio channel. The change of samplerate cannot be detected because the attribute "sample rate" merely is a helpful information on how to interpret the data. So 48khz 2 channel means "During 1 second we should consume the data of 23040 Bytes". For the same duration of 1s and 2 channels at 96khz 46080Byte must be interpreted. So this is knowledge about the data stream.

BTW: Which players are you interested in and what is the reason for this generic alsa interface?

I am asking because one option could be to interpret the input files and output an enriched datastream - following the apprloach of implementing a protocol.

Also since snapcast supports multiple inputstreams, with hard-coded samplerate each it could work similar to the alsa plugin without actually requiring alsa if they would work in a somehow synchronized way. This would preserve the property of being able to read ahead or delaying reads which is of big help when the server system is busy.

jonsmirl commented 5 years ago

I'll reply here so that you both get the email...

How does time synchronization work between the nodes?

I've seen one design that uses IEEE 1588 PTP (Precision Time Protocol) to synchronize the CPU clocks on the network to within microseconds of each other. The server then pushes out the audio over RTP UDP multicast with timestamps on each packet. This audio can be pushed out faster than real-time in order to make room for nodes to request retransmissions of missing packets. For example send for 0.5s at 2x real-time, pause for 0.5s to allow for retransmission requests, repeat. The time stamps on the packets tell when the audio should be played by each client. And then PTP keeps everything in sync.

https://en.wikipedia.org/wiki/Precision_Time_Protocol http://linuxptp.sourceforge.net/

christf commented 5 years ago

On Thu, Jun 20, 2019 at 08:02:39AM -0700, Jon Smirl wrote:

I'll reply here so that you both get the email...

How does time synchronization work between the nodes?

I've seen one design that uses IEEE 1588 PTP (Precision Time Protocol) to synchronize the CPU clocks on the network to within microseconds of each other. The server then pushes out the audio over RTP UDP multicast with timestamps on each packet. This audio can be pushed out faster than real-time in order to make room for nodes to request retransmissions of missing packets. For example send for 0.5s at 2x real-time, pause for 0.5s to allow for retransmission requests, repeat. The time stamps on the packets tell when the audio should be played by each client. And then PTP keeps everything in sync.

https://en.wikipedia.org/wiki/Precision_Time_Protocol http://linuxptp.sourceforge.net/ snapcastc currently relies on some mechanism to synchronize the time.
Personally, I use ntp. The server tags each chunk with a time at which this chunk should be played. The offset between the "current" server time and the timestamp essentially is the amount of time for
buffering/retransmits that can be used by the client to compensate packet loss.

-- () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments

jonsmirl commented 5 years ago

As an experiment, I am trying to get snapcastc server running on an Android STB. Like this one for $25. https://www.aliexpress.com/item/32993992335.html

Doing this will let you use any of Spotify, Apple Music, Google Music, Tidal, Soundcloud, Amazon Music, Kodi, etc as your networked audio source.

The STB box above is not locked. You can download the Android SDK for it, plus it comes already rooted. So I have the SDK and I'm currently trying to get snapcastc to build.

You can download the SDK here: https://github.com/a9rock64/manifests

gimesketvirtadieni commented 5 years ago

Hi Regarding playback synchronization: it is well known fact that if you want sub-millisec synchronization then TCP is not suitable for that; it must be done by using UDP, or WiFi becons, or by using some other fancy method. However for regular usage synchronization within 5 millisec is more than enough cause this is a threshold when human may start hearing echo (in fact this threshold is only for single tone; for continuous tone this threshold is ~15 millisec). It is possible to achieve that via TCP (it is done in Airplay, SlimServer, SlimStreamer, ...). Regarding 'buffering': if an app 'captures' audio stream transparently to the original source app, then capturing app may not got 'ahead of stream' to do buffering (which is a must for streaming). It means there will be a delay on playback start/skip; it is a matter of 'how much' (it would vary somewhere between 1 sec to 3 sec). This is VERY low latency system which a price to pay for transparent stream capture. In case of SlimStreamer capture is done through ALSA and to overcome this problem, ALSA must be adjusted to allow capturing application to impact 'throttling' done by ALSA (cause ALSA accepts the stream from a source at a fixed sampling rate). Please let us know if you have further questions ;)

christf commented 3 years ago

As an experiment, I am trying to get snapcastc server running on an Android STB. Like this one for $25. https://www.aliexpress.com/item/32993992335.html

Doing this will let you use any of Spotify, Apple Music, Google Music, Tidal, Soundcloud, Amazon Music, Kodi, etc as your networked audio source.

The STB box above is not locked. You can download the Android SDK for it, plus it comes already rooted. So I have the SDK and I'm currently trying to get snapcastc to build.

You can download the SDK here: https://github.com/a9rock64/manifests

how far did you get? Spotifyd will play to pulseaudio/alsa, both can play to a pipe. I would assume without further adjustments the buffers of snapcastc would be mostly empty though.

christf / snapcastc

Design Limitations #17