eshaz / icecast-metadata-js

Browser and NodeJS packages for playing and reading Icecast compatible streaming audio with realtime metadata updates.

Multiple stream support #75

Closed: eshaz closed this issue 2 years ago

eshaz commented 3 years ago

It should be possible to support multiple streams of different bitrates and switch to a lower or higher bitrate depending on connection speed.

If the streams all carry the same underlying audio source and are generally synchronized, they could be switched over seamlessly by syncing the decoded audio. I don't think it would be perfect, because two streams are not guaranteed to have their audio aligned on a frame basis.

Perfect syncing might be achieved by decoding both streams entirely, so the raw PCM could be spliced at any point regardless of the incoming codec and then fed into the Web Audio API.
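A minimal TypeScript sketch of such a PCM splice, assuming both streams have already been decoded to mono `Float32Array`s at the same sample rate and that the sample offset between them is known. `splicePcm` is a hypothetical helper, and the optional linear crossfade is an assumption added for illustration, not part of the proposal.

```typescript
// Splice two decoded PCM buffers that carry the same source audio.
// oldPcm: mono samples from the old stream.
// newPcm: mono samples from the new stream; newPcm[0] lines up with oldPcm[offset].
// switchAt: index in oldPcm where playback moves to the new stream.
// fade: crossfade length in samples (an assumption, not from the issue; 0 = hard cut).
function splicePcm(
  oldPcm: Float32Array,
  newPcm: Float32Array,
  offset: number,
  switchAt: number,
  fade: number,
): Float32Array {
  if (switchAt < 0 || switchAt > oldPcm.length) {
    throw new RangeError("switch point is outside the old stream");
  }
  // Index in newPcm that corresponds to oldPcm[switchAt].
  const newStart = switchAt - offset;
  if (newStart < 0 || newStart >= newPcm.length) {
    throw new RangeError("switch point is outside the new stream");
  }
  const tail = newPcm.length - newStart;
  const out = new Float32Array(switchAt + tail);
  out.set(oldPcm.subarray(0, switchAt), 0); // old audio before the switch
  out.set(newPcm.subarray(newStart), switchAt); // new audio from the matching sample on
  // Blend the first few samples after the switch while old audio is still available.
  const n = Math.min(fade, tail, oldPcm.length - switchAt);
  for (let i = 0; i < n; i++) {
    const g = (i + 1) / (n + 1); // weight of the new stream, rising toward 1
    out[switchAt + i] = (1 - g) * oldPcm[switchAt + i] + g * newPcm[newStart + i];
  }
  return out;
}
```

For example, `splicePcm(oldPcm, newPcm, 2, 5, 0)` plays `oldPcm[0..4]` followed by `newPcm[3..]`, a hard cut at a single sample boundary. Working on PCM is what makes the cut point independent of codec frame boundaries.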

iroks commented 2 years ago

Please take a look at this request: https://gitlab.xiph.org/xiph/icecast-server/-/issues/2443.

The idea is for the server to move the client from one stream to another based on synchronisation points in the source and destination. Most probably, this will cause a problem on the client side when the bitrate changes, but based on the same idea, it should be possible to start a new player in the browser at the specific synchronisation point and stop the previous stream at the same time.

eshaz commented 2 years ago

@iroks This would be an interesting feature in Icecast. In my experience, the Icecast developers are not very responsive to new feature requests, so it might be a while before this gets a response or is implemented.

There might be a few ways to accomplish this with no changes to Icecast itself.

  1. Use the absolute granule position in the Ogg format to synchronize the two streams.

    • This would be pretty simple, since it just involves aligning the old position with the new position.
    • This assumes the sources are encoded such that the absolute granule positions of each stream match.
    • This will only work for Ogg streams.
    • There would be a gap between sync points. The absolute granule position is only present in the Ogg page header, and each Ogg page contains multiple codec frames. There are usually a couple hundred milliseconds of audio in each Ogg page. https://xiph.org/ogg/doc/oggstream.html
  2. Another way might be to use a waveform matching algorithm to find the best fit between the old and new streams.

    • It should be possible to decode the old and new streams to PCM and use FFT to create comparison data for the waveforms. The FFT data could be iterated over and the most confident match could be used to find the sync point between old and new. It's important to note that the old and new streams won't match exactly since the bitrates will be different, so the match condition will need to be implemented using a statistical model that outputs a confidence value. https://dsp.stackexchange.com/questions/736/how-do-i-implement-cross-correlation-to-prove-two-audio-files-are-similar
    • This method should work with any stream format that can be decoded in the browser to PCM. The browser's decodeAudioData might work for this since it will only need to decode a single chunk of data.
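The first option in the list above can be sketched in TypeScript: read the absolute granule position from each Ogg page header (byte layout per RFC 3533), then pick the first page of the new stream that begins at or after the position the old stream reached. As the comment notes, this assumes both streams were encoded with matching granule positions. `readGranulePosition` and `findOggSyncPage` are hypothetical helpers, and the caller is assumed to skip pages whose granule position is -1.

```typescript
// Read the absolute granule position from an Ogg page header (RFC 3533).
// Layout: "OggS" (4 bytes), version (1), header type (1),
// granule position (8, little-endian), serial (4), sequence (4), CRC (4), ...
function readGranulePosition(page: Uint8Array): number | null {
  if (page.length < 27) throw new RangeError("too short for an Ogg page header");
  if (page[0] !== 0x4f || page[1] !== 0x67 || page[2] !== 0x67 || page[3] !== 0x53) {
    throw new Error("missing OggS capture pattern");
  }
  const view = new DataView(page.buffer, page.byteOffset, page.byteLength);
  const low = view.getUint32(6, true);
  const high = view.getUint32(10, true);
  // -1 (all bits set) means no packet finishes on this page.
  if (low === 0xffffffff && high === 0xffffffff) return null;
  // Combine into a Number; exact while the value stays below 2^53.
  return high * 0x100000000 + low;
}

// newPageGranules: end granule of each page in the new stream, in order.
// oldGranule: the granule position the old stream has played up to.
// Page i starts where page i - 1 ended, so the first page is never chosen.
function findOggSyncPage(
  newPageGranules: number[],
  oldGranule: number,
): { pageIndex: number; gapSamples: number } | null {
  for (let i = 1; i < newPageGranules.length; i++) {
    const pageStart = newPageGranules[i - 1];
    if (pageStart >= oldGranule) {
      // gapSamples: audio between the old position and this page boundary.
      return { pageIndex: i, gapSamples: pageStart - oldGranule };
    }
  }
  return null;
}
```

For instance, with new-stream pages ending at granules 960, 1920, 2880, and 3840 and the old stream at granule 2000, `findOggSyncPage` picks page 3 and reports an 880-sample gap, which is the sync-point gap described above.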
iroks commented 2 years ago

@eshaz Thanks for your answer. Just a few thoughts from my side on the client-side solution:

I. Ogg proposal: This method is restricted to Ogg and assumes that the encodings were started simultaneously, e.g. using the same ffmpeg instance with multiple outputs. The idea is really good, but it may have limited practical use. In most cases, the encoders are started as separate processes and will most probably have different granule positions in the header. Moreover, audio streaming is still MP3/AAC dominated. If the method cannot work with these formats, it will have very limited acceptance. The gap between sync points should not be a problem.

II. Cross-correlation method: This idea will work with any encoder but will require more resources for synchronisation. It is necessary to compute the FFT of both signals, multiply one by the complex conjugate of the other, and finally take the IFFT in order to find the peak.

Do you have an idea how to efficiently decode the compressed audio signal to PCM, calculate the cross-correlation, and play the second stream from the cross-correlation point in the browser? You mention that the browser's decodeAudioData might work, and maybe the AnalyserNode could be used for the FFT/IFFT. I found an example of server-side JS that is pretty close to the requirements above: https://github.com/adblockradio/xcorr Thanks in advance.
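The FFT route described in point II (FFT both signals, multiply by the complex conjugate, inverse FFT, pick the peak) can be sketched as below. This is a plain radix-2 FFT written for illustration, not code from xcorr, synaudio, or the Web Audio API; AnalyserNode only exposes magnitude spectra and has no inverse transform, so the sketch brings its own. `fft` and `xcorrOffset` are hypothetical names, the correlation is unnormalized, and both inputs are assumed to be mono PCM at the same sample rate.

```typescript
// In-place iterative radix-2 FFT; length must be a power of two.
// With inverse = true it computes the unscaled inverse transform.
function fft(re: Float64Array, im: Float64Array, inverse: boolean): void {
  const n = re.length;
  // Bit-reversal permutation.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      [re[i], re[j]] = [re[j], re[i]];
      [im[i], im[j]] = [im[j], im[i]];
    }
  }
  // Butterflies for sub-transforms of size 2, 4, 8, ...
  for (let len = 2; len <= n; len <<= 1) {
    const half = len >> 1;
    const ang = ((inverse ? 2 : -2) * Math.PI) / len;
    const wRe = Math.cos(ang);
    const wIm = Math.sin(ang);
    for (let i = 0; i < n; i += len) {
      let curRe = 1;
      let curIm = 0;
      for (let k = 0; k < half; k++) {
        const uRe = re[i + k];
        const uIm = im[i + k];
        const xRe = re[i + k + half];
        const xIm = im[i + k + half];
        const vRe = xRe * curRe - xIm * curIm;
        const vIm = xRe * curIm + xIm * curRe;
        re[i + k] = uRe + vRe;
        im[i + k] = uIm + vIm;
        re[i + k + half] = uRe - vRe;
        im[i + k + half] = uIm - vIm;
        const nextRe = curRe * wRe - curIm * wIm;
        curIm = curRe * wIm + curIm * wRe;
        curRe = nextRe;
      }
    }
  }
}

// Returns the lag d maximizing sum_t a[t + d] * b[t], i.e. b[0] lines up with a[d].
// A negative d means b starts before a.
function xcorrOffset(a: Float32Array, b: Float32Array): number {
  const size = a.length + b.length - 1;
  let n = 1;
  while (n < size) n <<= 1; // zero-pad so circular correlation does not wrap
  const aRe = new Float64Array(n);
  const aIm = new Float64Array(n);
  const bRe = new Float64Array(n);
  const bIm = new Float64Array(n);
  aRe.set(a);
  bRe.set(b);
  fft(aRe, aIm, false);
  fft(bRe, bIm, false);
  // Multiply A by the complex conjugate of B.
  for (let i = 0; i < n; i++) {
    const re = aRe[i] * bRe[i] + aIm[i] * bIm[i];
    const im = aIm[i] * bRe[i] - aRe[i] * bIm[i];
    aRe[i] = re;
    aIm[i] = im;
  }
  fft(aRe, aIm, true); // unscaled inverse; the scale factor does not move the peak
  let best = 0;
  for (let i = 1; i < n; i++) if (aRe[i] > aRe[best]) best = i;
  // Indices at or beyond a.length hold negative lags.
  return best >= a.length ? best - n : best;
}
```

Zero-padding to at least `a.length + b.length - 1` samples keeps positive and negative lags from overlapping in the circular result, which is why indices at or beyond `a.length` can be read back as negative lags.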

eshaz commented 2 years ago

I am experimenting right now with just using a covariance calculation between the two decoded PCM float arrays: gradually shifting the offset between them and returning the sample offset with the greatest covariance, i.e. the best match. This seems to work very well for this purpose and is much more performant than calculating the FFT. I'm using WebAssembly SIMD instructions to do the heavy lifting.
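The covariance search can be sketched in plain TypeScript as follows. This is not synaudio's code or API and uses no WebAssembly or SIMD; `covarianceAt` and `bestCovarianceOffset` are hypothetical names, and the sketch assumes mono PCM at the same sample rate with the clip `b` lying entirely inside `a`. The nested loop costs roughly `a.length * b.length` operations, the kind of inner loop the comment describes moving to WebAssembly SIMD.

```typescript
// Covariance of a[offset .. offset + len) against b[0 .. len).
function covarianceAt(a: Float32Array, b: Float32Array, offset: number, len: number): number {
  let sumA = 0;
  let sumB = 0;
  for (let i = 0; i < len; i++) {
    sumA += a[offset + i];
    sumB += b[i];
  }
  const meanA = sumA / len;
  const meanB = sumB / len;
  let cov = 0;
  for (let i = 0; i < len; i++) {
    cov += (a[offset + i] - meanA) * (b[i] - meanB);
  }
  return cov / len;
}

// Slide b across a one sample at a time and return the offset into a
// where b's first sample lines up best (highest covariance).
function bestCovarianceOffset(
  a: Float32Array,
  b: Float32Array,
): { offset: number; covariance: number } {
  if (b.length === 0 || b.length > a.length) {
    throw new RangeError("b must be non-empty and no longer than a");
  }
  let best = { offset: 0, covariance: -Infinity };
  for (let offset = 0; offset + b.length <= a.length; offset++) {
    const c = covarianceAt(a, b, offset, b.length);
    if (c > best.covariance) best = { offset, covariance: c };
  }
  return best;
}
```

Because each window's mean is subtracted, a constant DC offset between the two decodes does not change the result, and a uniform level difference only scales every score by the same factor, so the best offset stays the same.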

You can take a look at what I have so far here: https://github.com/eshaz/synaudio/tree/sync-using-samples

If you have any suggestions, I'd be glad to hear them.

For decoding the audio, I am having success using decodeAudioData to get the PCM. I have another library, codec-parser, that splits the compressed audio into frames with duration and sample information. My plan is to take the sample offset from the covariance result and align the old and new data frame-wise using the sample offsets returned from codec-parser. I can then feed the correct frames to the downstream playback code.
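Mapping the sample offset from the covariance result onto whole frames might look like the sketch below. The `FrameInfo` shape is a hypothetical stand-in for the per-frame sample counts the comment says codec-parser provides; it is not codec-parser's actual type, and `firstFrameAfter` is an illustrative name. Offsets follow the convention that the new stream's first sample lines up with the old stream's sample at `offset`.

```typescript
// Hypothetical frame record: only the number of decoded samples per frame is used.
// This is a stand-in, not codec-parser's actual frame type.
interface FrameInfo {
  samples: number;
}

// Find the first new-stream frame that starts at or after the old stream's
// playback position, measured on the old stream's sample timeline.
// newFrames: new-stream frames in order; the first one starts at newPcm[0].
// offset: index in the old PCM that newPcm[0] lines up with (may be negative).
// oldPosition: number of old-stream samples already queued for playback.
function firstFrameAfter(
  newFrames: FrameInfo[],
  offset: number,
  oldPosition: number,
): { frameIndex: number; gapSamples: number } | null {
  let frameStart = offset; // start of newFrames[0] on the old stream's timeline
  for (let i = 0; i < newFrames.length; i++) {
    if (frameStart >= oldPosition) {
      // gapSamples: old-stream samples to keep playing before this frame starts.
      return { frameIndex: i, gapSamples: frameStart - oldPosition };
    }
    frameStart += newFrames[i].samples;
  }
  return null; // the new stream does not reach far enough yet
}
```

With 1152-sample frames, an offset of 500, and 3000 old samples already queued, frame 3 (starting at sample 3956) is the switch point, and 956 more old samples would cover the gap.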

You can take a look at this effort here: https://github.com/eshaz/icecast-metadata-js/tree/multiple-stream-support

eshaz commented 2 years ago

I have released icecast-metadata-player/1.12.0 which implements the correlation method described above. This is fairly accurate, but not always perfect. I think the delay / frame splicing between old and new could be tuned a bit more.

It can also synchronize on the encoded audio frames by aligning the CRC32 hashes of the old and new data if the two streams are exactly the same, i.e. for switching between endpoints that serve the same data.
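For the identical-stream case, a minimal sketch: hash each encoded frame with a standard CRC-32 (the reflected IEEE polynomial used by zlib), look for the last few old-frame hashes in the new stream, and resume right after them. This illustrates the idea and is not the icecast-metadata-player implementation; `crc32`, `alignByCrc`, and `matchLength` are hypothetical, and the frames are assumed to be split already, for example by a frame parser such as codec-parser.

```typescript
// Table-driven CRC-32 (polynomial 0xEDB88320, reflected, as in zlib).
const CRC_TABLE = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    table[n] = c >>> 0;
  }
  return table;
})();

function crc32(data: Uint8Array): number {
  let c = 0xffffffff;
  for (let i = 0; i < data.length; i++) c = CRC_TABLE[(c ^ data[i]) & 0xff] ^ (c >>> 8);
  return (c ^ 0xffffffff) >>> 0;
}

// Look for the last `matchLength` old-frame hashes inside the new stream's hashes.
// Returns the index of the first new frame after the matched run, or -1 if not found.
// The first match wins; a longer matchLength reduces false matches (e.g. repeated silent frames).
function alignByCrc(oldHashes: number[], newHashes: number[], matchLength: number): number {
  if (matchLength < 1 || oldHashes.length < matchLength) return -1;
  const tail = oldHashes.slice(oldHashes.length - matchLength);
  for (let i = 0; i + matchLength <= newHashes.length; i++) {
    let matched = true;
    for (let k = 0; k < matchLength; k++) {
      if (newHashes[i + k] !== tail[k]) {
        matched = false;
        break;
      }
    }
    if (matched) return i + matchLength;
  }
  return -1;
}
```

Hashing the compressed frames avoids decoding entirely, but it only works when both endpoints serve byte-identical frames; any re-encode changes every hash, which is where the PCM correlation method comes in.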

Currently, the PCM sync method doesn't work on iOS / Safari. The synaudio web worker terminates immediately without any errors as soon as the wasm correlate function is called. Interestingly, this has no problem running on the main thread. I'll look into this some more.

eshaz commented 2 years ago

@iroks I found a fairly impactful bug in that release that was preventing this from working properly. It has been fixed in icecast-metadata-player/1.12.2. Now, when switching streams and there is some overlap between the old and new audio, the switch is virtually seamless.

synaudio is very consistent in finding the exact sample offset of the sync point between two audio streams. I have a demo for that library here that shows the FFT of the two audio clips and the exact sync point. For the best effect, pick a low-bitrate version of the comparison clip so you can actually see and hear where the old audio ends and the new audio begins.