dheijl / swyh-rs

Stream What You Hear written in rust, inspired by SWYH.
MIT License

Stream stops if input delivers no more data #135

Closed DrCWO closed 2 months ago

DrCWO commented 4 months ago

I would like to ask for an improvement regarding long-running playback. Using swyh-rs, I found that playback on the HTTP side stops when the input no longer provides data.

My suggestion is to keep the output active even if no input data is sent. This could be done, for example, by sending dummy PCM packets filled with 0x00 whenever the input buffer runs empty.

The advantage would be that I can stop the sound-producing app and restart it after some time without an interruption in the HTTP stream.

Would appreciate to see this in the next release please.

dheijl commented 4 months ago

It's already in there if you enable "inject silence". Unfortunately this doesn't work with FLAC because the compression makes this technique useless. But it works with lpcm/wav/rf64.

DrCWO commented 4 months ago

Well, FLAC is the only format I can use with Roon, so this option is a bit useless for me.

I know that zero blocks cause issues with compression. The strategy to solve this is to add a statistical dither to the lowest bit in these empty blocks. Dithering is a widely used approach to get better sound in digital audio, which means this does NOT degrade the signal!

In my server I did it by calculating one empty block once with the LSBs dithered. This block is inserted during silence. Not such a difficult task. Here is my Node.js code to get such a dithered chunk 👍

function getEmptyDitheredChunk(blockSize, sampleWidthInBytes) {
    const myChunk = Buffer.alloc(blockSize * sampleWidthInBytes);

    // Generate random integers between -1 and 1,
    // separately for the right and left channels
    const randomRight = [];
    const randomLeft = [];
    for (let i = 0; i < blockSize; i++) {
        randomRight.push(Math.round(Math.random() * 2) - 1);
    }
    for (let i = 0; i < blockSize; i++) {
        randomLeft.push(Math.round(Math.random() * 2) - 1);
    }

    // Copy the values into the chunk, matching the length of the samples (4 or 6 bytes)
    let writeIndex = 0;
    const resultBytes = Buffer.alloc(8);
    // Fill every block entry, including the last one
    for (let i = 0; i < blockSize; i++) {
        resultBytes.writeInt32LE(randomRight[i], 0);
        // The second write overwrites the upper bytes of the first value
        resultBytes.writeInt32LE(randomLeft[i], sampleWidthInBytes / 2);
        for (let j = 0; j < sampleWidthInBytes; j++) {
            myChunk[writeIndex + j] = resultBytes[j];
        }
        writeIndex = writeIndex + sampleWidthInBytes;
    }
    return myChunk;
}
dheijl commented 4 months ago

Similar code is already in there but it is not active at the moment.

DrCWO commented 4 months ago

Please make it real :-)

dheijl commented 4 months ago

This is my current implementation, not compiled in because the NOISE feature is disabled in my production builds:

#[cfg(feature = "NOISE")]
///
/// fill the pre-allocated noise buffer with a very faint white noise (-60db)
///
fn fill_noise_buffer(rng: &mut StdRng, noise_buf: &mut [f32]) {
    let amplitude: f32 = 0.001;
    for sample in noise_buf.iter_mut() {
        *sample = ((rng.sample(Uniform::new(0.0, 1.0)) * 2.0) - 1.0) * amplitude;
    }
}

I can make a build for you with it enabled to see if it's acceptable.

DrCWO commented 4 months ago

Sorry for my late reply; I was busy with a friend of mine visiting Braunschweig.

I wonder why you use float. Audiophiles talk about bit-perfect transport, so getting 24 bit from the interface should result in exactly the same 24-bit FLAC data stream. Introducing 32Bit floats (24Bit Mantissa) destroys Bit Perfect transport. Please let me know where and why you convert integer to float?

BTW, dithering the LSB does not mean introducing -60 dB noise. 24 bit is approx. 144 dB SNR and this should be kept. So modifying the LSB with random values is the only thing allowed to stay "nearly" bit-perfect.
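
For reference, the "approx. 144 dB" figure follows from 20 * log10(2^bits), which is roughly 6.02 dB per bit. The following is a minimal sketch of that arithmetic only; the function name `dynamic_range_db` is made up for this illustration and is not part of swyh-rs.

```rust
/// Theoretical dynamic range in dB of an integer PCM format with `bits` bits,
/// computed as 20 * log10(2^bits). Hypothetical helper for illustration.
fn dynamic_range_db(bits: u32) -> f64 {
    20.0 * 2f64.powi(bits as i32).log10()
}
```

With this formula 16 bits give about 96.3 dB and 24 bits about 144.5 dB.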

dheijl commented 4 months ago

I don't "use" float. Windows (WASAPI), Linux (ALSA, PipeWire, JACK, PulseAudio) and Mac (CoreAudio) all deliver the captured sound in f32 format, so I have no choice.

"32Bit floats (24Bit Mantissa) destroys Bit Perfect transport"

Please quote a qualified sound engineer to support this claim, not a HiFi manufacturer's claim. AFAIK 24 bits are only needed in a recording/mixing environment.

I did not mean to introduce dithering, and I'm not mixing it in the existing audio, I only produce it periodically in the absence of audio samples to keep the FLAC encoder producing something to stream so that the connection is kept alive.

Why dithering is only useful when downsampling.

Also: if you don't like it don't use it.

DrCWO commented 4 months ago

I'm sorry if I said something that upset you. Please accept my apology.

Since the sources I know always provide 24 bit Integer, I didn't know that it would be transmitted as F32 in ASIO. My only goal is to ensure that the signal is distorted as little as possible.

And of course you are right. We are only talking about the data that is transmitted during the time when there is no signal. I hadn't considered that.

I hope that you can and want to implement it this way. Sorry for the misunderstanding.

dheijl commented 4 months ago

Don't worry - no need to apologize. I'll try to publish a beta tomorrow, and I would be interested to know if the noise floor of the "keep-alive" noise is acceptable. I had to turn up the volume really high here to be able to hear it, but perhaps -60 dB is still too much.

DrCWO commented 4 months ago

Thank you. Why not use -100 dB for the noise? This would be inaudible and still be good for compression.

dheijl commented 4 months ago

You can try the new 1.10.6 release candidate if you like.

I use -90 dB now so that 16-bit FLAC still gets something.

I have to turn up the volume of my amplifier almost to max to hear it, it has never been at more than 20 % before, and I like (very) loud rock music.
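
To illustrate why -90 dB rather than -100 dB keeps 16-bit FLAC "getting something", here is a minimal sketch of the arithmetic. It assumes plain round-to-nearest quantization with a full scale of 2^(bits-1) - 1; the actual conversion in swyh-rs or in the FLAC encoder may differ, and both function names are hypothetical.

```rust
/// Convert a level in dB relative to full scale into a linear amplitude
/// (1.0 = full scale). Hypothetical helper for illustration.
fn db_to_amplitude(db: f64) -> f64 {
    10f64.powf(db / 20.0)
}

/// Quantize a float sample in [-1.0, 1.0] to a signed integer of `bits` bits.
/// Assumes round-to-nearest and a full scale of 2^(bits-1) - 1.
fn quantize(sample: f64, bits: u32) -> i32 {
    let full_scale = ((1i64 << (bits - 1)) - 1) as f64;
    (sample * full_scale).round() as i32
}
```

Under these assumptions a peak at -90 dB (amplitude about 3.16e-5) still quantizes to 1 at 16 bits, while a peak at -100 dB (amplitude 1e-5) rounds to 0, so a 16-bit stream would again contain pure digital silence. At 24 bits both levels remain well above one LSB.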

DrCWO commented 4 months ago

Great to hear, thank you very much 👍 As we start a weekend trip tomorrow morning, I can't do any testing until I come back on Tuesday next week. I'll give it a try and will report back.

dheijl commented 4 months ago

Enjoy the trip!

DrCWO commented 4 months ago

Bags are packed and some spare time left 👍

The program does not start any more :-( Even if I start it manually, I see the GUI for a few ms and then it disappears.

Here is the log:

13:03:29 [INFO] SWYH-RS V 1.10.6 - Running on x86_64, windows, windows - Logging started.
13:03:29 [INFO] Config: Configuration { server_port: Some(5901), auto_resume: false, sound_source: Some("Voicemeeter Out B1 (VB-Audio Voicemeeter VAIO)"), sound_source_index: Some(11), log_level: Info, ssdp_interval_mins: 10.0, auto_reconnect: false, _disable_chunked: true, lpcm_stream_size: Some(U64maxNotChunked), wav_stream_size: Some(U32maxNotChunked), rf64_stream_size: Some(U64maxNotChunked), flac_stream_size: Some(U32maxChunked), use_wave_format: false, bits_per_sample: Some(24), streaming_format: Some(Flac), monitor_rms: true, capture_timeout: Some(2000), inject_silence: Some(true), buffering_delay_msec: Some(0), last_renderer: Some("HBM11 Arbeitszimmer"), active_renderers: [], last_network: Some("192.168.0.247"), config_dir: "C:\\Users\\definiteaudio\\.swyh-rs", config_id: Some(""), read_only: false }
13:03:30 [INFO] Selected audio source: Voicemeeter Out B1 (VB-Audio Voicemeeter VAIO)[#11]
13:03:30 [INFO] tb_log: Setup audio sources
13:03:30 [INFO] tb_log: Now running at ABOVE_NORMAL_PRIORITY_CLASS
13:03:30 [INFO] tb_log: Capturing audio from: Voicemeeter Out B1 (VB-Audio Voicemeeter VAIO)
13:03:30 [INFO] tb_log: Default audio SupportedStreamConfig { channels: 2, sample_rate: SampleRate(48000), buffer_size: Range { min: 0, max: 4294967295 }, sample_format: F32 }
13:03:30 [INFO] tb_log: Injecting silence into the output stream

Here is the config file:

[configuration]
server_port = 5901
auto_resume = false
sound_source = "Voicemeeter Out B1 (VB-Audio Voicemeeter VAIO)"
sound_source_index = 11
log_level = "INFO"
ssdp_interval_mins = 10.0
auto_reconnect = false
lpcm_stream_size = "U64maxNotChunked"
wav_stream_size = "U32maxNotChunked"
rf64_stream_size = "U64maxNotChunked"
flac_stream_size = "U32maxChunked"
use_wave_format = false
bits_per_sample = 24
streaming_format = "Flac"
monitor_rms = true
capture_timeout = 2000
inject_silence = true
buffering_delay_msec = 0
last_renderer = "HBM11 Arbeitszimmer"
active_renderers = []
last_network = "192.168.0.247"
config_dir = 'C:\Users\definiteaudio\.swyh-rs'
config_id = ""
read_only = false

Any idea on what I can do to make it reappear?

DrCWO commented 4 months ago

It's funny. If I try to start the current stable release, it also doesn't start any more??? [image]

EDIT: Deleted the config file and it starts again. Strange...

DrCWO commented 4 months ago

Found the issue: if I set inject_silence = false manually in the config file it can be started.

dheijl commented 4 months ago

I cannot reproduce this. The GUI disappearing means that you get a Rust "panic", something that is not supposed to happen. On Windows this panic is not shown because you have no console. To get the panic message displayed, you need to run the debug build from a terminal (cmd or Windows Terminal). I suspect that "inject silence" somehow conflicts with ASIO in the Rust cpal library. A lot of people seem to use inject silence with WAV and I get no complaints.

That being said, if you're using FLAC you should never enable "inject silence", as it prevents the white noise injection: there is always sound being captured, namely the injected silence. On Linux, the FLAC white noise injection also never kicks in, because the Linux ALSA sound system always produces silence if nothing is being played, so it's not possible to detect that nothing is being played.
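
As background on why a panic is invisible without a console: by default Rust prints the panic message to stderr, which a windowed program does not show. One general way to capture it anyway is a panic hook that writes to a file. The following is only a sketch of that standard-library technique; `install_panic_logger` and the log path are hypothetical and this is not how swyh-rs handles panics.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::panic;
use std::path::PathBuf;

/// Install a panic hook that appends the panic message to `log_path`.
/// Hypothetical helper: the name and the file location are assumptions.
fn install_panic_logger(log_path: PathBuf) {
    panic::set_hook(Box::new(move |info| {
        // If the file cannot be opened there is nothing more we can do here.
        if let Ok(mut file) = OpenOptions::new().create(true).append(true).open(&log_path) {
            // The hook argument implements Display (message plus source location).
            let _ = writeln!(file, "PANIC: {info}");
        }
    }));
}
```

Called once at program start, such a hook would leave the message in the given file even when the program is launched without a terminal.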

DrCWO commented 4 months ago

Still some time left so I figured out how good my processing pipeline is: Green signal is a 750Hz 24Bit dithered input at 48kHz. It is played via Roon to ASIO, runs through VoiceMeeter and was recorded by Audacity from the virtual Windows driver of VoiceMeeter to a 24Bit WAV file. The red curve is the result. The input is -1dB and all in VoiceMeeter is at 0dB.

image

As can easily be seen, the noise floor of the output signal is higher and some strange distortions (beginning at the 3rd harmonic) are included. I'm not sure if this is an effect of the 24-bit integer to F32 processing. It may also result from processing inside Voicemeeter, but they claim to work with 24-bit integer internally.

In the end - and there you're right - anything below -120 dB is inaudible, so there will be no audible decrease in quality. So this is a somewhat theoretical issue. On the other hand, the best state-of-the-art DACs have harmonics below -140 dB! So from a measurement point of view a scientist can't be too happy ;-)

https://www.audiosciencereview.com/forum/index.php?threads/smsl-su-10-dac-review.38415/

EDIT: 24Bit are -144dB. Here we see more caused by the 64k FFT lag so add 36dB to get the "real" noise floor but the amplitude of the harmonics is correct.

DrCWO commented 4 months ago

Saw your comment and didn't get it. I thought the Insert Silence flag is for continuous playback even if there is nothing getting in? Or did I get something wrong???

dheijl commented 4 months ago

You did get it wrong.

Inject silence injects real silence (empty samples), at the audio source into the audio stream. It is mixed in by the OS. So there is always audio, even if nothing else is being played, the silence is played. This works very well for lpcm, wav and rf64.

But flac compresses this silence so effectively that some players go into time out because the time gap between successive flac frames gets excessively long in their opinion. MPD players like Moode and Volumio don't suffer from this.

By injecting silence it is no longer possible to detect that nothing is being played, so my "no sound" detection for starting white noise injection for flac can never kick in.
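
The interaction described above can be sketched as follows: a "no sound" detector based on a receive timeout only fires when the capture side delivers nothing at all. This is a simplified illustration using a standard-library channel, not the actual swyh-rs code; `Chunk`, `next_chunk` and the buffer types are hypothetical.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// What the streaming side should hand to the encoder next (hypothetical type).
#[derive(Debug, PartialEq)]
enum Chunk {
    /// Samples that arrived from the capture side.
    Captured(Vec<f32>),
    /// No samples arrived within the timeout: send faint noise instead.
    KeepAlive(Vec<f32>),
    /// The capture side has gone away.
    Closed,
}

/// Wait up to `timeout` for captured samples, falling back to `noise`.
fn next_chunk(rx: &Receiver<Vec<f32>>, timeout: Duration, noise: &[f32]) -> Chunk {
    match rx.recv_timeout(timeout) {
        Ok(samples) => Chunk::Captured(samples),
        Err(RecvTimeoutError::Timeout) => Chunk::KeepAlive(noise.to_vec()),
        Err(RecvTimeoutError::Disconnected) => Chunk::Closed,
    }
}
```

In this model, if the source keeps delivering buffers of zeros (injected silence, or an OS that always produces silence), `recv_timeout` never times out and the `KeepAlive` branch is never taken.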

That is why you should not enable inject silence when using flac (this is also clearly stated in the text of the pre-release).

On Linux the OS seems to inject silence all by itself for a reason that I don't know, so the flac "no sound" detection does not work there at all.

All this does not explain why rust panics on your system when you enable inject silence. It would be helpful if you were prepared to test it with the debug build from the command line so that you can get me the exact panic message. It may be a bug in the cpal library when asio is present and cpal is being built without asio support, as is the case here. Maybe I should try to add native asio support so that you don't need hifi cable.

dheijl commented 4 months ago

Conversion from f32 float to i24 and from i24 to f32: this should cause no distortion, as no resampling takes place. But I suppose you can get a rounding error in the least significant bit for some values. I am not a sound engineer or a mathematician.
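
A small sketch can make the rounding question concrete. An f32 has 24 bits of significand precision, so every 24-bit integer sample survives a trip through f32 unchanged as long as both directions scale by the same power of two. The scale of 2^23 and the function names below are assumptions for this illustration; the scaling actually used by the OS audio stack or by swyh-rs may differ.

```rust
/// Scale for 24-bit samples: 2^23. An assumption made for this sketch.
const I24_SCALE: f32 = 8_388_608.0;

/// Convert a 24-bit integer sample (stored in an i32) to a float in [-1.0, 1.0).
fn i24_to_f32(sample: i32) -> f32 {
    sample as f32 / I24_SCALE
}

/// Convert a float sample back to 24 bits, rounding to nearest and clamping.
fn f32_to_i24(sample: f32) -> i32 {
    let scaled = (sample * I24_SCALE).round();
    scaled.clamp(-8_388_608.0, 8_388_607.0) as i32
}

/// Check every possible 24-bit value for a lossless i24 -> f32 -> i24 round trip.
fn all_i24_roundtrip() -> bool {
    (-8_388_608i32..=8_388_607).all(|s| f32_to_i24(i24_to_f32(s)) == s)
}
```

Under these assumptions the integer round trip is exact. Rounding in the least significant bit can appear in the other direction, when a float sample does not lie on the 24-bit grid (for example after mixing or volume scaling): 0.1 as f32 becomes 838861 and converts back to a slightly different float.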

dheijl commented 4 months ago

I found this in the Windows WASAPI documentation, when discussing configuring a device for CD audio (44.1 kHz, 2 channels) in the sound control panel applet:

The audio engine will use a format with the same number of channels (two) and the same sample rate (44100 Hz), but it will convert samples to floating-point numbers before processing them. The audio engine will convert the floating-point samples in the output mix to 16-bit integers before playing them through the device.

DrCWO commented 4 months ago

I'm back and still on the run. Have to visit customers tomorrow and the day after tomorrow. I'll get back to you when I return.

Best DrCWO

dheijl commented 4 months ago

There's been a few changes in the latest release:

The Changelog and Readme have been updated.

dheijl commented 2 months ago

As there has not been any activity, I'll close this for now. You can always reopen if needed.