f4exb / sdrangel

SDR Rx/Tx software for Airspy, Airspy HF+, BladeRF, HackRF, LimeSDR, PlutoSDR, RTL-SDR, SDRplay and FunCube
GNU General Public License v3.0

Remote Sink/Input packet loss on Windows #1069

Closed srcejon closed 2 years ago

srcejon commented 2 years ago

I'm trying to use the Remote Sink and Remote Input on Windows - but even when both SDRangel instances are on the same machine, with FEC enabled, I seem to get packet loss:

image

Log messages are:

2021-12-02 11:10:30.450 (D) RemoteInputBuffer::checkSlotData: incomplete frame: slotIndex: 6 m_blockCount: 24 m_recoveryCount: 0
2021-12-02 11:10:39.778 (D) RemoteInputBuffer::checkSlotData: incomplete frame: slotIndex: 13 m_blockCount: 40 m_recoveryCount: 0
2021-12-02 11:10:39.830 (D) RemoteInputBuffer::checkSlotData: incomplete frame: slotIndex: 5 m_blockCount: 5 m_recoveryCount: 0

If I run with Linux as the Remote Sink and Windows as the Remote Input, it runs much better:

image

I notice that when I run both as Linux, the main buffer display is always centred around the mid point:

image

However, on Windows, it is continually all over the place (lurches from full left to full right). (Some extra buffering on Windows possibly?)

Sending Windows to Linux also shows packet loss, so presumably the problem is with Remote Sink on Windows.

srcejon commented 2 years ago

Adding some debug into RemoteSink, I see that:

srcejon commented 2 years ago

It seems it's partly related to the sleep_for call in RemoteSinkSender::sendDataBlock:

std::this_thread::sleep_for(std::chrono::microseconds(txDelay));

In the above example, the default setting of 35% corresponds to a delay of 442us. However, sleep_for struggles to sleep for such a short time on either platform, and it is far worse on Windows.

We can time how long it actually sleeps for with:

auto start = std::chrono::steady_clock::now();
std::this_thread::sleep_for(std::chrono::microseconds(txDelay));
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double, std::micro> elapsed = end - start;
qDebug() << "Duration " << elapsed.count() << " desired " << txDelay;

On Linux, I get:

2021-12-02 14:35:07.659 (D) Duration 726.914 desired 442
2021-12-02 14:35:07.660 (D) Duration 829.662 desired 442

So it's nearly 2x (i.e. actually 70% delay). On Windows:

2021-12-02 14:33:38.045 (D) Duration 1640.2 desired 442
2021-12-02 14:33:38.047 (D) Duration 1835.8 desired 442

Now, 1835us is actually more than 100% of the time, so we start getting FIFO overflows.
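The percentages quoted here follow from simple arithmetic, assuming (as the numbers above imply) that the delay setting is a fraction of the time available per block, so that 442us at 35% gives a 100% budget of about 1263us. A minimal sketch of that calculation; the function name is illustrative and not part of SDRangel:

```cpp
// Convert a measured sleep duration into an effective delay percentage.
// Assumption: the configured percentage is a fraction of the per-block time
// budget, so the 100% budget is desiredUs / (desiredPercent / 100).
double effectiveDelayPercent(double measuredUs, double desiredUs, double desiredPercent)
{
    const double budgetUs = desiredUs / (desiredPercent / 100.0); // 442 / 0.35, about 1263 us
    return 100.0 * measuredUs / budgetUs;
}
```

With the logged values, the Linux sleeps of 726.914us and 829.662us come out at roughly 58% and 66% of the budget, and the Windows sleeps of 1640.2us and 1835.8us at roughly 130% and 145%.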

Even if 1% is set, the minimum time Windows will sleep for seems to be around 1700us, so this sleep_for probably shouldn't be used unless txDelay > ~2000us.

Unfortunately, there seems to be another issue as well: while removing this delay solves the FIFO overflows, packets are still being lost (Windows to Windows is much better, but Windows to Linux hasn't improved).
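One possible way to pace short delays without relying on sleep_for alone is to sleep only when the delay is comfortably above the scheduler's minimum and poll the clock for the rest. This is only a sketch of that idea, not what SDRangel does; `paceDelay` and the `minSleepUs` threshold are hypothetical names, with the 2000us default taken from the measurement above:

```cpp
#include <chrono>
#include <thread>

// Wait for delayUs microseconds. Sleep only when the delay is long enough for
// the OS scheduler to honour it; otherwise poll the steady clock.
// minSleepUs is a hypothetical tuning threshold (about 2000 us per the
// measurements in this thread), not an SDRangel setting.
void paceDelay(long long delayUs, long long minSleepUs = 2000)
{
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + std::chrono::microseconds(delayUs);

    if (delayUs > minSleepUs) {
        // Sleep for the bulk of the delay, leaving a margin for oversleep.
        std::this_thread::sleep_for(std::chrono::microseconds(delayUs - minSleepUs));
    }
    // Poll for the remainder; this costs CPU but avoids the timer granularity.
    while (clock::now() < deadline) {
        std::this_thread::yield();
    }
}
```

The trade-off is that polling keeps a core busy for the duration of the wait, so it only makes sense for delays well under a few milliseconds.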

f4exb commented 2 years ago

Windows is not good at real time; this is a known fact. It always has a better thing to do than serve your requests... OK for Excel, not so good for SDR stuff.

That sleep was introduced because UDP flow does not pace itself and in some cases it was filling the network pipe too fast. I probably have to re-evaluate this. Maybe this is a particular case.

Another issue is that the sender and receiver do not have synchronized clocks and therefore may not run at the exact same rate. This is why you have to auto balance the FIFO by taking more or less samples at each read depending on how the FIFO mid point leads or lags. The reads occur at fixed times and the exact elapsed time between timer ticks is also measured for more precision. This is fairly complex and works well in Linux where timings are precise enough.
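The balancing idea described above can be sketched roughly as follows. This is an illustration of the principle only, not the actual SDRangel code; the function name and the `gain` constant are assumptions:

```cpp
#include <algorithm>
#include <cstddef>

// Number of samples to take from the FIFO on this timer tick.
// nominal = sampleRate * elapsedSeconds, with elapsedSeconds measured between
// ticks. The count is nudged up when the FIFO is fuller than half and down
// when it is emptier, so the fill level drifts back towards the mid point.
// gain is a hypothetical tuning constant, not taken from SDRangel.
std::size_t samplesToRead(double sampleRate, double elapsedSeconds,
                          std::size_t fifoFill, std::size_t fifoSize,
                          double gain = 0.1)
{
    const double nominal = sampleRate * elapsedSeconds;
    const double fillError = static_cast<double>(fifoFill) / fifoSize - 0.5; // -0.5 .. +0.5
    const double corrected = nominal * (1.0 + gain * fillError);
    // Never ask for a negative count or for more than the FIFO holds.
    const double clamped = std::min(std::max(corrected, 0.0), static_cast<double>(fifoFill));
    return static_cast<std::size_t>(clamped + 0.5);
}
```

If the tick interval is measured badly, as it can be on Windows, `nominal` itself is wrong and the correction has to absorb much larger swings.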

srcejon commented 2 years ago

On the subject of UDP packets being lost because of buffer overflows, for the DATV Modulator, calling socket->setSocketOption(QAbstractSocket::ReceiveBufferSizeSocketOption, some megabytes); helped. I didn't see this in the RemoteInput code. I can't recall whether it was for Linux or Windows, but one of them had very small default buffer sizes.

Is it necessary to try to synchronize clocks when just transmitting IQ samples? If it was audio or video, then sure, but I would have thought this was essentially no different to receiving IQ samples from an SDR where the ADC clock is asynchronous. Although I haven't looked at any of the code for this, so have no idea!

I have noticed another strange behaviour on Windows that I need to investigate some more. When using the Test Source to generate a sine wave feed in to the SSB demod, it sounded like some samples were being dropped - whereas on Linux I can hear a pure tone. I'll open another issue when I have some more firm details.

f4exb commented 2 years ago

A readBufferSize() call on the QUdpSocket right after its allocation returns 0 which means no read buffer limit (https://doc.qt.io/archives/qt-5.8/qabstractsocket.html#readBufferSize). So read buffer size should not be an issue.

f4exb commented 2 years ago

Another observation: on a relatively low sample rate (24 kS/s) I can set the delay to 0% (from original 30%) and see no difference. If I do the same on another link with relatively high sample rate (500 kS/s) I get many more errors at 0% than at 10%. So this delay does matter at least in the present design.

srcejon commented 2 years ago

> A readBufferSize() call on the QUdpSocket right after its allocation returns 0 which means no read buffer limit (https://doc.qt.io/archives/qt-5.8/qabstractsocket.html#readBufferSize). So read buffer size should not be an issue.

I think that is the Qt buffer size. QAbstractSocket::ReceiveBufferSizeSocketOption sets the OS's buffer size.

https://doc.qt.io/archives/qt-5.8/qabstractsocket.html#ReceiveBufferSizeSocketOption

"Sets the socket receive buffer size in bytes at the OS level. This maps to the SO_RCVBUF socket option. This option does not affect the QIODevice or QAbstractSocket buffers (see setReadBufferSize()). "
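Since the Qt option maps to SO_RCVBUF, the same effect can be shown without Qt. The sketch below uses plain POSIX sockets (so it is Linux/Unix only; Windows would need the Winsock equivalents) to request a receive buffer and read back what the OS actually granted. The function name is illustrative:

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

// Ask the OS for a receive buffer of requestedBytes on a new UDP socket and
// return the size the OS reports afterwards, or -1 on error.
// The granted size may differ from the request: Linux, for example, doubles
// the value and caps it at net.core.rmem_max.
int requestUdpReceiveBuffer(int requestedBytes)
{
    const int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -1;
    }
    int granted = -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requestedBytes, sizeof(requestedBytes)) == 0) {
        socklen_t len = sizeof(granted);
        if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) != 0) {
            granted = -1;
        }
    }
    close(fd);
    return granted;
}
```

Reading the value back is worthwhile because a request larger than the system limit is silently reduced rather than rejected.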

f4exb commented 2 years ago

OK... so I think that's it! Taking the conditions of the last observation (500 kS/s no delay) and increasing the buffer size to 1s of samples (maybe an overkill we'll see...) I get no errors in 5 minutes vs tens in 2 minutes.
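For a sense of scale, the size of "1s of samples" can be estimated as below. This counts sample payload only (no packet headers or FEC blocks) and assumes two components per sample (I and Q); the function name is illustrative:

```cpp
#include <cstdint>

// Bytes needed to hold `seconds` of IQ samples (payload only).
// Each IQ sample has two components (I and Q); bytesPerComponent is the
// on-the-wire size of one component.
std::uint64_t bufferBytes(std::uint64_t sampleRate, unsigned bytesPerComponent, double seconds)
{
    return static_cast<std::uint64_t>(sampleRate * 2.0 * bytesPerComponent * seconds);
}
```

At 500 kS/s with 4 bytes per component this gives 4,000,000 bytes, and 2,000,000 bytes with 2 bytes per component.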

f4exb commented 2 years ago

Note that the more rarely used RemoteOutput/RemoteSource couple is based on the same design. Something to consider for this ticket: https://github.com/f4exb/sdrangel/issues/838 Maybe changing the design the same way would be enough.

srcejon commented 2 years ago

This is working better for me now, thanks. I can now seemingly receive at 1MSa/s without packets being reported as lost.

However, I notice some data is still lost at 1MSa/s. In the console of the transmitter (with the Remote Sink), I get:

2021-12-09 12:20:16.447 (C) SampleSinkFifo::write: 1 messages dropped
2021-12-09 12:20:16.447 (C) SampleSinkFifo::write: overflow - dropping 7584 samples
2021-12-09 12:20:16.449 (C) SampleSinkFifo::write: overflow - dropping 9072 samples
2021-12-09 12:20:18.999 (C) SampleSinkFifo::write: 0 messages dropped
2021-12-09 12:20:18.999 (C) SampleSinkFifo::write: overflow - dropping 17664 samples
2021-12-09 12:20:38.647 (C) SampleSinkFifo::write: overflow - dropping 2544 samples

I thought this might be the FEC being too slow, but if I set FEC to 128/0 and sample rate to 2MSa/s, I see many more overflows:

2021-12-09 12:28:17.051 (C) SampleSinkFifo::write: overflow - dropping 1536 samples
2021-12-09 12:28:19.566 (C) SampleSinkFifo::write: 87 messages dropped
2021-12-09 12:28:19.566 (C) SampleSinkFifo::write: overflow - dropping 1536 samples
2021-12-09 12:28:19.571 (C) SampleSinkFifo::write: overflow - dropping 33792 samples
2021-12-09 12:28:22.119 (C) SampleSinkFifo::write: 92 messages dropped
2021-12-09 12:28:22.119 (C) SampleSinkFifo::write: overflow - dropping 47904 samples
2021-12-09 12:28:22.177 (C) SampleSinkFifo::write: overflow - dropping 20256 samples

This appears to be because the FIFO size in the Remote Sink always assumes a sample rate of 48000. PR #1075 should fix that, by setting the FIFO size to match the baseband sample rate.
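To illustrate why a FIFO sized for 48000 S/s overflows at higher rates: the time it can absorb shrinks in proportion to the sample rate. The sketch below is not the code from the PR; the function names and the 48000-sample FIFO used in the example are hypothetical:

```cpp
#include <cstddef>

// How long (in milliseconds) a FIFO of fifoSamples lasts at sampleRate.
double fifoDurationMs(std::size_t fifoSamples, double sampleRate)
{
    return 1000.0 * fifoSamples / sampleRate;
}

// FIFO size that keeps the same duration regardless of sample rate.
std::size_t fifoSizeForRate(double sampleRate, double durationMs)
{
    return static_cast<std::size_t>(sampleRate * durationMs / 1000.0 + 0.5);
}
```

A hypothetical 48000-sample FIFO holds 1000 ms at 48 kS/s but only 24 ms at 2 MS/s, so any stall on the consumer side longer than that drops samples.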

With that fix, I can almost receive reliably at 2.4MSa/s (so, for example, ADS-B from a remote RTL SDR just about works).

srcejon commented 2 years ago

For use-cases where sample rate is more important than resolution, I think it would be useful if the Remote Sink had an option to reduce bit depth to 16-bit and perhaps 12-bit or 8-bit. A quick look suggests that for the default 24-bit build, samples are sent as 32-bit - so 16-bit support could easily allow a 2x sample rate increase, without any reduction in signal quality, as many SDRs' ADCs aren't even 16-bit.

For this, we'd probably also need a gain setting, so that the low level outputs from SDRs like the Airspy Discovery can be amplified in to the MSBs. AGC could be useful for 8-bit transfer of some digital modes - where the amplitude of the signal isn't important.
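A rough sketch of what such a gain-plus-truncation step could look like, assuming 24-bit samples held in an int32_t and a linear gain applied before the low bits are discarded. This is an illustration of the proposal, not existing SDRangel code, and the function name is hypothetical:

```cpp
#include <algorithm>
#include <cstdint>

// Reduce a 24-bit sample (held in an int32_t) to outBits (for example 16 or 8).
// gain is a linear multiplier applied first so that weak signals occupy the
// most significant bits; the result saturates instead of wrapping.
std::int32_t reduceBitDepth(std::int32_t sample24, double gain, int outBits)
{
    const std::int64_t divisor = std::int64_t{1} << (24 - outBits); // discards the low bits
    const std::int64_t maxOut = (std::int64_t{1} << (outBits - 1)) - 1;
    const std::int64_t minOut = -maxOut - 1;
    const std::int64_t scaled = static_cast<std::int64_t>(sample24 * gain) / divisor;
    return static_cast<std::int32_t>(std::min(std::max(scaled, minOut), maxOut));
}
```

Without gain, a weak 24-bit sample such as 100 becomes 0 at 16 bits; with a gain of 256 the same sample survives as 100, which is the effect of amplifying into the MSBs.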

f4exb commented 2 years ago

For now the sender drives the sample size depending on how it was compiled. On transmission side there is no conversion. If you use the Docker image of sdrangelsrv for arm64 (RPi) it is compiled with 16 bit sample resolution. This is what I am currently dealing with. On receiving side there is already the logic to convert the sample size sent over the network to the current sample size of the client. It works both ways 16 -> 24 or 24 -> 16. However there is no control to forcibly set the sample size of the samples sent over the network and anyway it is limited to 16 or 24 bits.

We use 4 bytes (32 bits) to convey 24 bit data. This is for convenience. Packing 24 bit samples into groups of 3 bytes is awkward and anyway leads to conversion overhead. So I would keep the actual sample (I or Q) byte size over the network to 1, 2 or 4. There is provision in the metadata of the stream to specify the sample size in bytes and its actual resolution in bits.
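To show the kind of conversion overhead 3-byte packing would involve, here is a minimal sketch of packing and unpacking one signed 24-bit sample. The little-endian byte order and the function names are assumptions for illustration; nothing like this exists in the current stream format:

```cpp
#include <cstdint>

// Pack the low 24 bits of a sample into 3 bytes, little-endian.
void pack24(std::int32_t sample, std::uint8_t out[3])
{
    const std::uint32_t u = static_cast<std::uint32_t>(sample);
    out[0] = static_cast<std::uint8_t>(u & 0xFF);
    out[1] = static_cast<std::uint8_t>((u >> 8) & 0xFF);
    out[2] = static_cast<std::uint8_t>((u >> 16) & 0xFF);
}

// Rebuild a signed sample from 3 little-endian bytes, sign-extending bit 23.
std::int32_t unpack24(const std::uint8_t in[3])
{
    std::uint32_t u = in[0] | (std::uint32_t{in[1]} << 8) | (std::uint32_t{in[2]} << 16);
    if (u & 0x800000u) {
        u |= 0xFF000000u; // propagate the sign bit into the top byte
    }
    return static_cast<std::int32_t>(u);
}
```

Every sample needs this per-byte shuffling and sign extension on both ends, whereas 1, 2 or 4 byte samples can be copied as native integers.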

You still need 24 bits to benefit from Perseus full resolution and also for 16 bit based ADCs you get some extra bits by decimation that you may wish to keep. However the actual resolution needed or acceptable is highly dependent on the setup so an option to choose between 8, 16 or 24 bits at the transmission end makes sense.

f4exb commented 2 years ago

Issue #1079 opened for 8 bits transmission capability

f4exb commented 2 years ago

Implemented in v6.17.5