Audio stuttering and crashes at high bandwidth

CocolinoFan commented 6 months ago

I am experiencing lag/stuttering and the occasional freeze, especially at high sampling rates (the HackRF can go up to 20 MHz). I don't think there is a hardware issue, as it is working just fine with other SDR software. Here is a video I made showing exactly what I am experiencing: https://youtu.be/kaa1mSWPX0M

CocolinoFan commented 6 months ago

After confirming that this is not because of something stupid I am doing, and after reading the Home documentation, I will close this issue. It is more than fair enough. I am sorry for even opening the issue without reading the documentation first. I must also say, SDRangel is an amazing piece of software, by far the best SDR software I can imagine. I will patiently wait for the software to be packaged by my Linux distribution.

Could you make an option for people to donate to the project? I would love to help but I am not nearly smart enough, so the least I could do is to donate.

srcejon commented 6 months ago

Well, there might be something worth investigating here. It sounded OK at 10e6 and DEC=4, but then audio starts dropping samples when DEC=2.

Setting DEC to 2 would presumably increase CPU load due to increased spectrum window FFT calculations - but you say on the forum the problem still occurs even if you have the spectrum window closed (Note: It appears the spectrum/waterfall continues to be calculated even when not visible).

I wouldn't expect changing DEC to affect the BFM demod too much, as that should still decimate to a much lower frequency anyway.

It looks like single core usage goes to 100%, suggesting you are core limited - but that's a little surprising.

srcejon commented 6 months ago

Thinking about it some more, when the DEC option is 2 or 4, that decimation is performed on the input device thread, with the remainder in the channelizer/baseband thread, so it's split between multiple cores. When DEC is 1, then all the decimation is in the channelizer/baseband thread and more data is moved through the FIFOs, so more likely to saturate a single core.

srcejon commented 6 months ago

Here's a profile, running USRP at 20 MSa/s, DEC=4 with BFM demod, with 4K spectrum visible:

One thing that jumps out is a large % of total time (11%) is spent calculating log2 for the spectrum - much more than doing FFTs!

Summary:

USRPInputThread::callbackIQ - 31%
DSPDeviceSourceEngineWork/SpectrumVis::feed - 20%
Messaging/Synchronization - 16%
DownChannelizer::feed - 11%
UHD/libusb - 10%
Graphics - 7%

When DEC is set to 1:

log2f is 30% of CPU time! (log2f calls _fdlog on Windows). Changing spectrum from log to linear stops the audio from stuttering, and then the profile looks like:

DownChannelizer::feed - 36%
DSPDeviceSourceEngineWork/SpectrumVis::feed - 21%
USRPInputThread::callbackIQ - 7%
Messaging/Synchronization - 16%
UHD/libusb - 8%
Graphics - 8%

Add a couple of demods, and decimation can be > 50% of CPU time.

CocolinoFan commented 6 months ago

Hmmm, I think you are right. The issue seems to come from how multi-threading is done: https://youtu.be/xI1WgdkoxrI Some things will only use a single thread.

srcejon commented 6 months ago

So the log2f problem looks to be MSVC specific.

When converting a power to dB for the spectrum, SDRangel uses log2f, rather than the usual log10f:

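// ARRAY_SIZE, x_in, x_out1 and x_out2 are assumed to be declared elsewhere.

// Power to dB using log2f, scaled so the result matches dbTest2.
void dbTest1()
{
    float ofs = 20.0f * log10f(1.0f / 1024);   // same as 10*log10(1/(1024*1024))
    float mult = (10.0f / log2f(10.0f));       // converts log2 to 10*log10
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        x_out1[i] = mult * log2f(x_in[i]) + ofs;
    }
}

// Power to dB using the usual log10f.
void dbTest2()
{
    float denom = 1024.0f * 1024.0f;
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        x_out2[i] = 10.0f * log10f(x_in[i] / denom);
    }
}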

Using gcc, log2f is roughly twice as fast:

dbTest1: 157us
dbTest2: 315us

However, with MSVC, it is 6x slower!

dbTest1: 2128us
dbTest2: 337us
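For anyone wanting to reproduce this kind of comparison, here is a minimal self-contained sketch. The buffer size, the function names and the timing helper are my own assumptions, not the harness used for the numbers above, so absolute timings will differ.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical buffer size; the size used for the timings in this thread is not stated.
constexpr std::size_t kArraySize = 1 << 16;

// Power to dB via log2, scaled so the result matches 10*log10(x / 1024^2).
void dbViaLog2(const std::vector<float>& in, std::vector<float>& out)
{
    assert(in.size() == out.size());
    const float ofs = 20.0f * std::log10(1.0f / 1024.0f);
    const float mult = 10.0f / std::log2(10.0f);
    for (std::size_t i = 0; i < in.size(); i++) {
        out[i] = mult * std::log2(in[i]) + ofs;
    }
}

// Power to dB via log10 directly.
void dbViaLog10(const std::vector<float>& in, std::vector<float>& out)
{
    assert(in.size() == out.size());
    const float denom = 1024.0f * 1024.0f;
    for (std::size_t i = 0; i < in.size(); i++) {
        out[i] = 10.0f * std::log10(in[i] / denom);
    }
}

// Wall-clock time of a single call to f, in microseconds.
template <typename F>
long long timeUs(F&& f)
{
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}
```

Fill an input vector of kArraySize positive values, then compare timeUs([&] { dbViaLog2(in, a); }) against timeUs([&] { dbViaLog10(in, b); }). Build with optimisation enabled and take the best of several runs, since a single measurement is noisy.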

srcejon commented 6 months ago

Given this seems quite important to performance, but accuracy isn't paramount, we should be able to use a faster approximation to log2f, e.g. https://www.vplesko.com/posts/replacing_log2f.html

Using remezLogDeg3 instead of log2f, with gcc gives:

dbTest1: 153us  - log2
dbTest2: 315us  - log10
dbTest3: 18us    - remezLogDeg3

And with MSVC:

dbTest1: 2204us  - log2
dbTest2: 343us  - log10
dbTest3: 84us    - remezLogDeg3

It's accurate to log2f to ~4 decimal places, so should be fine for the spectrum, and massively faster.
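To illustrate the general idea (this is not remezLogDeg3 and does not use its coefficients), here is a rough sketch of the same family of trick: take the exponent straight from the float's bits and approximate log2 of the mantissa with a small polynomial. The function names and the simple degree-2 polynomial are my own choices for illustration; this polynomial is only good to about 0.01 in log2 units, so it is cruder than the Remez-fitted version.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Approximate log2(x) for positive, normal floats.
// Assumption: x is not zero, denormal, infinite or NaN.
inline float fastLog2(float x)
{
    assert(x > 0.0f);
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));

    // The IEEE 754 biased exponent gives the integer part of log2(x).
    const int exponent = static_cast<int>((bits >> 23) & 0xFFu) - 127;

    // Overwrite the exponent field so the mantissa m lies in [1, 2).
    bits = (bits & 0x007FFFFFu) | 0x3F800000u;
    float m;
    std::memcpy(&m, &bits, sizeof(m));

    // Degree-2 polynomial that is exact at m = 1 and m = 2.
    const float poly = (-1.0f / 3.0f) * m * m + 2.0f * m - 5.0f / 3.0f;
    return static_cast<float>(exponent) + poly;
}

// Power to dB with the same scaling as dbTest1 above.
inline float powerToDb(float power)
{
    const float mult = 3.01029996f;  // 10 / log2(10)
    const float ofs = -60.2059991f;  // 20 * log10(1 / 1024)
    return mult * fastLog2(power) + ofs;
}
```

An error of 0.01 in log2 is roughly 0.03 dB after scaling, which is probably invisible on a spectrum display, but a higher-degree fit would be needed to match the ~4 decimal places mentioned above.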

f4exb commented 6 months ago

@CocolinoFan what is the profiling tool you are using on the videos?

CocolinoFan commented 6 months ago

btop and recording with OBS sir.

CocolinoFan commented 6 months ago

I am sorry to report that I don't think the commit fixed the issue. https://youtu.be/iEqLudN4DVc It might work slightly better, but it is hard to tell if it is just placebo. I just found out about profiling tools. Here is what I recorded with perf during the video: data.zip

srcejon commented 6 months ago

I wouldn't expect it to have done really. It was mainly to fix a problem on Windows. It will help a bit to reduce the overall load on Linux, but I suspect your problem is with a different thread, that is performing decimation / demod, not the thread displaying the main spectrum, which this patch was for.

srcejon commented 6 months ago

> Could you make an option for people to donate to the project? I would love to help but I am not nearly smart enough, so the least I could do is to donate.

Thanks. I obviously do it for fun and I can't speak for @f4exb, but perhaps if we did have a donation page, we could put any funds towards buying SDRs we don't currently support / have for testing, or something similar?

Also, there are some web hosting costs. We could spend some on Continuous Integration with builds for more OSes, code signing, or APIs to get some more interesting data. Perhaps a poll on the donation page?

Not sure if there's a big enough user base likely to donate, though. Having said that, it looks like the SDR++ author gets $450 / month on Patreon.

srcejon commented 6 months ago

Another little performance problem I noticed, although not related to the above, is in GLSpectrumView::paintGL when annotations are enabled. Render time can go from ~150us to 4ms due to the annotations (if we have quite a few displayed).

Looks like most of the time is in drawTextOverlay(). We presumably should create individual textures that are only drawn once, rather than on each paintGL call.
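A minimal sketch of the render-once idea, with the graphics stripped out. TextTexture and TextTextureCache are hypothetical names, and the render step is a stand-in for whatever drawTextOverlay() currently does per frame; a real version would hold an OpenGL texture rather than a string.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for a GPU texture holding pre-rendered text (hypothetical type).
struct TextTexture
{
    std::string text;
    int id;
};

// Renders each distinct annotation string once and reuses it on later frames.
class TextTextureCache
{
public:
    const TextTexture& get(const std::string& text)
    {
        auto it = m_cache.find(text);
        if (it == m_cache.end()) {
            // Cache miss: pay the rendering cost once.
            it = m_cache.emplace(text, render(text)).first;
        }
        assert(it != m_cache.end());
        return it->second;
    }

    // Call when annotations, font or colours change.
    void clear() { m_cache.clear(); }

    int renderCount() const { return m_renderCount; }

private:
    TextTexture render(const std::string& text)
    {
        m_renderCount++;  // counts how often the expensive path runs
        return TextTexture{text, m_renderCount};
    }

    std::unordered_map<std::string, TextTexture> m_cache;
    int m_renderCount = 0;
};
```

With this shape, each paintGL call would only look up and draw existing textures, and the expensive text rendering would happen only when an annotation first appears or the cache is cleared.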

f4exb commented 6 months ago

> Could you make an option for people to donate to the project? I would love to help but I am not nearly smart enough, so the least I could do is to donate.

I agree with @srcejon that if there is any donation it should go to this project and not to ourselves.

srcejon commented 6 months ago

> it is working just fine with other SDR software.

A quick look at some other software suggests they are taking a different approach to decimation. Rather than a chain of half-band filters, they use a regular FIR in the first stage with a similar number of taps, but with a higher decimation factor, which reduces the total number of stages. However, looking at the filter response, although stopband attenuation is higher, the passband isn't anywhere near as flat as with the half-band filters. This doesn't matter if you're demodulating strong FM signals, but it is not as good IMO for weak signals.
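For readers unfamiliar with the half-band approach, here is a toy sketch of a decimate-by-2 stage chained to get a power-of-two decimation. The 7-tap filter, the function names and the use of real-valued samples are assumptions for illustration only; it is not SDRangel's decimator, which works on complex IQ with longer filters.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Short half-band low-pass: centre tap is 0.5 and every second tap
// away from the centre is zero, so about half the multiplies are free.
static const std::array<float, 7> kHalfBandTaps = {
    -1.0f / 32, 0.0f, 9.0f / 32, 16.0f / 32, 9.0f / 32, 0.0f, -1.0f / 32};

// Filter and keep every second sample. Samples beyond the ends are treated as zero.
std::vector<float> decimateBy2(const std::vector<float>& in)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(in.size());
    std::vector<float> out;
    out.reserve(in.size() / 2);
    for (std::ptrdiff_t i = 0; i < n / 2; i++) {
        float acc = 0.0f;
        for (std::ptrdiff_t k = 0; k < 7; k++) {
            const std::ptrdiff_t idx = 2 * i + k - 3;  // filter centred on input sample 2*i
            if (idx >= 0 && idx < n) {
                acc += kHalfBandTaps[static_cast<std::size_t>(k)] * in[static_cast<std::size_t>(idx)];
            }
        }
        out.push_back(acc);
    }
    return out;
}

// Chain of half-band stages: total decimation is 2^stages.
std::vector<float> decimateChain(std::vector<float> samples, int stages)
{
    assert(stages >= 0);
    for (int s = 0; s < stages; s++) {
        samples = decimateBy2(samples);
    }
    return samples;
}
```

Each stage runs at half the rate of the one before it, so the first stage dominates the cost. The alternative described above replaces the first few stages with one longer FIR and a larger decimation factor, trading passband flatness for fewer stages.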

Given that we now have support for GPU accelerated FFT, we could possibly try decimating using FFTs instead, which would mean we could use a much larger number of taps and better freq response.

CocolinoFan commented 6 months ago

First, I would not have donated that much, probably 10 pounds. I am on Universal Credit (people in the UK will know).

> I agree with @srcejon that if there is any donation it should go to this project and not to ourselves.

Makes sense. But on the other hand, the project is made by people. What if, every month, half of the donations are kept in a fund to be spent strictly on the project, treating the project like a company, its own entity? The project decides whether it's in its best interest for contributor X to buy a certain SDR, or that it is best to buy a certain web hosting service. And the other half is split between, let's say, the top 5 contributors to do with as they wish, like a salary. Obviously this will need to be thought out a bit so people don't game the system, like having 100 commits correcting grammatical errors.

f4exb commented 6 months ago

As you mention, dealing with money generates a bunch of issues that I am not sure we are willing to deal with.

f4exb / sdrangel

Audio stuttering and crashes at high bandwidth #2004