f4exb / sdrangel

SDR Rx/Tx software for Airspy, Airspy HF+, BladeRF, HackRF, LimeSDR, PlutoSDR, RTL-SDR, SDRplay and FunCube
GNU General Public License v3.0
2.92k stars 441 forks source link

ADS-B correlator optimization #675

Closed f4exb closed 3 years ago

f4exb commented 3 years ago

I have noticed a high CPU usage on a single thread when running the ADS-B demodulator. After a closer look at the code in adsbdemodsink.cpp it seems the sync sequence correlation could be optimized and I have a fix that seems to work pretty well.

Firstly there is no need to do a sqrt of the channel power. Just use the channel power (squared magnitude) but this does not yield much improvement since sqrt is already highly optimized.

In fact the heart of the issue is in the correlation loop that is run at every sample. It uses the canonical way for correlation with a multiply and accumulate at each step. Moreover it adds or subtracts depending on the "polarity" of the expected bit pattern. I think we could simply stay with positive values and adjust the correlation threshold in consequence. Also the sync pattern is part of the ADS-B standard and will not change over time so hardcoding it is not an issue. With this in mind the loop can be unrolled for the most part and only the magnitude in the expected signal presence bins be added. Thus the loop length is reduced to the number of samples per chip which is only a few units depending on the sampling frequency used.

This also means that the threshold is now given in channel power units (dB) and it makes it easier to link it to the observed channel power helping user to find the right threshold faster.

Now while correlation can be improved the very reason of the CPU burn is that if the threshold is too low then frame processing is attempted at every new correlation and since this is not a light process (can't see for now how it can be lighter) it starts to build up. Anyway if the threshold is set appropriately the way ADS-B works will cause busy and quiet periods so another optimization (this will be another issue) would be to process the frame off from a queue on a different thread. And then the queue length could be controlled to limit the build up at the expense of dropping frames. But it is better to drop these frames than I/Q frames and ruin the decoding process altogether.

f4exb commented 3 years ago

The most significant part is replacing this:

        // Correlate received signal with expected preamble
        Real preambleCorrelation = 0.0f;
        for (int i = 0; i < ADS_B_PREAMBLE_CHIPS*m_samplesPerChip; i++)
            preambleCorrelation += m_preamble[i] * m_sampleBuffer[startIdx+i];

by this:

        // Correlate received signal with expected preamble
        // chip+ indexes are 0, 2, 7, 9
        Real preambleCorrelation = 0.0;

        for (int i = 0; i < m_samplesPerChip; i++)
        {
            preambleCorrelation += m_sampleBuffer[startIdx + 0*m_samplesPerChip + i];
            preambleCorrelation += m_sampleBuffer[startIdx + 2*m_samplesPerChip + i];
            preambleCorrelation += m_sampleBuffer[startIdx + 7*m_samplesPerChip + i];
            preambleCorrelation += m_sampleBuffer[startIdx + 9*m_samplesPerChip + i];
        }

And thus work on the power (always positive) values for the threshold. To accommodate the variety of signal powers due to the Rx chain in various hardware this has to be exposed to the user in dB units.

srcejon commented 3 years ago

Removing sqrt sounds like a good idea, as is the multithreading, if needed. However, I don't think your correlation optimisation is correct. It looks like it is ignoring the magnitude of the 0 chips. I.e. 1,1 for the first bit would give the same correlation as 1,0, so it's not really matching against the preamble. A continuous high magnitude input would I think match.

f4exb commented 3 years ago

Yes I see. OOK is a pain for this. It is not like BPSK where + and - symbols can be clearly identified. Here you just have some signal or not and the "not" cannot contribute positively (by being reversed) to the correlation at least as is. In fact the power in the "zero" chips have to be accumulated as well but this time tested to be below a certain threshold. There are 4 "presence" chips and 12 "absence" chips so I suppose the same threshold can be used with a 1:3 relationship.

f4exb commented 3 years ago
        // Correlate received signal with expected preamble
        // chip+ indexes are 0, 2, 7, 9
        Real preambleCorrelationUp = 0.0;
        Real preambleCorrelationDown = 0.0;

        for (int i = 0; i < m_samplesPerChip; i++)
        {
            preambleCorrelationUp   += m_sampleBuffer[startIdx +  0*m_samplesPerChip + i];
            preambleCorrelationDown += m_sampleBuffer[startIdx +  1*m_samplesPerChip + i];
            preambleCorrelationUp   += m_sampleBuffer[startIdx +  2*m_samplesPerChip + i];
            preambleCorrelationDown += m_sampleBuffer[startIdx +  3*m_samplesPerChip + i];
            preambleCorrelationDown += m_sampleBuffer[startIdx +  4*m_samplesPerChip + i];
            preambleCorrelationDown += m_sampleBuffer[startIdx +  5*m_samplesPerChip + i];
            preambleCorrelationDown += m_sampleBuffer[startIdx +  6*m_samplesPerChip + i];
            preambleCorrelationUp   += m_sampleBuffer[startIdx +  7*m_samplesPerChip + i];
            preambleCorrelationDown += m_sampleBuffer[startIdx +  8*m_samplesPerChip + i];
            preambleCorrelationUp   += m_sampleBuffer[startIdx +  9*m_samplesPerChip + i];
            preambleCorrelationDown += m_sampleBuffer[startIdx + 10*m_samplesPerChip + i];
            preambleCorrelationDown += m_sampleBuffer[startIdx + 11*m_samplesPerChip + i];
            preambleCorrelationDown += m_sampleBuffer[startIdx + 12*m_samplesPerChip + i];
            preambleCorrelationDown += m_sampleBuffer[startIdx + 13*m_samplesPerChip + i];
            preambleCorrelationDown += m_sampleBuffer[startIdx + 14*m_samplesPerChip + i];
            preambleCorrelationDown += m_sampleBuffer[startIdx + 15*m_samplesPerChip + i];
        }

Then you compare preambleCorrelationUp and preambleCorrelationDown in opposite ways posibly adding some "hysteresis" gap.

If this is enough only indexes 1, 3, 6 and 8 can be calculated for the "down" part to save a bit of processing (checking only active symbols). Otherwise this should be pretty robust.

srcejon commented 3 years ago

Thanks, I'll give it a try. There's a few other obvious optimisations that can be made too that I'll look at. Just finishing off a few updates to the GUI first.

f4exb commented 3 years ago

Well I have started doing something so if you let me commit first you will see how it looks like

f4exb commented 3 years ago

done!

f4exb commented 3 years ago

By the way 2 MS/s works perfectly. Here I have decimated the LimeSDR input of 4MS/s by two. I suppose I could use a RTL-SDR at 2 MS/s without decimation as well. I have as many decodes as with 4 MS/s: image

srcejon commented 3 years ago

Ok, good. I haven't tried to do any sort of analysis at all yet. Pretty decent range you seem to be getting there though!

However, there's some suspicious looking lats&longs in that image. I've found and fixed a couple of bugs in the decoder that was incorrectly interpreting TIS-B frames, which could be causing it. I should have a patch for that, some more performance improvements and a few extra features next week.

f4exb commented 3 years ago

There is usually a good propagation over the sea and 300+ km range is not rare. Moreover I am using a logper antenna ~700-2600 MHz range towards this direction. I also noticed what seems corrupted positions. I haven't tried to investigate and since you have a fix for this I'll let you do it. I will commit shortly the 2 MS/s addition plus moving average of current correlation data so that it does not move so fast that it is hard to follow.

f4exb commented 3 years ago

Implemented in v4.21.1 and 5.15.1