dawsonjon / PicoRX

Build an SDR SW/MW/LW Receiver with a Raspberry Pi Pico
MIT License
235 stars 30 forks

Does it make sense to do IQ imbalance correction after decimation? #136

Open · mryndzionek opened this issue 2 weeks ago

mryndzionek commented 2 weeks ago

Just a general RF DSP question. I doubt we'll be able to do IQ imbalance correction at 500ksps, especially on RP2040, but does it make sense to do this after decimation?

dawsonjon commented 2 weeks ago

Yes, it still makes sense after decimation. I've been meaning to do some measurements of mirror suppression, so we can do a before and after.

mryndzionek commented 2 weeks ago

> Yes, it still makes sense after decimation. I've been meaning to do some measurements of mirror suppression, so we can do a before and after.

What exactly needs to be done? Maybe I could help.

dawsonjon commented 2 weeks ago

I think the best way to measure the image rejection is to tune in a strong signal (as strong as possible without saturating the op-amp), note the power in dBm, then temporarily turn on "swap IQ" and see how much the signal is attenuated. Whenever I have tried this, the signal has disappeared completely below the noise floor, which suggests that the image rejection is already quite good. As a sense check, I temporarily disconnected the I or Q input from the ADC, and the image signal isn't rejected at all (as expected). To get an accurate measurement I was going to try injecting a signal with a signal generator to give the largest possible signal, with no noise from the antenna. Using CW mode with the narrowest bandwidth setting will reduce the noise floor as far as possible. I was then going to repeat the measurement over a few different bands to see if there was any drop-off in performance as the frequency increases.

I was thinking of implementing something along the lines described here: https://github.com/df8oe/UHSDR/wiki/IQ---correction-and-mirror-frequencies. They achieved an image rejection of 60-65dB by implementing IQ imbalance correction.

mryndzionek commented 2 weeks ago

Thanks for the detailed explanation. So the idea is to have the correction coefficients fixed (or have a few for different bands)? I was thinking about continuous estimation and compensation in software. I was looking at this, but even computing all the needed moving averages seems to be too expensive in our case.

dawsonjon commented 2 weeks ago

I was thinking of implementing automatic correction similar to the method you linked. The Moseley paper they followed in the UHSDR project describes an efficient method of estimating the imbalance and calculating the correction coefficients. It basically uses accumulators to estimate the phase and magnitude differences between I and Q. We wouldn't need to calculate the correction coefficients very often; once every few blocks would be plenty. Applying the correction should be achievable if we apply it after decimation. It should be slightly simpler than the existing frequency shift, only a couple of multiplies and adds for each sample.
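
Roughly, the correction step itself could look something like this (a minimal sketch, not actual PicoRX code; it assumes the estimator has already produced Q15 coefficients c1 and c2, and follows the leave-I-untouched form used in the UHSDR write-up):

#include <cstdint>

// Hypothetical sketch of the correction step only; the Moseley-style
// estimator that produces c1 and c2 is not shown. c1 is the phase term
// (near 0) and c2 the gain term (near 1.0, i.e. 32768 in Q15).
static inline void iq_correct(int16_t &i, int16_t &q,
                              int32_t c1_q15, int32_t c2_q15)
{
  // leave I untouched and rebuild Q from a weighted sum of I and Q:
  // q' = c1*i + c2*q  (a couple of multiplies and adds per sample)
  q = (int16_t)(((c1_q15 * i) + (c2_q15 * q)) >> 15);
}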

mryndzionek commented 2 weeks ago

> I was thinking of implementing automatic correction similar to the method you linked. The Moseley paper they followed in the UHSDR project describes an efficient method of estimating the imbalance and calculating the correction coefficients. It basically uses accumulators to estimate the phase and magnitude differences between I and Q. We wouldn't need to calculate the correction coefficients very often; once every few blocks would be plenty. Applying the correction should be achievable if we apply it after decimation. It should be slightly simpler than the existing frequency shift, only a couple of multiplies and adds for each sample.

I managed to write something that is efficient enough and might actually work. The computed values seem to be stable and close to what is expected (c1 small, close to zero; c2 close to one). Just listening to stations, it's hard to say what the effect is, and I don't have the equipment to test. The code is here if you're interested.

dawsonjon commented 2 weeks ago

Nice! Looks like a very efficient implementation. I will try some tests and see if it improves the performance.

mryndzionek commented 2 weeks ago

> Nice! Looks like a very efficient implementation. I will try some tests and see if it improves the performance.

Here is a branch with some more adjustments and UI controls for the correction: https://github.com/mryndzionek/PicoRX/tree/iq_imb6

dawsonjon commented 2 weeks ago

Something's not quite right at the moment: enabling IQ correction is making the images worse rather than better. I expect it's probably a simple fix somewhere. For interest, I have done some baseline measurements (a bit crude, only a single data point at each frequency) of the image rejection with no correction: 42dB@1.8MHz, 42dB@3.5MHz, 34dB@7MHz, 37dB@14MHz, 38dB@28MHz. I'm excited to see how much improvement we can get with IQ correction.

mryndzionek commented 2 weeks ago

> Something's not quite right at the moment: enabling IQ correction is making the images worse rather than better. I expect it's probably a simple fix somewhere.

I think we're running out of int32_t range on some accumulator variables...

mryndzionek commented 2 weeks ago

I added one more commit to my branch. Now it should be less obviously worse after enabling correction :smile: (I still don't see a clearly positive effect).

mryndzionek commented 2 weeks ago

On this branch I added some more adjustments and a first-order DC block on the IQ signals. This seems to have some positive effect, though it's mostly due to the DC blocking rather than the imbalance corrector.
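
For reference, the DC block is just the standard first-order y[n] = x[n] - x[n-1] + a*y[n-1] with a close to 1; a minimal fixed-point sketch (names and the exact coefficient are illustrative, not necessarily what is on the branch):

#include <cstdint>

// Hypothetical sketch: first-order DC blocker, y[n] = x[n] - x[n-1] + a*y[n-1],
// applied to I and Q independently after decimation. a ~= 0.995 in Q15.
static inline int16_t dc_block(int16_t x, int32_t &x_prev, int32_t &y_prev)
{
  const int32_t a_q15 = 32604;   // ~0.995 * 32768
  const int32_t y = (int32_t)x - x_prev + ((a_q15 * y_prev) >> 15);
  x_prev = x;
  y_prev = y;
  return (int16_t)y;
}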

mryndzionek commented 2 weeks ago

Here is the final version. The impact is hard to judge. It's at least not obviously bad :wink: I'm not sure I'll be able to come up with something better.

dawsonjon commented 2 weeks ago

Cool, testing it now!

mryndzionek commented 2 weeks ago

I have made a simple (and fixed-point) simulation in Python: [screenshot of the simulation results attached]

So yeah, it works. At least for clean IQ signals.

dawsonjon commented 1 week ago

I tried a few tests with the iq_corr_mosely branch, but I still couldn't see any improvements to the image rejection. I have spent a bit of time trying to see if I could get it working. I wasn't able to put my finger on the issue, but I worked through testing each stage in turn and I came up with this.

It seems to work very well, improving the image rejection by at least 10-20dB, but it is now hard to measure the image rejection because the images are either below the noise floor, or too small to measure.

I really like your idea of putting the DC removal after the decimation; it makes it much faster, and it works on I and Q individually. It turns out that the CIC decimation doesn't care about the DC offset, so I was able to take out the earlier DC removal on the ADC inputs, which saves a significant amount of CPU. I was also able to remove the rounding from the CIC decimator, which is now redundant, although the effect was much less noticeable.

mryndzionek commented 1 week ago

Looks and works great! I would say shipit :shipit:

mryndzionek commented 1 week ago

One more thing. I've been playing with CIC compensation filters. A FIR needs at least 15 taps, which is too expensive (it gets me to 130% CPU load). However, I managed to derive an IIR filter. Here is the implementation:

// Second-order IIR compensation filter, applied to I and Q independently.
// Coefficients are fixed point, scaled by 2^15 (hence the >> 15).
void __not_in_flash_func(rx_dsp::comp_filt)(int16_t &i, int16_t &q)
{
#define COMPF_B0 (-89274L)
#define COMPF_A1 (42593L)
#define COMPF_A2 (13914L)

  // filter state: the previous two outputs for each channel
  static int32_t i_yprev = 0;
  static int32_t q_yprev = 0;
  static int32_t i_ypprev = 0;
  static int32_t q_ypprev = 0;

  // y[n] = (b0*x[n] - a1*y[n-1] - a2*y[n-2]) >> 15
  i = ((COMPF_B0 * i) - (COMPF_A1 * i_yprev) - (COMPF_A2 * i_ypprev)) >> 15;
  q = ((COMPF_B0 * q) - (COMPF_A1 * q_yprev) - (COMPF_A2 * q_ypprev)) >> 15;

  i_ypprev = i_yprev;
  q_ypprev = q_yprev;
  i_yprev = i;
  q_yprev = q;
}

It seems to make the spectrum flatter and boosts the tuned signal by ~6dB. What do you think?

dawsonjon commented 1 week ago

That's a very neat implementation. I always struggle a bit with IIR filters, especially in fixed point. Would be very interested to see how this was derived.

It's fairly easy to compensate for the curve of the CIC filter in the frequency domain; I have tried this and it works well. There is a Python simulation of the CIC filter, so I just took the reciprocal of its response to work out the gain for each frequency point in the FFT. I then simply scale each frequency bin by the appropriate gain in the FFT filter. For an efficient implementation, it's only necessary to scale the frequency bins in the pass band. The whole spectrum is also flattened using this technique, but that correction only needs to be applied once each time the display is updated.
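
The bin-scaling step itself is only a multiply per bin; roughly something like this (illustrative sketch only, with an assumed gain table generated offline from the CIC simulation):

#include <cstdint>

// Hypothetical sketch: flatten the CIC roll-off by scaling each pass-band
// bin of the FFT filter by a precomputed reciprocal gain in Q15 (the table
// would be generated offline, e.g. from the Python CIC simulation).
static void cic_compensate(int32_t re[], int32_t im[],
                           const int16_t gain_q15[],
                           unsigned first_bin, unsigned last_bin)
{
  for (unsigned bin = first_bin; bin <= last_bin; ++bin)
  {
    re[bin] = (int32_t)(((int64_t)re[bin] * gain_q15[bin]) >> 15);
    im[bin] = (int32_t)(((int64_t)im[bin] * gain_q15[bin]) >> 15);
  }
}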

I have a branch that includes most of this, I will push it and post a link when I get a spare moment.

mryndzionek commented 1 week ago

> That's a very neat implementation. I always struggle a bit with IIR filters, especially in fixed point. Would be very interested to see how this was derived.

The Python script is very messy. I used my own least-squares FIR design function and Prony's method to derive the IIR. I can only show frequency responses for now: [frequency response plot attached]

mryndzionek commented 1 week ago

The only potential problem might be the nonlinear phase response: [phase response plot attached]

If I remember correctly we're tuning around 6kHz, so we're still in the linear range.

mryndzionek commented 1 week ago

Regarding moving stuff to the frequency domain, is this how frequency shifting is supposed to be done?

dawsonjon commented 1 week ago

Yes, that's right: the frequency shift is effectively a rotation of the spectrum in the frequency domain. The difficulty comes when we need a resolution finer than 1 bin.

It's quite unusual to see frequency shifting implemented in the frequency domain. I think this is because frequency shifting is actually slower in the frequency domain than in the time domain.

When we are filtering, a convolution operation in the time domain becomes a multiplication operation in the frequency domain, dramatically reducing the number of operations (even after the FFT and IFFT are taken into account).

When we are frequency shifting, the same principle works against us: a multiplication operation in the time domain becomes a convolution operation in the frequency domain, increasing the number of operations.
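
For a whole-bin shift it really is just a re-indexing of the spectrum, something like this (illustrative sketch, not the PicoRX frequency shifter):

#include <algorithm>
#include <complex>
#include <cstdint>

// Hypothetical sketch: shift a spectrum up by an integer number of bins by
// rotating the FFT output so that bin i ends up at bin (i + shift) mod n.
// Anything finer than one bin needs interpolation, which is where the
// frequency-domain approach turns into a convolution.
static void shift_spectrum(std::complex<int32_t> bins[], int n, int shift)
{
  const int k = ((shift % n) + n) % n;          // normalise to 0..n-1
  std::rotate(bins, bins + (n - k), bins + n);  // rotate right by k bins
}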

mryndzionek commented 1 week ago

But how exactly is it slower? We are just shuffling memory, without multiplications. Or are you talking about the case where the resolution is finer than 1 bin?

It would be good to have a more specific TODO list of things that need to be moved to the frequency domain.

dawsonjon commented 1 week ago

Yes, I meant if it was finer than 1 bin. I'm not sure if there is anything existing that needs to be moved to the frequency domain, but we can exploit it when we add new features.

One thing I would like to implement is frequency based noise reduction. Essentially this works like having a squelch function for each frequency bin that blanks "noise" below a threshold. The difficult bit is estimating the noise level in each bin in order to set the thresholds, and to avoid distorting the signal.
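
As a rough sketch of the blanking part only (the noise estimation, which is the hard bit, is left out, and the names and formats here are just illustrative):

#include <cstdint>

// Hypothetical sketch: zero any frequency bin whose magnitude estimate falls
// below a scaled per-bin noise-floor estimate. Tracking the noise floor
// (e.g. a slow moving average per bin) is the hard part and is not shown.
static void spectral_squelch(int32_t re[], int32_t im[],
                             const int32_t noise_floor[],
                             unsigned n_bins, int32_t thresh_q8)
{
  for (unsigned bin = 0; bin < n_bins; ++bin)
  {
    // cheap magnitude estimate: |re| + |im| avoids a square root
    const int32_t mag = (re[bin] < 0 ? -re[bin] : re[bin]) +
                        (im[bin] < 0 ? -im[bin] : im[bin]);
    const int32_t threshold =
        (int32_t)(((int64_t)noise_floor[bin] * thresh_q8) >> 8);
    if (mag < threshold)
    {
      re[bin] = 0;
      im[bin] = 0;
    }
  }
}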

mryndzionek commented 1 week ago

Actually, regarding squelch, I would like to improve it. I made #140, and I think having this state machine and adjustable timeout makes sense, but what is missing is automatic squelch, without explicitly setting the threshold. I'm planning to explore this. It should be efficient even with floats.

dawsonjon commented 1 week ago

Yes, really like the idea of automatic squelch. I have pushed my TFT branch here. It is still a work in progress, but has support for a separate TFT waterfall, CAT control and a few other bits and pieces. I have implemented CIC correction in the frequency domain.

mryndzionek commented 1 week ago

> Yes, really like the idea of automatic squelch. I have pushed my TFT branch here. It is still a work in progress, but has support for a separate TFT waterfall, CAT control and a few other bits and pieces. I have implemented CIC correction in the frequency domain.

What is the plan for the TFT branch? Will it be merged into testing? I would really like to have the CIC correction in testing. Should I cherry-pick?

dawsonjon commented 1 week ago

I think I have merged most features from testing (not the new squelch change yet), but I took a different path early on, so there are some differences in approach. I was thinking of making a new release based on the TFT branch (fairly) soon. I probably won't add any new features to it for now, but will focus on testing, documentation and bug fixes.

My plan was to keep using the testing branch as a place to try out and share new features and ideas, and incorporate the more stable changes into releases periodically. I could merge the TFT branch back into testing now, or I'm happy to cherry-pick, whichever you prefer.

mryndzionek commented 1 week ago

Will the TFT branch also support OLED displays and u8g2?

dawsonjon commented 1 week ago

Yes, should be completely backwards compatible. I have merged all the latest u8g2 oled code. The TFT is just an optional extra.

mryndzionek commented 1 week ago

> Yes, should be completely backwards compatible. I have merged all the latest u8g2 oled code. The TFT is just an optional extra.

Okay, so I don't think it makes sense to merge back. I can wait. I'll periodically test the TFT branch. It's good that the menu is now non-blocking, so better scheduling of "actions" is possible in picorx.cpp.

mryndzionek commented 1 week ago

Just a general question. In multiple sources I found that, after filtering in the frequency domain, a window should be applied after the IFFT. Would that be beneficial in our case?

dawsonjon commented 5 days ago

That's a really good question, and I don't really know the answer. I have been doing a bit of reading up and it certainly seems like it can be beneficial in some circumstances. At the moment I'm quite happy with the way the frequency domain filtering is working, but it is something that I would like to learn a bit more about.

mryndzionek commented 5 days ago

Regarding this:

> One thing I would like to implement is frequency based noise reduction. Essentially this works like having a squelch function for each frequency bin that blanks "noise" below a threshold. The difficult bit is estimating the noise level in each bin in order to set the thresholds, and to avoid distorting the signal.

I ran some trials and it sort of works, but I'm basing this on an FFT/IFFT after demodulation, and this is too slow for the RP2040, at least without overclocking. Any ideas on how to do this before demodulation?

dawsonjon commented 5 days ago

I think we could re-use the FFT filter so long as we are working in SSB or AM mode, provided that the granularity of the frequency bins is fine enough. I think the process would be broadly the same, blanking the bins that are below the "noise" threshold. The difference is that in AM mode we would have to apply the correction symmetrically to both sidebands.

If we implement this after demodulation, there might be some tricks we can employ to speed things up a bit. The bandwidth of the audio signal after demodulation is small enough that we could decimate by another factor of 2. This should be fairly trivial: since there are no signals in the outer half of the spectrum, we can simply throw away every other sample without a risk of aliasing. We only need a real rather than a complex FFT/IFFT, so we could get another factor-of-2 speed increase using the "two for the price of one" algorithm.
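
The decimation step itself would be trivial, something along these lines (sketch only; the function name is illustrative):

#include <cstdint>

// Hypothetical sketch: halve the audio rate (15 ksps -> 7.5 ksps) after
// demodulation by keeping every other sample; safe only because the
// demodulated audio contains nothing above 3.75 kHz.
static unsigned decimate_by_2(const int16_t in[], int16_t out[], unsigned n)
{
  unsigned m = 0;
  for (unsigned k = 0; k < n; k += 2)
    out[m++] = in[k];
  return m;   // number of output samples (n / 2)
}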

mryndzionek commented 3 days ago

> provided that the granularity of the frequency bins is fine enough.

Yeah, so I think this will be a problem. I had to use an STFT, a 128-point FFT on two consecutive 64-sample frames, at the very least, to get decent results. I think that having an STFT with an even smaller stride would be beneficial.

dawsonjon commented 3 days ago

> I had to use an STFT, a 128-point FFT on two consecutive 64-sample frames, at the very least, to get decent results.

The FFT filter is already using an STFT of 128 points on two consecutive frames of 64 samples.

> I think that having an STFT with an even smaller stride would be beneficial.

Did you think about x2 decimating after the demodulator? That would give you twice the resolution for the same FFT size, and twice as long to calculate it.

mryndzionek commented 3 days ago

> The FFT filter is already using an STFT of 128 points on two consecutive frames of 64 samples.

Yes, but I think there is a difference, as the modulated signal is just a fraction of the spectrum.

dawsonjon commented 3 days ago

The output side of the filter is working at a 15kHz sample rate, and so is the demodulator. There is no practical change to the bandwidth of SSB and AM signals during demodulation, and the frequency range and resolution of a 128-point FFT remain the same unless you reduce the sample rate. (Maybe you are doing this already?)

It's true that the signal only occupies a fraction of the spectrum (a speech signal only uses a couple of kHz of bandwidth), but we would need to reduce the sample rate to exploit that.

I think that having another FFT/IFFT after demodulation would be the superior option (it would be more flexible, give us the option to work at lower sample rates, and work with FM too), provided we can make it fast enough to be viable.

mryndzionek commented 3 days ago

> Did you think about x2 decimating after the demodulator? That would give you twice the resolution for the same FFT size, and twice as long to calculate it.

I'm not sure I understand. Decimating by 2 after the demodulator means a 15ksps -> 7.5ksps sample rate drop. Do we want that?

Today I tried to implement spectral subtraction using CMSIS-DSP, as it provides efficient RFFT functions etc. It is better from a computational point of view, but still not enough: I'm at almost 100% load with just the FFT/IFFT round trip. Nevertheless, I continued, just to see what could be achieved. I also had to use float for the actual energy estimations, as fixed-point divisions were blowing the ranges. The good news is that even with floats the actual spectral subtraction is not that demanding; the FFT/IFFT are still the bottleneck. So in the end, with FFT/IFFT from CMSIS-DSP, floating-point energy estimations and overclocking to 225-240MHz, here is the result: denoising.wav.zip

Denoising is active between 12-28s and from 40s until the end of the file. Quite nice, I would say. Now just to make it more efficient...

dawsonjon commented 3 days ago

Impressive results. What size FFT did you use in the end?

> I'm not sure I understand. Decimating by 2 after the demodulator means a 15ksps -> 7.5ksps sample rate drop. Do we want that?

Yes, that was my thinking. The audio signal never uses more than 3.75kHz, so half of the FFT bins will be zero anyway. Then you can get the same results with half the FFT size, and the code will run at least twice as fast.

mryndzionek commented 3 days ago

> Impressive results. What size FFT did you use in the end?

Same as before, 128 points on two consecutive frames, this time using arm_rfft_q15, which internally uses a 64-point FFT, I think.

mryndzionek commented 7 hours ago

No matter what I do with the noise canceller, I'm over 100% CPU load. I started looking at other parts of the system and found one place for optimization: using the FFT from CMSIS-DSP seems to reduce the load by ~20%. I'll try to create a PR in the next day or two. It might still not be enough, but it's probably halfway there. I'll also need to move from floats to fixed point. I need log10f and it seems to be hard to implement in fixed point...

dawsonjon commented 4 hours ago

Cool, sounds interesting; you can't be far off now. Log10 shouldn't be too hard to approximate in fixed point (depending on accuracy). You can calculate log2, then change the base by multiplying by a constant. Log2 can be calculated by counting leading zeros for the integer part, then using a lookup table or polynomial approximation for the fractional part.
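
For example, something along these lines (a rough Q16.16 sketch with a coarse 16-entry table, good to a few tenths of a dB for 10*log10; not from the PicoRX code):

#include <cstdint>

// Hypothetical sketch of fixed-point log10 via log2: log10(x) = log2(x) * log10(2).
// Integer part of log2 from counting leading zeros (GCC/Clang builtin),
// fractional part from a 16-entry table of log2(1 + i/16); result in Q16.16.
static int32_t log10_q16(uint32_t x)   // x > 0
{
  static const uint16_t lut[16] = {
      0,     5732,  11137, 16250, 21098, 25711, 30110, 34312,
      38336, 42196, 45904, 49472, 52911, 56229, 59434, 62534};

  const int int_part = 31 - __builtin_clz(x);                 // floor(log2(x))
  // take the 4 bits just below the leading 1 as the table index
  const uint32_t idx = (int_part >= 4) ? ((x >> (int_part - 4)) & 0xF)
                                       : ((x << (4 - int_part)) & 0xF);
  const int32_t log2_q16 = ((int32_t)int_part << 16) + lut[idx];
  return (int32_t)(((int64_t)log2_q16 * 19728) >> 16);        // * log10(2) in Q16
}

SNR in dB would then just be 10 times this, keeping the Q16.16 scaling in mind.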

mryndzionek commented 2 hours ago

Yeah, so actually the log10 is for converting SNR to dB, and the SNR range is fairly small, so a LUT will probably be enough! Meanwhile I've created another PR - #142 - just an optimization.