SOLVED: An ESP32(-S3) 3-cycle/tap fp32 FIR filter of arbitrary length and decimation (AUD-5626)

f4lc0n-asm commented 2 months ago

Hello,

these 3-cycle/tap ASM FIR routines for ESP32(-S3) are drop-in replacements for the old 4-cycle/tap ones for ESP32. Just add preprocessor directives if needed. Turn off Task WDT before running the validation tests! Benchmarks in the format: _FIR_Length: new_ASM/oldASM/C in cyc/tap (basically the number of cycles in a FIR loop for long FIRs):

 11: 6.05/6.47/17.10 |  21: 4.61/5.24/15.36 |   51: 3.66/4.50/14.26 |  101: 3.34/4.23/13.88
201: 3.17/4.12/13.69 | 501: 3.07/4.05/13.58 | 1001: 3.04/4.03/13.54 | 2001: 3.02/4.01/13.53

Cheers!

f4lc0n

FIR_fp32_3c_per_tap_v2.0.zip (7-Zip) Added: decimating FIR with fixed input data length and with normal FIR coefficient order Fixed: implemented C function prototype in the assembly Fixed: C src code leftovers

hbler99 commented 2 months ago

Thanks for your contribution! We will test it as soon!

f4lc0n-asm commented 2 months ago

Well, I just had a look at the actual FP32 FIR sources on ESP-DSP - there are notable changes like modified fir_f32_t in dsps_fir.h and decimation FIR ANSI-C source dsps_fird_f32_ansi.c. Both changes for the worse - making the FIR less flexible (e.g. no adjustable output delay and no arbitrary input length). Please compare the changes with the dsps_fir.h and dsps_fird_f32_ansi.c I included in my post - they are older and preferable. I also use uint instead of int in fir_f32_t - much more logical because of what info the struct members actually store… To conclude, dmitry1945 et al. somehow overthought it…

I just use ESP-DSP as an inspiration, always producing my own code, which suits me best! :)

Should you need any changes to my code to use it in ESP-ADF, just ask here!

P.S.: Looking at the new dsps_fird_f32_ansi.c - just think what happens at https://github.com/espressif/esp-dsp/blob/b3841d696950b2591cd84c94a0494c724a9f322e/modules/fir/float/dsps_fird_f32_ansi.c#L22 when e.g. input length=10 and fir->decim=20 - input++ loads invalid data after 10-th for-loop iteration… In order to proceed correctly, the input length must be exactly len fir->decim or *len 20** samples in my example. This is very non-flexible!

hbler99 commented 2 months ago

I think the issue of fixed input length can be resolved by adding a ring buffer when processing data. You're right, everyone's own code is best suited for themselves!

f4lc0n-asm commented 2 months ago

The way they programmed the new decimating FIR with fixed input length, it requires the FIR coefficients to be in reverse order because of coeff_pos++ starting with the oldest member of the delay line in the FIR computation loops - see https://github.com/espressif/esp-dsp/blob/b3841d696950b2591cd84c94a0494c724a9f322e/modules/fir/float/dsps_fird_f32_ansi.c#L30 By simple tweaking of the non-decimating FIR, I created a decimating one with fixed input length, but with normal FIR coefficient order, which is much more convenient. I added it to the original post (the fird2_fp32 subdir). Take it or leave it! :)

hbler99 commented 1 month ago

That's great! If you want to add this code to the DSP repository, I recommend submitting another issue in ESP-DSP to let them know.

f4lc0n-asm commented 1 month ago

When I posted the 1½cycle/tap fp32 FIRs for ESP32-S3 here, which require FIR coefficients in normal order and accept any FIR length (including prime numbers), I also posted them in ESP-DSP. But they, without explaining anything, chose their solution, which requires FIR coefficients in reverse order and the FIR length must be a multiple of 4. Then they changed the fp32 FIRs for ESP32 to require the FIR coefficients in reversed order, too. Are they afraid of my competetive offer? As you suggested, I posted the 3cycle/tap fp32 FIRs for ESP32, which require FIR coefficients in normal order, to ESP-DSP. Let's see how they will react…

espressif / esp-adf

SOLVED: An ESP32(-S3) 3-cycle/tap fp32 FIR filter of arbitrary length and decimation (AUD-5626) #1251