Open f4lc0n-asm opened 2 months ago
Thanks for your contribution! We will test it as soon!
Well, I just had a look at the actual FP32 FIR sources on ESP-DSP - there are notable changes like modified fir_f32_t in dsps_fir.h and decimation FIR ANSI-C source dsps_fird_f32_ansi.c. Both changes for the worse - making the FIR less flexible (e.g. no adjustable output delay and no arbitrary input length). Please compare the changes with the dsps_fir.h and dsps_fird_f32_ansi.c I included in my post - they are older and preferable. I also use uint instead of int in fir_f32_t - much more logical because of what info the struct members actually store… To conclude, dmitry1945 et al. somehow overthought it…
I just use ESP-DSP as an inspiration, always producing my own code, which suits me best! :)
Should you need any changes to my code to use it in ESP-ADF, just ask here!
P.S.: Looking at the new dsps_fird_f32_ansi.c - just think what happens at https://github.com/espressif/esp-dsp/blob/b3841d696950b2591cd84c94a0494c724a9f322e/modules/fir/float/dsps_fird_f32_ansi.c#L22 when e.g. input length=10 and fir->decim=20 - input++ loads invalid data after 10-th for-loop iteration… In order to proceed correctly, the input length must be exactly len fir->decim or *len 20** samples in my example. This is very non-flexible!
I think the issue of fixed input length can be resolved by adding a ring buffer when processing data. You're right, everyone's own code is best suited for themselves!
The way they programmed the new decimating FIR with fixed input length, it requires the FIR coefficients to be in reverse order because of coeff_pos++
starting with the oldest member of the delay line in the FIR computation loops - see https://github.com/espressif/esp-dsp/blob/b3841d696950b2591cd84c94a0494c724a9f322e/modules/fir/float/dsps_fird_f32_ansi.c#L30
By simple tweaking of the non-decimating FIR, I created a decimating one with fixed input length, but with normal FIR coefficient order, which is much more convenient. I added it to the original post (the fird2_fp32 subdir). Take it or leave it! :)
That's great! If you want to add this code to the DSP repository, I recommend submitting another issue in ESP-DSP to let them know.
When I posted the 1½cycle/tap fp32 FIRs for ESP32-S3 here, which require FIR coefficients in normal order and accept any FIR length (including prime numbers), I also posted them in ESP-DSP. But they, without explaining anything, chose their solution, which requires FIR coefficients in reverse order and the FIR length must be a multiple of 4. Then they changed the fp32 FIRs for ESP32 to require the FIR coefficients in reversed order, too. Are they afraid of my competetive offer? As you suggested, I posted the 3cycle/tap fp32 FIRs for ESP32, which require FIR coefficients in normal order, to ESP-DSP. Let's see how they will react…
Hello,
these 3-cycle/tap ASM FIR routines for ESP32(-S3) are drop-in replacements for the old 4-cycle/tap ones for ESP32. Just add preprocessor directives if needed. Turn off Task WDT before running the validation tests! Benchmarks in the format: _FIR_Length: new_ASM/oldASM/C in cyc/tap (basically the number of cycles in a FIR loop for long FIRs):
Cheers!
f4lc0n
FIR_fp32_3c_per_tap_v2.0.zip (7-Zip) Added: decimating FIR with fixed input data length and with normal FIR coefficient order Fixed: implemented C function prototype in the assembly Fixed: C src code leftovers