espressif / esp-dsp

DSP library for ESP-IDF
Apache License 2.0

dsps_fir_f32_aes3 is not implemented (DSP-79) #44

Closed: zjanosy closed this issue 11 months ago

zjanosy commented 2 years ago

In v1.2.0-1-g07aa7b1 of esp-dsp, some functions have an aes3 version, optimized for the ESP32-S3 using 128-bit vector instructions. The dot product is one of them. There is also a dsps_fir_f32_aes3.S file, but it is apparently not finished. Is there a chance of getting an S3-optimized version of it anytime soon?

dmitry1945 commented 2 years ago

Hi @zjanosy, I will look into dsps_fir_f32_aes3.S. I think you will get it next week.

zjanosy commented 2 years ago

@dmitry1945 That would be awesome, thank you! I know this has been asked many times, but is there a chance of getting some documentation, even a very preliminary version, about the vector instruction extensions? We are currently evaluating the ESP32-S3 for audio signal processing products, and we would need to optimize our DSP algorithms to see whether it would be feasible to move to this platform. Thanks, -Zoltan

zjanosy commented 2 years ago

@dmitry1945 Is there any progress on the S3-optimized FIR routine? And on the S3 instruction set documentation? Thanks, Zoltan

dmitry1945 commented 2 years ago

Hi @zjanosy, the FIR is still in progress. I hope to have time for it this week. As for the instruction set, I have asked my colleagues and will let you know when I get an answer. Thanks. Dmitry

dmitry1945 commented 2 years ago

@zjanosy About the documentation: the instruction set will be available with the TRM soon. The document is still in progress, which is why it has not been published yet. If you need the instruction set for a project now and don't want to wait, please let me know. I think we will find a solution.

Regards, Dmitry. PS: the FIR is still in progress.

zjanosy commented 2 years ago

@dmitry1945 We are currently evaluating the ESP32-S3 DSP capabilities, so I would be thankful if we could get a preliminary document.

zjanosy commented 2 years ago

@dmitry1945 Hi Dmitry,

I sent you an email on 10 May asking how to get the preliminary documentation, but got no reply. Is there any update on the release date, or whom should I ask for it?

I understand that the S3-optimized FIR filter is not a priority, so I would like to try implementing it myself, but without the documentation that is a bit more challenging ;-) The outcome of our evaluation depends on this particular piece of code, because the current implementation is not fast enough for our purposes.

Thanks, Zoltan

igrr commented 1 year ago

@zjanosy Just a note, the ESP32-S3 TRM now includes the documentation of the instruction extensions: https://www.espressif.com/sites/default/files/documentation/esp32-s3_technical_reference_manual_en.pdf

zjanosy commented 1 year ago

Thanks @dmitry1945, I had subscribed to the documentation notifications, so I have already downloaded it.

Actually I got the preliminary documentation earlier, and I could implement the optimized floating point FIR filter using a 4x4 block FIR algorithm. It runs at 1.54 cycles/tap with the only restriction that the FIR length and the block size must be multiples of 4.

In comparison, the original "ANSI-C" implementation in the library needed 7.56 cycles/tap, and even the "ESP32-optimized" implementation needed 4.04 cycles/tap. So 1.54 cycles/tap is a significant improvement, almost comparable to a dedicated DSP. I was quite happy with the result, because it proved that it is possible to use the ESP32 for DSP.
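For readers following along, the loop structure of a 4x4 block FIR can be sketched in portable C. This is an illustration only: it is not the assembly implementation described in the comment above and not the esp-dsp code. The names fir_ref and fir_block4x4 and the buffer layout (n_taps - 1 history samples followed by n new samples) are assumptions made for the example. Each inner tile computes 4 outputs against 4 coefficients, so 16 multiply-adds touch only 4 coefficients and 7 input samples.

```c
#include <assert.h>
#include <stddef.h>

/* Reference FIR, one output at a time.
 * x holds (n_taps - 1) history samples followed by n new samples. */
void fir_ref(const float *x, float *y, size_t n,
             const float *h, size_t n_taps)
{
    for (size_t i = 0; i < n; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < n_taps; k++)
            acc += h[k] * x[i + n_taps - 1 - k];
        y[i] = acc;
    }
}

/* Hypothetical 4x4 block version: 4 outputs and 4 taps per tile.
 * Same restriction as stated in the thread: n and n_taps are multiples of 4. */
void fir_block4x4(const float *x, float *y, size_t n,
                  const float *h, size_t n_taps)
{
    assert(n % 4 == 0 && n_taps % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
        for (size_t k = 0; k < n_taps; k += 4) {
            /* p[0..6] are the 7 inputs needed by this 4x4 tile */
            const float *p = &x[i + n_taps - 4 - k];
            for (size_t j = 0; j < 4; j++) {
                float c = h[k + j];
                a0 += c * p[3 - j];   /* output i     */
                a1 += c * p[4 - j];   /* output i + 1 */
                a2 += c * p[5 - j];   /* output i + 2 */
                a3 += c * p[6 - j];   /* output i + 3 */
            }
        }
        y[i] = a0; y[i + 1] = a1; y[i + 2] = a2; y[i + 3] = a3;
    }
}
```

On a vector unit, the 4 coefficients and the input window would typically be held in wide registers; the plain C above only shows which samples each tile needs. Taking the figures quoted above at face value, 1.54 cycles/tap is about 4.9 times faster than 7.56 and about 2.6 times faster than 4.04.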

Besides using the new 128-bit parallel load/store instructions, the ordering of the multiply/add instructions had a huge impact on the speed. This is because the LX7 has a pipelined architecture. To avoid pipeline stalls, an instruction should not read a register that the immediately preceding instruction wrote.
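The ordering point can be illustrated in C with a dot product, as a minimal sketch rather than the assembly discussed here. With a single accumulator every multiply-add depends on the result of the one before it; with several independent accumulators, consecutive multiply-adds write different registers. The function names are hypothetical, and whether a compiler actually schedules the code this way on a given core is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Single accumulator: each multiply-add reads the acc written
 * by the previous one, forming one long dependency chain. */
float dot_single(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Four independent accumulators: no statement in the loop body
 * reads a value written by the statement just before it.
 * n must be a multiple of 4 in this sketch. */
float dot_interleaved(const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    for (size_t i = 0; i < n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    /* Combine the partial sums once, after the loop. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Note that the interleaved version adds the products in a different order, so for general floating-point data the two results can differ in the last bits.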

I have also implemented an optimized biquad-cascade filter section. An 8-stage filter runs at 120 cycles/sample, whereas the original "ESP32-optimized" version in the library needed 154 cycles/sample. This is not a huge improvement, but it comes for free. The only drawback is that the data structure I used is not compatible with the other biquad implementations in the ESP-DSP library.
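For context, a biquad cascade feeds the output of each second-order stage into the next. The sketch below is a generic transposed direct form II cascade in plain C; the struct layout and names are hypothetical and are neither the data structure mentioned above nor the one used by ESP-DSP. With the quoted figures, 120 cycles/sample for 8 stages is 15 cycles per stage, against 19.25 for 154 cycles/sample.

```c
#include <assert.h>
#include <stddef.h>

/* One second-order section, with a0 normalized to 1.
 * Hypothetical layout chosen for this example only. */
typedef struct {
    float b0, b1, b2;  /* feed-forward coefficients */
    float a1, a2;      /* feedback coefficients */
    float s1, s2;      /* state (transposed direct form II) */
} biquad_t;

/* Push one input sample through all stages and return the output. */
float biquad_cascade_step(biquad_t *stages, size_t n_stages, float x)
{
    assert(stages != NULL || n_stages == 0);
    for (size_t i = 0; i < n_stages; i++) {
        biquad_t *q = &stages[i];
        float y = q->b0 * x + q->s1;
        q->s1 = q->b1 * x - q->a1 * y + q->s2;
        q->s2 = q->b2 * x - q->a2 * y;
        x = y;  /* output of this stage is the input of the next */
    }
    return x;
}
```

As a quick check, a stage with b1 = 1 and all other coefficients 0 acts as a one-sample delay, so two such stages in cascade delay the signal by two samples.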

Best regards, Zoltan

dmitry1945 commented 11 months ago

Hi @zjanosy, the floating-point FIR filter and the FIR filter with decimation are now implemented for the ESP32-S3.

Thank you and Best regards, Dmitry