akellyirl / AutoCorr_Freq_detect

An Arduino example of Autocorrelation for the detection of signal frequency
9 stars 7 forks source link

Optimization ideas #1

Open Defragster opened 7 years ago

Defragster commented 7 years ago

I just came across this code and thought it really cool and found it to work on my 32 bit STM32L4 processor as written against the provided sample data files.

I wondered how fast it was and surprised to see on the 80 MHz processor it took 144ms to find the answer. Then I noticed the nested loop is running n-Squared across the whole sound sample! I can't guess how long it would take on an 8 bit 16 MHz - running the STM32 at 24 MHz the code takes 354 ms on the on sample.

Looking at optimizations I found two things the first biggest thing is to remove the "/256" division as the sum is accumulated. This results in the same answer as the intermediate result is changed in magnitude but returns the same calculation in the end for "freq_per=" and takes about 40% less time at 100 ms: for(k=0; k < len-i; k++) sum += (rawData[k]-128)*(rawData[k+i]-128); // REMOVE THIS :: /256;

The second thing involves the repeated '-128' on that same line, as it is done on both values each time. On my STM32 I have enough RAM that I did this before entering the existing "for(i = 0" loop:

int16_t summs[sizeof(rawData)]; for (k = 0; k < len; k++) summs[k] = rawData[k] - 128; // x32 device works best with int16

Then the 144 ms becomes 65 ms when I use this modified line as the inner loop: for (k = 0; k < len - i; k++) sum += ( summs[k] * summs[k + i] );

If RAM is too short for that second array the original can be modified like this before the "for(i = 0" loop: int8_t *SrawData = (int8_t *)rawData; for (k = 0; k < len; k++) SrawData[k] = rawData[k] - 128;

Then using this inner loop - which is not optimal on a 32 bit processor but should work well on an 8 bit processor: for (k = 0; k < len - i; k++) sum += ( SrawData[k] * SrawData[k + i] );

Defragster commented 7 years ago

This would apply the same to the 'Arduino-Guitar-Tuner ' code - I'd be interested to see notes is you put a timer on your Arduino and tested before and after like I did.

` uint32_t timmer = micros(); // nest for loops here timmer = micros() - timmer; Serial.print("timmer us = "); Serial.println(timmer);

`

Defragster commented 7 years ago

In looking at this to find out if it could run in less time I looked at Early Exit, and big the sample count had to be.

Once pd_state = 3 - it won't change - so 'break'. On "C4.h" it cuts run time by 5.9 times - on a 240 MHz Teensy 3.6 a full cycle after going to state 3 takes 53 ms, exiting when hitting state==3 has the needed period in 8.937 ms. On "AltoSaxVib_C4.h" 57.9 88 ms would drops to 9.463 ms

Using the three sample files on github using only 200 for 'len' the same period value is found. I suppose the needed min len is a factor of sample_freq and the freq to be detected, but on those three github samples 250 was enough.

sample name : full sample time , full early exit time, 250=len time, 250=len early exit time "Guitar_C5.h" : 58, 4.9, 3.7, 1.1 "C4.h" : 53, 8.9, 3.7, 2.0 "AltoSaxVib_C4.h" : 57.9, 9.4, 3.7, 2.0

For reference on the 240 MHz Teensy 3.6 the original code took: 93.592 ms on "Guitar_C5.h". 85.499 ms on "C4.h" 93.411 ms on "AltoSaxVib_C4.h"

I hope you find this feedback useful and correct. I appreciate your shared code! I wanted to speed this up for testing DMA reads of an analog microphone. This will allow me to take samples and test them at the same time the next sample group completes. This code will be much easier to test with than the FFT I want to do in the end.

Attached is the code I tested with to get the numbers above. AcorrB.ino.txt

Defragster commented 7 years ago

Here is my fork: https://github.com/Defragster/AutoCorr_Freq_detect

I posted a set of results in the README

akellyirl commented 7 years ago

These modifications for speed look really good. A big improvement for applications that require low latency. Thanks.

Defragster commented 7 years ago

Are you running on 8 bit AVR hardware? I have plenty but not used in a while. It would be interesting to see the times it takes there. I started with 32 bit Teensy boards and now also a low power 32 bit STM board and haven't looked at much else since I started with the Digistump boards.

This method seems to be very effective - I'll test it against a Piano when I get my microphone connected. It is amazing those simple lines of code - nested for loops - can take so long.

In my fork of your github reading the text I noticed the last digits didn't match on the 'C4' you show 259.91Hz and I get 259.41 using your code and mine on he Teensy.