Openvario / variod

Daemon for autonomous e-vario
4 stars 8 forks source link

Floating point vs Fixed Point #31

Open hpmax opened 4 years ago

hpmax commented 4 years ago

Looking through the code, it appears there's a lot of highly repetitive code (like for synthesizing the waveform) which uses floating point numbers, for things that could probably work just as well for fixed point. Hopefully that should make it run faster and with less power consumption. Anyone have comments on this?

kedder commented 4 years ago

I don't think it is worth changing without some real measurements. Would you be able to detect any change in power consumption if you remove that repetitive code completely?

hpmax commented 4 years ago

I believe I properly optimized waveform synthesis while retaining floating point. Audio seemed unaffected, unfortunately so did the reported CPU utilization. So I'm not optimistic that switching to fixed point would yield much further improvement. Even so, I might be wrong and see no downside to my changes.

I'll branch it and people can take a look and see if the changes look better. With one minor exception, all changes were in audiovario.c I did introduce two number variables (pulse_risei, and pulse_falli) which are the pre-computed reciprocals of pulse_rise, and pulse_fall. I'm assuming floating point multiply is preferable over floating point divide. If not, that change was pointless.

hpmax commented 4 years ago

I set up a pull request on the code changes I made. Feel free to look it over.

hpmax commented 3 years ago

Okay, I've done some actual measurements on it. First off, I can tell you floating point divides are awful, they are about 5x worse than floating point multiplies. Interestingly, I can't tell floating point multiplies are worse than floating point add/subtract. But floating point if's don't seem too great. Besides that, it's surprisingly hard to benchmark this stuff, because the compiler is optimizing out a lot of my benchmarking code.

I've spent some time optimizing the code. I got rid of all the divides (from the inner loop, there are still some outside it), and reduced a lot of the redundant math. Based on my current estimates:

The current code (the one that I just optimized for the click/pop suppression) is using approximately 0.22% of the CPU. Unreleased optimized code is using approximately 0.16% of the CPU. Unreleased optimized code with conversion to a fixed-point inner loop (the one that gets executed once per 22.6us of real time), is utilizing a bit under 0.075% of the CPU.

The unreleased optimized floating point code is tested and ready to go and should be pretty safe. It's just multiplying by reciprocals instead of dividing, and pre-calculating and storing values which are commonly used. The fixed point code is not tested -- however, I don't anticipate it taking too long to get get working.

Is this worth pursuing?

kedder commented 3 years ago

I feel like it is not worth it, unless it provides some benefits (like performance or better code readability / maintainability). It doesn't look like anyone could notice the performance improvement. So unless it makes code easier to understand, I'd not touch it.

hpmax commented 3 years ago

My "unreleased optimized code" is a little harder to read, but I added documentation which I think should more than make up for it. The main issue is I added a few variables (for example pulse_risei in addition to pulse_rise, where pulse_risei = 1/pulse_rise).

The fixed point conversion is limited to the inner loop where it makes the most difference which means I have a few parallel integer variables. and add in some *32768, <<15, or >>15...

Fixed Point:

int deltaphase_int = (int) round(deltaphase*2<<15); int deltapulse_int = (int) round(deltapulse*2<<15); for (j<0;j<safeguard;++j) { pcm_buffer[j]=pulse_syn(phase_ptr,pulse_phase_ptr); phase_ptr+=deltaphase_int; if (phase_ptr>4*32768) phase_ptr-=4*32768; pulse_phase_ptr+=deltapulse_int; if (pulse_phase_ptr>2*m_pi*32768) pulse_phase_ptr-=2*m_pi*32768 }

Floating Point:

for (j<0;j<safeguard;++j) { pcm_buffer[j]=pulse_syn(phase_ptr,pulse_phase_ptr); phase_ptr+=deltaphase; if (phase_ptr>4) phase_ptr-=4; pulse_phase_ptr+=deltapulse; if (pulse_phase_ptr>2*m_pi) pulse_phase_ptr-=2*m_pi; }

I'd love to be able to say I optimized variod by a factor of 4 or 5 (which is probably the difference between where I started and the fixed point code), but realistically the differene between the optimized fixed point and floating point is only .1% of the CPU.