Open GoogleCodeExporter opened 8 years ago
Erratum: Function body for the original:
static inline float sin_original(float x) {
    x = (1.27323954473516268615f - .40528473456935108577f*fabsf(x))*x;
    return x*(0.225f*fabsf(x) + 0.775f);
}
(bad copy/paste, sorry)
Original comment by julien.c...@gmail.com
on 7 Apr 2009 at 2:34
The actual fast_sin() routine was written only as a reference to test the vector
implementation, vsinf, against. No attempt at optimisation was made (it probably
should have been called something other than fast_sin). Note that fast_sin() also
performs range reduction, which your code omits. It looks to me like abs() is
resulting in fabss instructions, and that round() is calling out into library
code, which is probably the worst performance offender.
You might also want to try -ftree-vectorize in your CFLAGS.
Original comment by damien.m...@gmail.com
on 7 Apr 2009 at 3:50
I tried -ftree-vectorize, but it does nothing, since VFP is currently not
supported by the tree vectorizer (only NEON is, AFAIK): SIMD_UNITS_PER_WORD is
not defined on this target.
Original comment by julien.c...@gmail.com
on 7 Apr 2009 at 4:03
My understanding is that the SIMD_UNITS_PER_WORD message is a warning, and that
it results in a vector size of 1 being chosen. I have seen performance
improvements from turning it on, but on larger bodies of code, and I have spoken
with others who swear by it. I can't say I have examined the asm output closely
enough to tell what the difference is.
Original comment by damien.m...@gmail.com
on 7 Apr 2009 at 7:28
What difference is there between a scalar and a vector whose size is 1?
In my tests, turning it on or off produces exactly the same assembly.
Original comment by julien.c...@gmail.com
on 7 Apr 2009 at 8:57
Original issue reported on code.google.com by
julien.c...@gmail.com
on 7 Apr 2009 at 2:32