Closed Quuxplusone closed 7 years ago
Attached vmlaq_f32_testcase.cc
(123 bytes, text/x-c++src): testcase
Note: this bug exists at least in clang versions 3.5 and 3.8.
Kristof, I know that this particular issue has been discussed before, but perhaps there is a reason to revisit the original decision. The extra register usage does seem particularly troubling.
Hi,
It's not a (it's not a feature it's a bug), it's a feature! :)
Seriously though, Clang has no concept of blended code generation for all cores implementing an architecture variant. Clang compiles for one specific CPU and that's it.
If you don't set -mcpu, it will be set under your feet. For -march=armv7a, the default is indeed -mcpu=cortex-a8 which does enable this performance erratum workaround.
When this was brought up before, we suggested using a more sane -mcpu argument in Android, such as -mcpu=cortex-a15 or -mcpu=cortex-a57 for -march=armv8a. Is that not an option still?
The alternative is that we implement some blended mode and introduce a fake CPU target for it (like "generic" or something).
James
I think Benoit has already worked around this issue by setting a more appropriate CPU target. I think the concern is that generic users of clang for the NDK might not be setting -mcpu at all, which will leave them with poor performance.
Hi Steve,
In fact, it looks like we implemented this for clang-3.8:
$ clang-3.8 test.c -o - -S -mfloat-abi=hard -march=armv7a -O3
...
vmul.f32 q8, q0, q0
vadd.f32 q0, q8, q0
bx lr
...
$ clang-3.8 test.c -o - -S -mfloat-abi=hard -march=armv7a -O3 -mcpu=generic
...
vmla.f32 q0, q0, q0
bx lr
...
So cortex-a8 is still the default, but -mcpu=generic will get you an
architectural target with no core-specific workarounds.
Changing the default from cortex-a8 is probably a no-go given how long it's
been that way, I think.
James
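For readers without the attachment, the two instruction sequences in the comment above correspond to a multiply-accumulate of the form x + x * x on four float lanes. What follows is only a minimal sketch of source of that shape, not the attached testcase (which is not reproduced in this thread). The helper name `square_accumulate` is hypothetical, and the scalar branch is there only so the example builds on non-NEON hosts.

```cpp
#include <array>
#include <cstddef>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from the attached testcase):
// computes x + x * x independently for each of four float lanes.
std::array<float, 4> square_accumulate(const std::array<float, 4>& x) {
    std::array<float, 4> out{};
#if defined(__ARM_NEON)
    float32x4_t v = vld1q_f32(x.data());
    // vmlaq_f32(a, b, c) computes a + b * c per lane. Depending on the
    // selected -mcpu, the compiler may emit a single vmla.f32 or a
    // separate vmul.f32 / vadd.f32 pair for this call.
    v = vmlaq_f32(v, v, v);
    vst1q_f32(out.data(), v);
#else
    // Scalar fallback so the sketch compiles on non-NEON targets.
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = x[i] + x[i] * x[i];
    }
#endif
    return out;
}
```

Compiling a function like this with the two command lines shown above and comparing the assembly is one way to check which behavior a given toolchain defaults to.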
Coming back to this, I'm going to weigh in that I also think that this might not be the expected behavior. While some specific cortex-a8 CPUs need the workaround, not all of them do, so the workaround should only be enabled when targeting those specific CPUs and not the generic core.
James: Thoughts?
By this point I'd be perfectly happy to kill -mcpu=cortex-a8 being the default, but I'm really worried about the inevitable code permutation users will notice, and that may even cause regressions.
Unless someone wants to take on a lot of bugzilla triaging pain, I think we really need to consider this hard-baked by now. What do you think?
As an aside, I wasn't aware that there were A8s that weren't affected by this bug. But the A8 was somewhat before my time.
Honestly, I think -mcpu=generic is probably the right way to go here if nothing is set for the CPU. While it seems like it's been baked in for a long time, in the grand scheme of things it might not be so bad to change it.
Fixed in r304390, by making -mcpu=generic the default.
In r306514, -mcpu=generic was made to schedule instructions in the same way as
when targeting Cortex-A8, to overcome a small performance loss observed from
making -mcpu=generic the default.
More details are available at
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113525.html
Many thanks for the good analysis and resolution.