Closed q66 closed 3 years ago
btw, I also noticed that similarly, the check will not pick up clang (even though it should probably also work)
this is because all clang versions report __GNUC__
as 4 and __GNUC_MINOR__
as 2, so the test thinks it's gcc 4.2; perhaps adding __clang__
explicitly would also be good, though I haven't tested this configuration and it'd need modifying in more places than just this meson.build
, so it's out of scope for this PR
Yes, I'd leave the Clang support for another pull request, especially if it comes with some numbers as to whether Clang/LLVM support for vector types is on par with GCC's.
I believe we want the GCC vector stuff for every possible arch (ppc64, s390x, mips64, ...), so hard-coding a list is not a good solution.
64 bit PowerPC defines __powerpc64__
, I have tested this patch on s390x that defines __s390x__
, and there will be more ...
I believe we want the GCC vector stuff for every possible arch (ppc64, s390x, mips64, ...)
I believe that as well, in theory, except that GCC has had bugs in the past, and I don't have access to any of those architectures, so I can't put them under CI.
I'm tentatively :+1: on this pull request, but I'd like actual confirmation that somebody is actually testing this stuff, and can follow up with fixes in case something breaks.
If you can use Travis for CI, then it has ppc64le and s390x (and aarch64 too) available. I can (and likely will) add graphene to our private Fedora multi-arch CI or can provide to our public shared machines, if needed.
what's the actual reason gcc vectors can't work on 32-bit architectures, by the way? (though a lot of 32-bit CPUs don't have any kind of SIMD)
if things can work on those that do have SIMD, i don't see a reason to always disable it (if some are found to be broken, distributions/users can explicitly disable it via meson options, and a special case could be added later)
anyway, as far as testing goes: in our distro we have support for all ppc architectures (i.e. ppc64le, ppc64, ppc, ppcle) and i daily drive gnome on ppc64le with musl (since that's my primary workstation setup; i occasionally test the others on a variety of hardware as well), so testing is being done...
More testing. I sat down and built F34's 1.10.6 with Daniel's patch and swapped it in, and holy smokes what a difference on this Talos II. GNOME is actually usable again (I'm typing this comment on it). Animations are just about back to F33-levels of performance.
Here's my F34 build, if others want to try and confirm (ppc64le
). It should drop right into /lib64
. I threw in an -O3 -mcpu=power9
so don't run it on a POWER8.
libgraphene-1.0.so.0.1000.6.gz
another report from ppc64le user, it improved experience a lot.
1.10.6
Checking if "GCC vector intrinsics" compiles: YES
1/22 mutest / general OK 0.25s
2/22 mutest / hooks OK 0.25s
3/22 mutest / types OK 0.24s
4/22 graphene / box OK 0.23s 65 subtests passed
5/22 graphene / euler OK 0.23s 10 subtests passed
6/22 graphene / frustum OK 0.22s 21 subtests passed
7/22 graphene / matrix OK 0.22s 70 subtests passed
8/22 graphene / plane OK 0.21s 13 subtests passed
9/22 graphene / point OK 0.20s 24 subtests passed
10/22 graphene / point3d OK 0.19s 36 subtests passed
11/22 graphene / quad OK 0.19s 12 subtests passed
12/22 graphene / quaternion OK 0.18s 24 subtests passed
13/22 graphene / ray OK 0.17s 21 subtests passed
14/22 graphene / rect OK 0.17s 65 subtests passed
15/22 graphene / simd OK 0.16s 25 subtests passed
16/22 graphene / size OK 0.16s 17 subtests passed
17/22 graphene / sphere OK 0.15s 17 subtests passed
18/22 graphene / triangle OK 0.14s 56 subtests passed
19/22 graphene / vec2 OK 0.14s 37 subtests passed
20/22 graphene / vec3 OK 0.13s 51 subtests passed
21/22 graphene / vec4 OK 0.13s 63 subtests passed
22/22 graphene / introspection.py OK 0.14s 1 subtests passed
Ok: 22
Expected Fail: 0
Fail: 0
Unexpected Pass: 0
Skipped: 0
Timeout: 0
Thanks for testing!
@ebassi I would like to ask if you could do another release so that upstream projects could pick up this improvement. Many thanks in advance
It looks like Fedora has this patch, but it's not in the build. I'm not sure the problem is on this end because pulling and rebuilding it fixed it. I've attached another -O3 -mcpu=power9
build here for those who want it; it's a drop in replacement. Not for POWER8.
(that's for F35)
strange, since graphene-1.10.6-2.fc35 the patches are in the build and applied
...
Program g-ir-scanner found: YES (/usr/bin/g-ir-scanner)
Checking if "SSE intrinsics" compiles: NO
Checking if "GCC vector intrinsics" compiles: YES
Checking if "ARM NEON intrinsics" compiles: NO
Configuring graphene-config.h using configuration
...
in https://kojipkgs.fedoraproject.org//packages/graphene/1.10.6/2.fc35/data/logs/ppc64le/build.log
I agree I can see the patch in there (I built from the same package) but compare yourself: there's a definite performance difference between the two libraries. I don't think my -O3 -mcpu=power9
yields that much improvement.
I believe it will be the flags :-) I will check with the toolchain guys.
fwiw, Void compiles with -O2
and power8 default and performance is fine
the previous behavior only ever enabled gcc vectors on x86_64
this results in a massive performance improvement in gnome 40 on ppc64le