ebassi / graphene

A thin layer of graphic data types
http://ebassi.github.io/graphene
Other
373 stars 80 forks source link

meson: fix gcc vector 64-bit check #233

Closed q66 closed 3 years ago

q66 commented 3 years ago

the previous behavior only ever enabled gcc vectors on x86_64

this results in a massive performance improvement in gnome 40 on ppc64le

q66 commented 3 years ago

btw, I also noticed that similarly, the check will not pick up clang (even though it should probably also work)

this is because all clang versions report __GNUC__ as 4 and __GNUC_MINOR__ as 2, so the test thinks it's gcc 4.2; perhaps adding __clang__ explicitly would also be good, though I haven't tested this configuration and it'd need modifying in more places than just this meson.build, so it's out of scope for this PR

ebassi commented 3 years ago

Yes, I'd leave the Clang support for another pull request, especially if it comes with some numbers as to whether Clang/LLVM support for vector types is on par with GCC's.

sharkcz commented 3 years ago

I believe we want the GCC vector stuff for every possible arch (ppc64, s390x, mips64, ...), so hard-coding a list is not a good solution. 64 bit PowerPC defines __powerpc64__, I have tested this patch on s390x that defines __s390x__, and there will be more ...

ebassi commented 3 years ago

I believe we want the GCC vector stuff for every possible arch (ppc64, s390x, mips64, ...)

I believe that as well, in theory, except that GCC has had bugs in the past, and I don't have access to any of those architectures, so I can't put them under CI.

I'm tentatively :+1: on this pull request, but I'd like actual confirmation that somebody is actually testing this stuff, and can follow up with fixes in case something breaks.

sharkcz commented 3 years ago

If you can use Travis for CI, then it has ppc64le and s390x (and aarch64 too) available. I can (and likely will) add graphene to our private Fedora multi-arch CI or can provide to our public shared machines, if needed.

q66 commented 3 years ago

what's the actual reason gcc vectors can't work on 32-bit architectures, by the way? (though a lot of 32-bit CPUs don't have any kind of SIMD)

if things can work on those that do have SIMD, i don't see a reason to always disable it (if some are found to be broken, distributions/users can explicitly disable it via meson options, and a special case could be added later)

q66 commented 3 years ago

anyway, as far as testing goes: in our distro we have support for all ppc architectures (i.e. ppc64le, ppc64, ppc, ppcle) and i daily drive gnome on ppc64le with musl (since that's my primary workstation setup; i occasionally test the others on a variety of hardware as well), so testing is being done...

classilla commented 3 years ago

More testing. I sat down and built F34's 1.10.6 with Daniel's patch and swapped it in, and holy smokes what a difference on this Talos II. GNOME is actually usable again (I'm typing this comment on it). Animations are just about back to F33-levels of performance.

classilla commented 3 years ago

Here's my F34 build, if others want to try and confirm (ppc64le). It should drop right into /lib64. I threw in an -O3 -mcpu=power9 so don't run it on a POWER8. libgraphene-1.0.so.0.1000.6.gz

gyakovlev commented 3 years ago

another report from ppc64le user, it improved experience a lot.

1.10.6

Checking if "GCC vector intrinsics" compiles: YES

 1/22 mutest / general            OK              0.25s
 2/22 mutest / hooks              OK              0.25s
 3/22 mutest / types              OK              0.24s
 4/22 graphene / box              OK              0.23s   65 subtests passed
 5/22 graphene / euler            OK              0.23s   10 subtests passed
 6/22 graphene / frustum          OK              0.22s   21 subtests passed
 7/22 graphene / matrix           OK              0.22s   70 subtests passed
 8/22 graphene / plane            OK              0.21s   13 subtests passed
 9/22 graphene / point            OK              0.20s   24 subtests passed
10/22 graphene / point3d          OK              0.19s   36 subtests passed
11/22 graphene / quad             OK              0.19s   12 subtests passed
12/22 graphene / quaternion       OK              0.18s   24 subtests passed
13/22 graphene / ray              OK              0.17s   21 subtests passed
14/22 graphene / rect             OK              0.17s   65 subtests passed
15/22 graphene / simd             OK              0.16s   25 subtests passed
16/22 graphene / size             OK              0.16s   17 subtests passed
17/22 graphene / sphere           OK              0.15s   17 subtests passed
18/22 graphene / triangle         OK              0.14s   56 subtests passed
19/22 graphene / vec2             OK              0.14s   37 subtests passed
20/22 graphene / vec3             OK              0.13s   51 subtests passed
21/22 graphene / vec4             OK              0.13s   63 subtests passed
22/22 graphene / introspection.py OK              0.14s   1 subtests passed

Ok:                 22  
Expected Fail:      0   
Fail:               0   
Unexpected Pass:    0   
Skipped:            0   
Timeout:            0 
ebassi commented 3 years ago

Thanks for testing!

runlevel5 commented 2 years ago

@ebassi I would like to ask if you could do another release so that upstream projects could pick up this improvement. Many thanks in advance

classilla commented 2 years ago

It looks like Fedora has this patch, but it's not in the build. I'm not sure the problem is on this end because pulling and rebuilding it fixed it. I've attached another -O3 -mcpu=power9 build here for those who want it; it's a drop in replacement. Not for POWER8.

libgraphene-1.0.so.0.1000.6.gz

classilla commented 2 years ago

(that's for F35)

sharkcz commented 2 years ago

strange, since graphene-1.10.6-2.fc35 the patches are in the build and applied

...
Program g-ir-scanner found: YES (/usr/bin/g-ir-scanner)
Checking if "SSE intrinsics" compiles: NO 
Checking if "GCC vector intrinsics" compiles: YES 
Checking if "ARM NEON intrinsics" compiles: NO 
Configuring graphene-config.h using configuration
...

in https://kojipkgs.fedoraproject.org//packages/graphene/1.10.6/2.fc35/data/logs/ppc64le/build.log

classilla commented 2 years ago

I agree I can see the patch in there (I built from the same package) but compare yourself: there's a definite performance difference between the two libraries. I don't think my -O3 -mcpu=power9 yields that much improvement.

sharkcz commented 2 years ago

I believe it will be the flags :-) I will check with the toolchain guys.

q66 commented 2 years ago

fwiw, Void compiles with -O2 and power8 default and performance is fine