RobRich999 / Chromium_Clang

Chromium browser compiled with the Clang/LLVM compiler.

No more AVX2 builds? #23

Closed: lospejos closed this issue 2 years ago

lospejos commented 2 years ago

All Chromium builds up to 95.0.4629.0 were built with AVX2 support. After that build I found no AVX2 builds. Have you stopped releasing builds with AVX2 support?

RobRich999 commented 2 years ago

AVX2 is discontinued... for now? There is little to no practical performance difference compared to AVX builds, yet AVX2 builds are often my most problematic builds to maintain.

Some components already use CPU dispatching to select AVX, AVX2, FMA, etc. code paths at runtime, so an AVX2 build makes no real difference for those.

https://johnysswlab.com/cpu-dispatching-make-your-code-both-portable-and-fast/
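As a rough illustration of CPU dispatching, here is a minimal C sketch, assuming GCC or Clang on x86 (`__attribute__((target("avx2")))`, `__builtin_cpu_init`, `__builtin_cpu_supports`). The function names (`add_i32`, `add_i32_avx2`, etc.) are hypothetical and not taken from Chromium: only one routine is compiled for AVX2, and it is picked at runtime when the CPU reports AVX2 support, otherwise the baseline version runs.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Portable fallback: built for whatever baseline the whole program uses
   (e.g. SSE3, the default Chromium baseline mentioned in this thread). */
static void add_i32_generic(int32_t *dst, const int32_t *a,
                            const int32_t *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

#if HAVE_X86
/* Only this function is compiled for AVX2; the rest of the file keeps the
   older baseline. It must only be called after a runtime CPU check. */
__attribute__((target("avx2")))
static void add_i32_avx2(int32_t *dst, const int32_t *a,
                         const int32_t *b, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {            /* 8 x int32 per 256-bit vector */
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_add_epi32(va, vb));
    }
    for (; i < n; i++)                      /* scalar tail */
        dst[i] = a[i] + b[i];
}
#endif

typedef void (*add_i32_fn)(int32_t *, const int32_t *, const int32_t *, size_t);

/* Runtime dispatch: pick the best implementation the current CPU supports. */
static add_i32_fn select_add_i32(void) {
#if HAVE_X86
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return add_i32_avx2;
#endif
    return add_i32_generic;
}

/* Public entry point. A production version would cache the selected pointer
   (e.g. in a thread-safe one-time initializer) instead of re-checking the
   CPU on every call. */
void add_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
    select_add_i32()(dst, a, b, n);
}
```

Callers only ever see `add_i32`; whether the AVX2 path runs depends on the machine, not on the build baseline, which is why an AVX2 baseline adds little for code structured this way.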

Technically, the predominant reason I do AVX builds at all is to avoid transition penalties when dealing with those CPU dispatching components. Setting AVX as a baseline gets us VEX encoding across the board. :)

https://john-h-k.github.io/VexTransitionPenalties.html

Paukan777 commented 2 years ago

> Technically, the predominant reason I do AVX builds at all is to avoid transition penalties when dealing with those CPU dispatching components. Setting AVX as a baseline gets us VEX encoding across the board. :)
>
> https://john-h-k.github.io/VexTransitionPenalties.html

AFAIK, AVX2 uses the same VEX encoding as AVX (it is a superset of AVX), so an AVX2 build should incur no more penalties than an AVX build. Am I missing something?

RobRich999 commented 2 years ago

The default Chromium baseline is SSE3, which predates VEX encoding. Any time a CPU dispatch runs AVX/2/512 code, there can be (and often are) transition penalties when going back and forth between the non-VEX-encoded SSEx instructions and the dispatched VEX-encoded AVX/2/512 instructions. The penalties come down to what the CPU does with the upper 128 bits of the SIMD registers: legacy SSE instructions leave those bits untouched, so the CPU has to preserve and restore them, which can burn clocks and create unnecessary stalls, wait states, etc.
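
A minimal C sketch of the upper-bits issue, assuming GCC or Clang on x86: the hypothetical `sum8_avx` works on 256-bit YMM registers and then clears their upper halves with `_mm256_zeroupper()` (the `vzeroupper` instruction) before returning to code that may use legacy SSE. This is a common mitigation for transition penalties, shown for illustration only and not taken from Chromium's sources.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Horizontal sum of 8 floats using a 256-bit AVX register (YMM). */
__attribute__((target("avx")))
static float sum8_avx(const float *p) {
    __m256 v  = _mm256_loadu_ps(p);
    __m128 lo = _mm256_castps256_ps128(v);      /* lanes 0..3 */
    __m128 hi = _mm256_extractf128_ps(v, 1);    /* lanes 4..7 */
    __m128 s  = _mm_add_ps(lo, hi);             /* 4 partial sums */
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));     /* 2 partial sums in lanes 0..1 */
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1)); /* total in lane 0 */
    float r = _mm_cvtss_f32(s);
    /* Zero the upper 128 bits of every YMM register before returning to code
       that may use legacy (non-VEX) SSE instructions. Compilers usually emit
       this vzeroupper on their own; it is spelled out here for illustration. */
    _mm256_zeroupper();
    return r;
}
#endif

/* Dispatcher with a scalar fallback for CPUs (or targets) without AVX. */
float sum8(const float *p) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return sum8_avx(p);
#endif
    float r = 0.0f;
    for (int i = 0; i < 8; i++)
        r += p[i];
    return r;
}
```

Clearing the upper halves puts the registers back into a clean state, so a following non-VEX SSE instruction has no upper bits to preserve.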

Bumping the baseline to AVX enables VEX encoding, including for SSEx instructions. So even if LLVM still generates SSEx instructions for whatever reason, those SSEx instructions will be VEX-encoded and (hopefully) not subject to as many transition penalties when run alongside AVX/2/512 code. ;)
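
To illustrate the encoding difference, here is a minimal C sketch, assuming GCC or Clang on x86-64 and a translation unit built without `-mavx`. The two hypothetical functions contain the same 128-bit SSE intrinsics, but only the one marked `target("avx")` is allowed to use VEX encodings.

```c
#if defined(__x86_64__)
#include <immintrin.h>

/* Compiled for the file's baseline (plain x86-64 or e.g. -msse3): the
   compiler emits legacy SSE encodings such as `addps`. */
static void add4_legacy(float *dst, const float *a, const float *b) {
    _mm_storeu_ps(dst, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Identical 128-bit source, but with AVX enabled for this one function the
   compiler emits VEX encodings such as `vaddps`, which zero the upper bits
   of the destination register instead of leaving them untouched. */
__attribute__((target("avx")))
static void add4_vex(float *dst, const float *a, const float *b) {
    _mm_storeu_ps(dst, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
#endif

/* Adds four floats, taking the VEX-encoded path when the CPU supports AVX. */
void add4(float *dst, const float *a, const float *b) {
#if defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        add4_vex(dst, a, b);
    else
        add4_legacy(dst, a, b);
#else
    for (int i = 0; i < 4; i++)
        dst[i] = a[i] + b[i];
#endif
}
```

Inspecting the assembly (for example with `clang -O2 -S`) should show `addps` in `add4_legacy` and `vaddps` in `add4_vex`. Compiling the whole program with `-mavx` instead gives every function the VEX form, which is the effect of raising the baseline to AVX.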


It is actually more nuanced than the above, especially for specific instructions and on some of the later microarchitectures, but the general idea is there. Long story short, and YMMV of course, it is typically "better" to simply not mix non-VEX and VEX SIMD code on x86 processors, to avoid or at least limit transition penalties. That is a large part of why I do AVX builds.