ericwa / ericw-tools

Quake/Hexen 2 Map compiling tools - branch of http://disenchant.net/utils
http://ericwa.github.io/ericw-tools
GNU General Public License v2.0

Broken embree performance on Ubuntu 22.10 #352

Closed: dsvensson closed this issue 1 year ago

dsvensson commented 1 year ago

I noticed that compiling Embree (and this project) in the Intel oneAPI container (dpcpp among others) produces a significantly faster light run than using the Ubuntu package. For the sample map, light takes about 725 seconds with the Ubuntu package and 235 seconds with the container-compiled version. The same map takes 320 seconds in light on Windows, so there is some room for improvement there as well.
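For scale, a quick sketch that converts the reported wall-clock times into speedup ratios. Only the three timings quoted above are used; `speedup` is a hypothetical helper:

```shell
#!/bin/sh
# Speedup ratios for the light timings reported above (seconds, one sample map).
# speedup BASELINE NEW -> prints BASELINE/NEW rounded to two decimals.
speedup() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "Ubuntu package vs. oneAPI build: $(speedup 725 235)x"
echo "Windows vs. oneAPI build:        $(speedup 320 235)x"
```

With these numbers the oneAPI build is about 3.1x faster than the Ubuntu package and about 1.4x faster than the Windows run.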

I noticed that the Windows version of Embree shipped with this project (3.12.1) is older than the Ubuntu package (3.13.4). To keep the comparison fair, the version I compiled was the same as the Ubuntu one. I also tried Embree 4 compiled in the oneAPI container, with the minimal API changes needed, and it performed the same as 3.13.4 built under the same conditions.

It would be nice to mention this huge slowdown from using the Linux distribution's build of Embree in the README.

Checking the symbols in the packaged version, I do see AVX etc. in there, so some runtime CPU feature detection seems to be available. Maybe it doesn't work, or maybe the difference is due to something else.
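When built with several ISA targets, Embree chooses a code path at runtime based on what the CPU supports, so one quick sanity check on Linux is which SIMD levels the CPU advertises at all. A minimal sketch, assuming a Linux `/proc/cpuinfo` with an x86 `flags` line; `best_isa` is a hypothetical helper, and it says nothing about which path the packaged library actually takes:

```shell
#!/bin/sh
# Print the widest SIMD extension listed in the CPU flags (x86 Linux only).
# Usage: best_isa [cpuinfo-file]   (defaults to /proc/cpuinfo)
best_isa() {
  cpuinfo="${1:-/proc/cpuinfo}"
  [ -r "$cpuinfo" ] || { echo "unknown"; return 0; }
  # Take the first "flags" line; individual flags are space-separated tokens.
  flags=$(grep -m1 '^flags' "$cpuinfo" || true)
  for isa in avx512f avx2 avx sse4_2; do
    case " $flags " in
      *" $isa "*) echo "$isa"; return 0 ;;
    esac
  done
  echo "none"
}

best_isa
```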

From what I could tell, the resulting maps were identical between all the different versions.

ericwa commented 1 year ago

Yeah, I ran into this before: the Ubuntu packages of Embree are way slower than Intel's builds. Not sure why; I assume Embree is optimized for the Intel compiler, or vice versa. I can add a note to the README that the Ubuntu packages of Embree in particular are confirmed to be slow and should be avoided if you want good performance.

I think my Linux x86_64 builds have always bundled an Embree .so downloaded from Embree's GitHub, so anyone using the official releases or CI builds of ericw-tools should get good performance.
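To see whether a given `light` binary resolves the bundled Embree or the distribution package, its `ldd` output can be checked. A minimal sketch, assuming glibc-style `ldd` output (`name => /path (0xaddr)`); the helper names and the example path are hypothetical, and treating `/usr/lib` and `/lib` as "system" is a simplification:

```shell
#!/bin/sh
# Extract the resolved libembree path from `ldd` output on stdin.
# Prints nothing if Embree is not linked or is reported as "not found".
embree_from_ldd() {
  awk '/libembree/ {
    for (i = 1; i < NF; i++)
      if ($i == "=>") {
        if ($(i + 1) != "not") print $(i + 1)
        exit
      }
  }'
}

# Classify a resolved path: distro package vs. some other (e.g. bundled) copy.
embree_origin() {
  case "$1" in
    "")                echo "embree not found" ;;
    /usr/lib/*|/lib/*) echo "system package: $1" ;;
    *)                 echo "non-system copy: $1" ;;
  esac
}

# Example (hypothetical path to an ericw-tools build):
#   embree_origin "$(ldd ./build/light/light | embree_from_ldd)"
```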

ericwa commented 1 year ago

> and with the container compiled version I land on 235 seconds. The same map on Windows runs light on 320 seconds, so some room for improvements there as well.

I did a quick check on Windows of Embree 3.13.5 vs. 3.12.2 (both .dlls from Embree's GitHub releases page):

Didn't do multiple runs, but it seems about the same. So I'm guessing the 235s vs. 320s difference you saw was due to factors other than the Embree version: maybe the compiler used for the ericw-tools code (MSVC vs. Intel in the oneAPI container?), or maybe the OS has some impact.
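Since single runs can be noisy, here is a minimal sketch for timing several runs and taking the median. It uses whole-second resolution from `date +%s`, which is coarse but adequate for light runs measured in minutes; the helper names and the map file name are hypothetical:

```shell
#!/bin/sh
# Median of newline-separated numbers on stdin (lower middle for an even count).
median() {
  sort -n | awk '{ a[NR] = $1 } END { if (NR > 0) print a[int((NR + 1) / 2)] }'
}

# Run a command N times, discard its output, print the median wall time (s).
# Returns 1 as soon as one run fails.
# Usage: median_runtime 3 light mymap.bsp   (mymap.bsp is a placeholder)
median_runtime() {
  n=$1; shift
  i=0
  times=""
  while [ "$i" -lt "$n" ]; do
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || return 1
    end=$(date +%s)
    times="$times$((end - start))
"
    i=$((i + 1))
  done
  printf '%s' "$times" | median
}
```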

I would just update the Embree DLL on Windows anyway, but it's slightly annoying because there's a diamond dependency with TBB (both ericw-tools and Embree depend on TBB) and I have to hunt down a compatible TBB version.
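One way to check the TBB side of that diamond before swapping DLLs is to compare which TBB DLLs each Windows binary imports. A minimal sketch, assuming GNU binutils `objdump` with PE support (its `-p` output lists imports as `DLL Name: ...` lines); `tbb_imports` and the file names in the example are hypothetical, and matching import names only catches differently named TBB DLLs, not ABI differences between builds that share a name:

```shell
#!/bin/sh
# Read `objdump -p` output for a PE file on stdin and print the TBB DLLs
# it imports, lowercased and de-duplicated.
tbb_imports() {
  sed -n 's/^[[:space:]]*DLL Name:[[:space:]]*//p' \
    | tr '[:upper:]' '[:lower:]' \
    | grep '^tbb' \
    | sort -u
}

# Example (file names are placeholders):
#   a=$(objdump -p light.exe   | tbb_imports)
#   b=$(objdump -p embree3.dll | tbb_imports)
#   [ "$a" = "$b" ] || echo "TBB mismatch: light imports '$a', Embree imports '$b'"
```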

dsvensson commented 5 months ago

Oh, I realize I didn't follow up on your comments here. Yes, that's my conclusion as well.

$ strings embree4.dll | grep intel
CLANG 18.0.0 (https://github.com/intel/llvm.git de1b485d5e08fe82479919a77ca74fedcbf6e1aa)
$ strings libembree4.so.4 | grep intel
CLANG 18.0.0 (https://github.com/intel/llvm.git de1b485d5e08fe82479919a77ca74fedcbf6e1aa)
$ strings libembree4.4.dylib | grep clang # No DPC++ on macOS due to Apple sacrificing OpenCL for Metal.
CLANG 14.0.0 (clang-1400.0.29.202)
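The `strings | grep` checks above work because compilers embed an identification string in the binary. For ELF files this normally lives in the `.comment` section, which `readelf -p .comment libembree4.so.4` can also dump. A minimal sketch that pulls out GCC- and clang-style identifiers without needing binutils, assuming a `grep` that supports `-a`, `-o` and `-E` (GNU and BSD grep do); `compiler_idents` is a hypothetical helper, and its patterns only cover the `GCC: ` and `clang `/`CLANG ` forms:

```shell
#!/bin/sh
# Print compiler identification strings embedded in a binary file.
# Matches "GCC: ..." and "clang ..."/"CLANG ..." up to the next
# non-printable byte (these strings are NUL-terminated in the binary).
compiler_idents() {
  LC_ALL=C grep -a -o -i -E '(GCC: |clang )[[:print:]]*' "$1" | sort -u
}

# Example:
#   compiler_idents libembree4.so.4
```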