RenderKit / embree

Embree ray tracing kernels repository.
Apache License 2.0

Arm64 MSVC support #448

Closed: anthony-linaro closed 11 months ago

anthony-linaro commented 1 year ago

Summary

This PR adds support for building embree for Windows ARM64 platforms, using MSVC as the compiler.

Reasoning

Blender :)

Test/example status

This section details the current status of the tests, with native ARM64 results compared against emulated x64.

All tests were performed on a Lenovo Thinkpad X13s running the latest version of Windows 11.

Improvements

| Test name | Native FPS | Native Mray/s | Emulated FPS | Emulated Mray/s |
|---|---|---|---|---|
| embree_closest_point | 84 | n/a | 65 | n/a |
| embree_collide | 28 | n/a | 16 | n/a |
| embree_curve_geometry | 47 | 21.3 | 35 | 16.2 |
| embree_displacement_geometry | 67 | 29.6 | 52 | 22.7 |
| embree_hair_geometry | 3.1 | 3.4 | 2.3 | 2.5 |
| embree_instanced_geometry | 96 | 40 | 70 | 30 |
| embree_interpolation | 61 | 24.1 | 45 | 17.8 |
| embree_intersection_filter | 61 | 56.2 | 29 | 26.7 |
| embree_lazy_geometry | 105 | 47.6 | 78 | 34.9 |
| embree_motion_blur_geometry | 44 | 18.1 | 28 | 11.4 |
| embree_next_hit | 48 | 30.6 | 34 | 20.3 |
| embree_pathtracer | 11.7 | 25.8 | 8.2 | 18.1 |
| embree_point_geometry | 68 | 28.2 | 47 | 19.3 |
| embree_quaternion_motion_blur | 15 | 32.2 | 7.2 | 15.1 |
| embree_ray_mask | 179 | 72.9 | 144 | 58.1 |
| embree_triangle_geometry | 165 | 74.2 | 163 | 60.3 |
| embree_user_geometry | 70 | 28.4 | 45 | 19.7 |

Questionable renders

| Test name | Native FPS | Native Mray/s | Emulated FPS | Emulated Mray/s | Notes |
|---|---|---|---|---|---|
| embree_grid_geometry | 58 | 25.5 | 45 | 19.7 | [image] |
| embree_subdivision_geometry | 29 | 12.9 | 58 | 25.4 | [image] |

Regressions

| Test name | Native FPS | Native Mray/s | Emulated FPS | Emulated Mray/s |
|---|---|---|---|---|
| embree_dynamic_scene | 0.5 | 0.26 | 15 | 6.7 |
| embree_multiscene_geometry (right half only) | 0.8 | 0.34 | 3.4 | 1.47 |

Test Failures

As can be seen above, there are still some rough edges to this PR.

Some tests are also failing (notably, all but one of the watertight tests), as seen below:

Embree Ray Tracing Kernels 4.1.0 (047b2701fa574f84233f9e6aa90ba726289d1718)
  Compiler  : Visual C++ Compiler 19.35.3221.6
  Build     : Release
  Platform  : Windows (64bit)
  CPU       : ARM (ARM)
   Threads  : 8
   ISA      : XMM YMM SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2 NEON 2xNEON
   Targets  : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2 NEON 2xNEON
   MXCSR    : FTZ=0, DAZ=0
  Config
    Threads : default
    ISA     : XMM YMM SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2 NEON 2xNEON
    Targets : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2 NEON 2xNEON  (supported)
              SSE2  (compile time enabled)
    Features: raymasks intersection_filter
    Tasking : TBB2020.3 TBB_header_interface_11103 TBB_lib_interface_11103

================================================================================
  WARNING: "Flush to Zero" or "Denormals are Zero" mode not enabled
           in the MXCSR control and status register. This can have a severe
           performance impact. Please enable these modes for each application
           thread the following way:

           #include "xmmintrin.h"
           #include "pmmintrin.h"

           _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
           _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
================================================================================

 [PASSED]
                                                       fast_allocator_regression_test ... [PASSED]
                                                         motion_derivative_regression ... [PASSED]
                                                            collision_regression_test ... [PASSED]
                                                                cache_regression_test ... [PASSED]
                                                          barrier_sys_regression_test ... [PASSED]
                                                                SSE2.multiple_devices ... [PASSED]
                                                                      SSE2.types_test ... [PASSED]
                                                                      SSE2.get_bounds ...++++++ [PASSED]
                                                               SSE2.get_linear_bounds ...++++++ [PASSED]
                                                                   SSE2.get_user_data ... [PASSED]
                                                                   SSE2.buffer_stride ...++++++++ [PASSED]
                                                                     SSE2.empty_scene ...++++++++++ [PASSED]
                                                                  SSE2.empty_geometry ...++++++++++ [PASSED]
                                                                           SSE2.build ...++++++++++ [PASSED]
                                                          SSE2.overlapping_primitives ...++++++++++ [PASSED]
                                                             SSE2.new_delete_geometry ..............................................................................................................................+.+++.+ [PASSED]
                                                                SSE2.user_geometry_id ...+++++ [PASSED]
                                                         SSE2.enable_disable_geometry ...+++++ [PASSED]
                                                         SSE2.disable_detach_geometry .................+...+..+..+....+ [PASSED]
                                                                          SSE2.update ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
                                                              SSE2.build_garbage_geom ..................................................... [PASSED]
                                                           SSE2.interpolate.triangles ...++++++ [PASSED]
                                                                SSE2.interpolate.grid ...++++++ [PASSED]
                                                              SSE2.interpolate.subdiv ...++++++ [PASSED]
                                                                SSE2.interpolate.hair ...++++++ [PASSED]
                                                                    SSE2.triangle_hit ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
                                                                        SSE2.quad_hit ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
                                                                       SSE2.ray_masks ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
                                                             SSE2.intersection_filter ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
                                                                      SSE2.instancing ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
                                                                   SSE2.inactive_rays ...++++++-++++++++++++++++++++++++++++++++++-+++++-+++-+++++++++++++++++++++++-++-+++++-+++++ [FAILED]
                                                            SSE2.watertight_triangles ...-------------------------------- [FAILED]
                                                         SSE2.watertight_triangles_mb ...-------------------------------- [FAILED]
                                                                SSE2.watertight_quads ...-------------------------------- [FAILED]
                                                             SSE2.watertight_quads_mb ...-------------------------------- [FAILED]
                                                                SSE2.watertight_grids ...-------------------------------- [FAILED]
                                                             SSE2.watertight_grids_mb ...-------------------------------- [FAILED]
                                                               SSE2.watertight_subdiv ...!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [PASSED]
                                                              SSE2.ray_alignment_test ...++++++++++++++++++++++++++++-+-+-+-+-+--++-++---+++-+---+++-++++ [FAILED]
                                                                     SSE2.point_query ...+++++++++++++++++++++++ [PASSED]
                                                               SSE2.regression_static .......... [FAILED]
                                                              SSE2.regression_dynamic ................................. [PASSED]
                                                    SSE2.regression_static_build_join ...............

The entire executable then fails with exit code -1073740791. This does not happen on emulated x64, where all tests pass.

Any help or pointers in the right direction would be much appreciated 😄

afadia-quic commented 1 year ago

Hi,

I am trying to run the same tests for v3.13.5 on a Windows on ARM device, and I am unable to get them to pass. I am hitting some failed assertions, such as: Assertion failed: m_trav_active, file C:\embree_3-13-5_src\kernels\bvh/bvh_traverser_stream.h, line 93

I am compiling with LLVM Clang, using the version that ships with Visual Studio (currently clang-cl.exe version 15.0.1).

Any tips on how to debug and fix this issue are appreciated!

Thanks in advance.

anthony-linaro commented 11 months ago

Closing, as I will just use clang instead.