RenderKit / rkcommon

Intel RenderKit common C++/CMake infrastructure
Apache License 2.0

Support not setting round-to-zero mode #9

Closed by mathstuf 1 year ago

mathstuf commented 1 year ago

The forced flush-to-zero (FTZ) mode setting here:

rkcommon/tasking/detail/tasking_system_init.cpp:        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

ends up causing problems if NumPy is expected to be used in the same process. Loading NumPy afterwards on macOS (though this would probably happen on any Intel processor on any platform) trips a check which detects that the mode interferes with NumPy's expected semantics. When this check is hit, NumPy is not importable.
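To illustrate what the flag changes, here is a minimal C++ sketch, assuming an x86-64 build where float arithmetic goes through SSE. The helper name halfOfSmallestNormalIsZero is made up for this illustration and is not part of rkcommon or NumPy. It halves the smallest normal float, whose exact result is subnormal; with FTZ enabled, the hardware returns zero instead.

```cpp
#include <limits>
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE, _MM_GET_FLUSH_ZERO_MODE

// Hypothetical helper (illustration only).
// Returns true if halving the smallest normal float yields exactly zero,
// which is what happens when subnormal results are flushed to zero.
// volatile forces the multiplication to happen at run time, under the
// MXCSR settings that are active when the function is called.
inline bool halfOfSmallestNormalIsZero()
{
  volatile float tiny = std::numeric_limits<float>::min();
  volatile float half = tiny * 0.5f;
  return half == 0.0f;
}
```

Calling it after _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON) should return true, and false again after switching back to _MM_FLUSH_ZERO_OFF. The flag lives in the per-thread MXCSR register, so any library that later runs on the same thread sees the altered subnormal behavior.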

This can be reproduced with the ParaView binary from this CI job (right sidebar > Browse > build > the .dmg > download).

Start ParaView, open the Python Shell from the View menu, and run import numpy to see the error.

For a more specific way to show the problem:

$ cd ParaView.app/Contents
$ bin/pvpython
>>> import vtk
>>> vtk.vtkOSPRayPass() # start OSPRay
>>> import numpy

which shows the same problem (and not starting OSPRay lets it work).

Cc: @demarle

mathstuf commented 1 year ago

Ah, I see RKCOMMON_NO_SIMD now. I don't know how much of a perf impact that will have, but breaking import numpy seems like an awfully steep ask for ParaView at least.

It seems that OSPRay is calling initTaskingSystem(nthreads, true) from modules/cpu/ISPCDevice.cpp and modules/multiDevice/MultiDevice.cpp. However, both of these calls seem to predate the OSPRay version we used previously (we upgraded from 2.7.1 to 2.12.0), so why this hasn't happened before is a mystery to me.

mathstuf commented 1 year ago

It seems that RKCOMMON_NO_SIMD is not effective when using TBB because TBB includes the intrinsic headers before the no-op macro has the chance to do its thing.
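One way to make such an opt-out independent of include order is to guard the call site instead of redefining the intrinsic. The following is only a sketch of that idea, not rkcommon's actual source: DEMO_NO_SIMD and demoEnableFlushToZero are hypothetical names, with DEMO_NO_SIMD playing the role of RKCOMMON_NO_SIMD, and <xmmintrin.h> standing in for a header that another library has already pulled in.

```cpp
#include <xmmintrin.h>  // stands in for intrinsics included earlier by another library

// Hypothetical opt-out switch, normally passed by the build system.
#define DEMO_NO_SIMD 1

// Hypothetical wrapper around the FTZ setup.
// The opt-out is decided here, at the call site, so it does not matter
// whether the real _MM_SET_FLUSH_ZERO_MODE was already defined elsewhere.
inline void demoEnableFlushToZero()
{
#ifndef DEMO_NO_SIMD
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
#endif
}
```

With the switch defined, calling demoEnableFlushToZero() leaves the FTZ mode untouched regardless of which headers came first.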

demarle commented 1 year ago

Thanks for the heads-up, Ben. We are looking into it on our side too now...

miroslawpawlowski commented 1 year ago

The issue will be fixed in OSPRay 3, which will no longer set the FTZ and DAZ flags explicitly via rkcommon and will instead rely on ISPC for this. That way, only worker threads will modify the FTZ and DAZ flags, and only for the duration of ISPC kernels.
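The general idea of changing the flags only for the duration of a kernel can be sketched in plain C++ with a scope guard. This is an illustration under assumptions, not ISPC's or OSPRay's implementation: the class name ScopedFtzDaz and the constant kDazBit are made up here, and the code assumes an x86 target with SSE.

```cpp
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON

// Hypothetical scope guard (illustration only).
// Enables FTZ and DAZ on the calling thread on construction and restores
// the previous MXCSR value on destruction.
class ScopedFtzDaz
{
 public:
  ScopedFtzDaz() : saved_(_mm_getcsr())
  {
    _mm_setcsr(saved_ | _MM_FLUSH_ZERO_ON | kDazBit);
  }

  // Restores the entire saved register, including its status flags.
  ~ScopedFtzDaz()
  {
    _mm_setcsr(saved_);
  }

  ScopedFtzDaz(const ScopedFtzDaz &) = delete;
  ScopedFtzDaz &operator=(const ScopedFtzDaz &) = delete;

 private:
  // Denormals-are-zero is bit 6 of MXCSR.
  static constexpr unsigned int kDazBit = 0x0040u;
  unsigned int saved_;
};
```

A worker thread would create a ScopedFtzDaz at the top of the block that runs the kernel. Once the guard goes out of scope, code that runs later on that thread, such as a Python interpreter importing NumPy, sees the original floating-point settings.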