RenderKit / ospray

An Open, Scalable, Portable, Ray Tracing Based Rendering Engine for High-Fidelity Visualization
http://ospray.org
Apache License 2.0

BVH construction does not scale with the thread count (Embree 2.17.2, TBB 4.4) #228

Closed acdemiralp closed 5 years ago

acdemiralp commented 6 years ago

Hello,

We are using Ospray v1.4.3, built with Embree 2.17.2 and TBB 4.4.

We have conducted several scaling tests. While rendering scales correctly with the number of cores, the BVH construction time after a geometry->commit() stays roughly constant regardless of the core count.

Here are the numbers:

johguenther commented 6 years ago

Hi, can you please share more details so that we can better understand the problem?

Thanks, Johannes

acdemiralp commented 6 years ago

Hello, I am inspecting this further and will update with the requested details as soon as possible.

acdemiralp commented 6 years ago

Hello,

Here is a minimum working example: https://devhub.vr.rwth-aachen.de/ademiralp/ospray_bvh_scaling/

Here are some values extracted from the example:

Cores, Run 1, Run 2
24, 6546.6089220000003, 6411.2161690000003
16, 5651.1990880000003, 6080.1993320000001
8, 5792.3470960000004, 5932.2348949999996
4, 5651.9017709999998, 6349.7026329999999
2, 5510.6764199999998, 5965.442806
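
To make the flat scaling explicit, the timings above can be converted into speedup and parallel efficiency relative to the 2-core run. The following is only a minimal sketch: the names `Sample`, `speedup`, `efficiency` and `printScaling` are made up for illustration, and since the post does not state the unit of the timings, only ratios are computed.

```cpp
#include <cstdio>
#include <vector>

// One row of the table above: core count and the measured time of one run.
// The unit is not stated in the post; it cancels out in the ratios below.
struct Sample {
  int cores;
  double time;
};

// Speedup of `s` relative to a baseline measurement.
double speedup(const Sample& base, const Sample& s) {
  return base.time / s.time;
}

// Parallel efficiency: achieved speedup divided by the ideal speedup
// (the ratio of core counts). 1.0 would mean perfect scaling.
double efficiency(const Sample& base, const Sample& s) {
  const double ideal = static_cast<double>(s.cores) / base.cores;
  return speedup(base, s) / ideal;
}

// Print speedup and efficiency for every sample, using the first as baseline.
void printScaling(const std::vector<Sample>& samples) {
  if (samples.empty()) return;
  const Sample& base = samples.front();
  for (const Sample& s : samples)
    std::printf("%2d cores: speedup %.2f, efficiency %.2f\n",
                s.cores, speedup(base, s), efficiency(base, s));
}
```

Calling `printScaling({{2, 5510.68}, {4, 5651.90}, {8, 5792.35}, {16, 5651.20}, {24, 6546.61}})` with the Run 1 values gives speedups at or slightly below 1.0 for every core count, so the efficiency drops roughly in proportion to the number of cores added.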

My theory:

  const auto device = ospGetCurrentDevice();
  ospDeviceSet1i     (device, "numThreads", int(cores));
  ospDeviceCommit    (device);
  ospSetCurrentDevice(device);

adjusts Ospray's thread count but does not adjust the thread count of the underlying Embree.

ingowald commented 6 years ago

Just had a brief look. Are you guys sure that it's related to the Embree BVH build performance?

The reason I'm asking is that in StreamLines::finalize() (which also gets called upon commit(), and before Embree's BVH build) there's some serial code; in particular, if you happen to have a "radius" array it turns on the 'smooth' curves and then enters quite a few precomputations, all of which are completely scalar, single-threaded code.

What I'd suggest is to put some timing code (or at least a printf()) into various places of StreamLines::finalize(), and make sure you're not actually getting stuck in this function.
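
One way to add such timing code is a small scope-based timer built on `std::chrono`. This is only a sketch: `ScopedTimer` is a hypothetical helper, not something that exists in OSPRay or Embree, and where exactly to place it inside finalize() is up to whoever instruments the code.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Minimal RAII timer: measures wall-clock time from construction to
// destruction and prints it with a label when it goes out of scope.
// Hypothetical helper, not part of OSPRay or Embree.
class ScopedTimer {
  using Clock = std::chrono::steady_clock;

 public:
  explicit ScopedTimer(std::string label)
      : label_(std::move(label)), start_(Clock::now()) {}

  ~ScopedTimer() {
    std::printf("%s: %.3f ms\n", label_.c_str(), elapsedMs());
  }

  // Milliseconds elapsed since construction.
  double elapsedMs() const {
    return std::chrono::duration<double, std::milli>(Clock::now() - start_)
        .count();
  }

 private:
  std::string label_;
  Clock::time_point start_;
};
```

Wrapping each stage in its own block, for example `{ ScopedTimer t("bounds loop"); /* stage */ }`, prints one line per stage, which should be enough to tell whether the time is spent before or inside the BVH build.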

acdemiralp commented 6 years ago

I added "--osp:debug", "--osp:vv", "--osp:logoutput", "cout" to see whether smooth curves were enabled, and they seem to be disabled. I will add logging within finalize() tomorrow, since I currently do not have the source build on this computer. But from reading the code, it's either the base class call:

Geometry::finalize(model);

or

// XXX curves may actually have a larger bounding box due to swinging
for (uint32_t i = 0; i < numSegments; i++) {
  const uint32 idx = index[i];
  bounds.extend(vertex[idx] - radius[idx]);
  bounds.extend(vertex[idx] + radius[idx]);
  bounds.extend(vertex[idx+1] - radius[idx+1]);
  bounds.extend(vertex[idx+1] + radius[idx+1]);
}

or

ispc::StreamLines_set(getIE(),model->getIE(), globalRadius,
          (const ispc::vec3fa*)vertex, numVertices, index, numSegments, color);
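
To rule the bounds loop in or out without a full source build, it can be reproduced in isolation and timed on data of the same size. The sketch below is a standalone rewrite of the quoted loop under stated assumptions: `Vec3`, `Box3` and `segmentBounds` are hypothetical stand-ins for the OSPRay types and member code, and a per-vertex radius array is assumed as in the quoted snippet. Like the original, it is a single-threaded pass that does a constant amount of work per segment.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for the vector type used in the quoted loop.
struct Vec3 {
  float x, y, z;
};

// Hypothetical axis-aligned box; starts out empty (inverted).
struct Box3 {
  Vec3 lower{std::numeric_limits<float>::max(),
             std::numeric_limits<float>::max(),
             std::numeric_limits<float>::max()};
  Vec3 upper{std::numeric_limits<float>::lowest(),
             std::numeric_limits<float>::lowest(),
             std::numeric_limits<float>::lowest()};

  // Grow the box so that it contains point p.
  void extend(const Vec3& p) {
    lower = Vec3{std::min(lower.x, p.x), std::min(lower.y, p.y),
                 std::min(lower.z, p.z)};
    upper = Vec3{std::max(upper.x, p.x), std::max(upper.y, p.y),
                 std::max(upper.z, p.z)};
  }
};

// Grow the box by a vertex padded with its radius in every direction.
inline void extendPadded(Box3& box, const Vec3& p, float r) {
  box.extend(Vec3{p.x - r, p.y - r, p.z - r});
  box.extend(Vec3{p.x + r, p.y + r, p.z + r});
}

// Serial bounds computation over all segments: each index entry names the
// first vertex of a segment, the second vertex is the one right after it.
Box3 segmentBounds(const std::vector<Vec3>& vertex,
                   const std::vector<float>& radius,
                   const std::vector<std::uint32_t>& index) {
  Box3 bounds;
  for (const std::uint32_t idx : index) {
    extendPadded(bounds, vertex[idx], radius[idx]);
    extendPadded(bounds, vertex[idx + 1], radius[idx + 1]);
  }
  return bounds;
}
```

Timing `segmentBounds` on arrays with the same vertex and segment counts as the real data gives a rough estimate of how much of the commit time this loop could account for.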

Log for "curve: 0" (this is recorded on a much weaker computer and smaller data than the original example):

Generating streamlines.
Running Ospray.

Embree Ray Tracing Kernels 3.1.0 (b1bdaa246c4d52a517a04d022b801902c555de03)
  Compiler  : Intel Compiler 17.0.1
  Build     : Release
  Platform  : Windows (64bit)
  CPU       : Haswell (GenuineIntel)
   Threads  : 4
   ISA      : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2
   Targets  : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2
   MXCSR    : FTZ=1, DAZ=1
  Config
    Threads : 1
    ISA     : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2
    Targets : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2  (supported)
              SSE2 SSE4.2 AVX AVX2 AVX512SKX  (compile time enabled)
    Features: intersection_filter
    Tasking : TBB2017.0 TBB_header_interface_9100 TBB_lib_interface_10001

general:
  build threads = 1
  start_threads = 0
  affinity      = 0
  hugepages     = disabled
  verbosity     = 2
  cache_size    = 134.218 MB
  max_spatial_split_replications = 2
triangles:
  accel         = default
  builder       = default
  traverser     = default
motion blur triangles:
  accel         = default
  builder       = default
  traverser     = default
quads:
  accel         = default
  builder       = default
  traverser     = default
motion blur quads:
  accel         = default
  builder       = default
  traverser     = default
line segments:
  accel         = default
  builder       = default
  traverser     = default
motion blur line segments:
  accel         = default
  builder       = default
  traverser     = default
hair:
  accel         = default
  builder       = default
  traverser     = default
motion blur hair:
  accel         = default
  builder       = default
  traverser     = default
subdivision surfaces:
  accel         = default
grids:
  accel         = default
  builder       = default
motion blur grids:
  accel         = default
  builder       = default
object_accel:
  min_leaf_size = 1
  max_leaf_size = 1
object_accel_mb:
  min_leaf_size = 1
  max_leaf_size = 1
#ospray: trying to look up renderer type 'scivis' for the first time
#ospray: trying to look up geometry type 'streamlines' for the first time
=======================================================
Finalizing model, has 1 geometries and 0 volumes
=======================================================
Finalizing geometry 0
#osp: creating streamlines geometry, #verts=1441792, #segments=1310720, as curve: 0
               segments:          0
-----------------------------------
              triangles:          0
                  quads:          0
                subdivs:          0
               usergeom:    1310720
      flat_linear_curve:          0
     round_linear_curve:          0
  oriented_linear_curve:          0
      flat_bezier_curve:          0
     round_bezier_curve:          0
  oriented_bezier_curve:          0
     flat_bspline_curve:          0
    round_bspline_curve:          0
 oriented_bspline_curve:          0
               instance:          0
                   grid:          0
building BVH4<object> using avx::BVH4BuilderSAH ...
finished BVH4<object> : 5594.3ms, 0.234295 Mprim/s, 0.0162027 GB/s
  primitives = 1310720, vertices = 0, depth = 11
  total            : sah =  43.314 (100.00%), #bytes =   90.36 MB (100.00%), #nodes = 1934762 ( 85.25% filled), #bytes/prim =  68.94
  alignedNodes     : sah =  41.505 ( 95.82%), #bytes =   79.88 MB ( 88.40%), #nodes =  624042 ( 77.51% filled), #bytes/prim =  60.94
  leaves           : sah =   1.809 (  4.18%), #bytes =   10.49 MB ( 11.60%), #nodes = 1310720 (100.00% filled), #bytes/prim =   8.00
    histogram      : 100.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%
  alloc : used =  90.643 MB,                                                             #bytes/prim =  69.15
  alloc : used =  90.643 MB, free =   0.004 MB, wasted =  11.739 MB, total = 102.385 MB, #bytes/prim =  78.11
  total : used = 102.385 MB, free =   1.247 MB, wasted =   0.009 MB, total = 103.641 MB, #bytes/prim =  79.07
  4K    : used =   0.000 MB, free =   0.000 MB, wasted =   0.000 MB, total =   0.000 MB, #bytes/prim =   0.00
  2M    : used =   0.000 MB, free =   0.000 MB, wasted =   0.000 MB, total =   0.000 MB, #bytes/prim =   0.00
  malloc: used = 102.385 MB, free =   1.247 MB, wasted =   0.009 MB, total = 103.641 MB, #bytes/prim =  79.07
  shared: used =   0.000 MB, free =   0.000 MB, wasted =   0.000 MB, total =   0.000 MB, #bytes/prim =   0.00
created scene intersector
  accels[0]
    intersector1  = avx2::BVH4VirtualIntersector1
    intersector4  = avx2::BVH4VirtualIntersector4Chunk
    intersector8  = avx2::BVH4VirtualIntersector8Chunk
    intersectorN = avx2::BVH4VirtualIntersectorStream
selected scene intersector
  intersector1  = avx2::BVH4VirtualIntersector1
  intersector4  = avx2::BVH4VirtualIntersector4Chunk
  intersector8  = avx2::BVH4VirtualIntersector8Chunk
  intersectorN = avx2::BVH4VirtualIntersectorStream
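
The log above reports "Threads : 4" for the CPU but "Threads : 1" under Config and "build threads = 1" under general, so the value Embree was configured with is worth comparing against the requested numThreads on every run. A check like this can be scripted; the sketch below assumes the log has been captured into a string and that the line format matches the output shown here. `parseBuildThreads` is a made-up helper name.

```cpp
#include <exception>
#include <sstream>
#include <string>

// Returns the value of the first "build threads = N" line in a captured
// Embree verbose log, or -1 if no such line can be parsed.
int parseBuildThreads(const std::string& log) {
  const std::string key = "build threads";
  std::istringstream lines(log);
  std::string line;
  while (std::getline(lines, line)) {
    const auto keyPos = line.find(key);
    if (keyPos == std::string::npos) continue;
    const auto eqPos = line.find('=', keyPos + key.size());
    if (eqPos == std::string::npos) continue;
    try {
      // std::stoi skips the whitespace that follows the '='.
      return std::stoi(line.substr(eqPos + 1));
    } catch (const std::exception&) {
      continue;  // not a number after '=', keep scanning
    }
  }
  return -1;
}
```

For the log in this comment the function returns 1, which matches the build rate staying flat across core counts.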
ingowald commented 6 years ago

Trying to reach you right now :-)

On Jun 5, 2018, at 3:44 PM, Ali Can Demiralp notifications@github.com wrote:

I enabled logging to see whether smooth curves were enabled, and they seem disabled. I will add logging tomorrow since I currently do not have the source build on this computer. [...]