Closed acdemiralp closed 5 years ago
Hi, can you please share more details so that we can better understand the problem?
verbose=2 option placed in a .embree2 file)?
Thanks, Johannes
Hello, I am inspecting this further and will update with the requested details as soon as possible.
Hello,
Here is a minimum working example: https://devhub.vr.rwth-aachen.de/ademiralp/ospray_bvh_scaling/
Here are some values extracted from the example: Cores, Run 1, Run 2
My theory:
const auto device = ospGetCurrentDevice();
ospDeviceSet1i (device, "numThreads", int(cores));
ospDeviceCommit (device);
ospSetCurrentDevice(device);
adjusts Ospray's #cores but does not adjust underlying Embree's #cores.
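One way to test this theory is to compare the requested thread count with what Embree itself reports. With Embree's verbose output enabled, the log contains a "build threads = N" line. The sketch below pulls that value out of a captured log. parseBuildThreads and embreeHonorsThreadCount are hypothetical helper names, not part of OSPRay or Embree, and the log layout is assumed to match the verbose output posted in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>

// Hypothetical helper: returns the value of the first "build threads = N"
// line in a captured Embree verbose log, or -1 if no such line is found.
int parseBuildThreads(const std::string& log) {
  const std::string key = "build threads";
  std::istringstream lines(log);
  std::string line;
  while (std::getline(lines, line)) {
    const std::size_t k = line.find(key);
    if (k == std::string::npos) continue;
    const std::size_t eq = line.find('=', k + key.size());
    if (eq == std::string::npos) continue;
    try {
      return std::stoi(line.substr(eq + 1));  // stoi skips leading spaces
    } catch (const std::exception&) {
      continue;  // malformed value, keep scanning
    }
  }
  return -1;
}

// Hypothetical check: true if Embree reports the thread count that was
// requested through the OSPRay device.
bool embreeHonorsThreadCount(int requested, const std::string& log) {
  assert(requested > 0);
  return parseBuildThreads(log) == requested;
}
```

If the value stays the same while "numThreads" is varied, that would support the theory; if it follows the requested count, the bottleneck is more likely elsewhere.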
Just had a brief look. Are you guys sure that it's related to the Embree BVH build performance?
The reason I'm asking is that in StreamLines::finalize() (which also gets called upon commit(), and before Embree's BVH build) there's some serial code; in particular, if you happen to have a "radius" array it turns on the 'smooth' curves, and then enters quite some precomputations, all of which are completely scalar, single-threaded code.
What I'd suggest is to put some timing code (or at least a printf()) into various places of StreamLines::finalize(), and make sure you're not actually getting stuck in this function.
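A minimal sketch of such timing code, assuming a small RAII helper is acceptable in a local debug build. ScopedTimer and finalizeWithTimers are hypothetical names, not part of OSPRay, and the three labelled phases are placeholders for whatever steps finalize() actually performs.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical debug helper: prints how long the enclosing scope took.
class ScopedTimer {
  using Clock = std::chrono::steady_clock;

 public:
  explicit ScopedTimer(std::string label)
      : label_(std::move(label)), start_(Clock::now()) {
    assert(!label_.empty());  // an unlabelled timing line is useless in a log
  }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

  // Milliseconds elapsed since construction.
  double elapsedMs() const {
    return std::chrono::duration<double, std::milli>(Clock::now() - start_)
        .count();
  }

  ~ScopedTimer() {
    std::printf("[timing] %s: %.3f ms\n", label_.c_str(), elapsedMs());
    std::fflush(stdout);  // keep the line visible even if a later step hangs
  }

 private:
  std::string label_;
  Clock::time_point start_;
};

// Placement sketch: one timer for the whole function, one per phase.
void finalizeWithTimers() {
  ScopedTimer total("finalize total");
  { ScopedTimer t("base finalize"); /* base class call would go here */ }
  { ScopedTimer t("bounds loop");   /* serial bounds computation */ }
  { ScopedTimer t("ISPC setup");    /* hand-off to the ISPC side */ }
}
```

Comparing the per-phase lines across runs with different core counts should show whether the constant cost sits in finalize() or in the BVH build that follows it.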
I added "--osp:debug", "--osp:vv", "--osp:logoutput", "cout"
to see whether smooth curves were enabled, and they seem to be disabled. I will add logging within finalize() tomorrow, since I currently do not have the source build on this computer. But from reading the code, it's either the base class call:
Geometry::finalize(model);
or
// XXX curves may actually have a larger bounding box due to swinging
for (uint32_t i = 0; i < numSegments; i++) {
  const uint32 idx = index[i];
  bounds.extend(vertex[idx] - radius[idx]);
  bounds.extend(vertex[idx] + radius[idx]);
  bounds.extend(vertex[idx+1] - radius[idx+1]);
  bounds.extend(vertex[idx+1] + radius[idx+1]);
}
or
ispc::StreamLines_set(getIE(), model->getIE(), globalRadius,
                      (const ispc::vec3fa*)vertex, numVertices, index, numSegments, color);
Log for "curve: 0" (this is recorded on a much weaker computer and smaller data than the original example):
Generating streamlines.
Running Ospray.
Embree Ray Tracing Kernels 3.1.0 (b1bdaa246c4d52a517a04d022b801902c555de03)
Compiler : Intel Compiler 17.0.1
Build : Release
Platform : Windows (64bit)
CPU : Haswell (GenuineIntel)
Threads : 4
ISA : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2
Targets : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2
MXCSR : FTZ=1, DAZ=1
Config
Threads : 1
ISA : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2
Targets : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2 (supported)
SSE2 SSE4.2 AVX AVX2 AVX512SKX (compile time enabled)
Features: intersection_filter
Tasking : TBB2017.0 TBB_header_interface_9100 TBB_lib_interface_10001
general:
build threads = 1
start_threads = 0
affinity = 0
hugepages = disabled
verbosity = 2
cache_size = 134.218 MB
max_spatial_split_replications = 2
triangles:
accel = default
builder = default
traverser = default
motion blur triangles:
accel = default
builder = default
traverser = default
quads:
accel = default
builder = default
traverser = default
motion blur quads:
accel = default
builder = default
traverser = default
line segments:
accel = default
builder = default
traverser = default
motion blur line segments:
accel = default
builder = default
traverser = default
hair:
accel = default
builder = default
traverser = default
motion blur hair:
accel = default
builder = default
traverser = default
subdivision surfaces:
accel = default
grids:
accel = default
builder = default
motion blur grids:
accel = default
builder = default
object_accel:
min_leaf_size = 1
max_leaf_size = 1
object_accel_mb:
min_leaf_size = 1
max_leaf_size = 1
#ospray: trying to look up renderer type 'scivis' for the first time
#ospray: trying to look up geometry type 'streamlines' for the first time
=======================================================
Finalizing model, has 1 geometries and 0 volumes
=======================================================
Finalizing geometry 0
#osp: creating streamlines geometry, #verts=1441792, #segments=1310720, as curve: 0
segments: 0
-----------------------------------
triangles: 0
quads: 0
subdivs: 0
usergeom: 1310720
flat_linear_curve: 0
round_linear_curve: 0
oriented_linear_curve: 0
flat_bezier_curve: 0
round_bezier_curve: 0
oriented_bezier_curve: 0
flat_bspline_curve: 0
round_bspline_curve: 0
oriented_bspline_curve: 0
instance: 0
grid: 0
building BVH4<object> using avx::BVH4BuilderSAH ...
finished BVH4<object> : 5594.3ms, 0.234295 Mprim/s, 0.0162027 GB/s
primitives = 1310720, vertices = 0, depth = 11
total : sah = 43.314 (100.00%), #bytes = 90.36 MB (100.00%), #nodes = 1934762 ( 85.25% filled), #bytes/prim = 68.94
alignedNodes : sah = 41.505 ( 95.82%), #bytes = 79.88 MB ( 88.40%), #nodes = 624042 ( 77.51% filled), #bytes/prim = 60.94
leaves : sah = 1.809 ( 4.18%), #bytes = 10.49 MB ( 11.60%), #nodes = 1310720 (100.00% filled), #bytes/prim = 8.00
histogram : 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
alloc : used = 90.643 MB, #bytes/prim = 69.15
alloc : used = 90.643 MB, free = 0.004 MB, wasted = 11.739 MB, total = 102.385 MB, #bytes/prim = 78.11
total : used = 102.385 MB, free = 1.247 MB, wasted = 0.009 MB, total = 103.641 MB, #bytes/prim = 79.07
4K : used = 0.000 MB, free = 0.000 MB, wasted = 0.000 MB, total = 0.000 MB, #bytes/prim = 0.00
2M : used = 0.000 MB, free = 0.000 MB, wasted = 0.000 MB, total = 0.000 MB, #bytes/prim = 0.00
malloc: used = 102.385 MB, free = 1.247 MB, wasted = 0.009 MB, total = 103.641 MB, #bytes/prim = 79.07
shared: used = 0.000 MB, free = 0.000 MB, wasted = 0.000 MB, total = 0.000 MB, #bytes/prim = 0.00
created scene intersector
accels[0]
intersector1 = avx2::BVH4VirtualIntersector1
intersector4 = avx2::BVH4VirtualIntersector4Chunk
intersector8 = avx2::BVH4VirtualIntersector8Chunk
intersectorN = avx2::BVH4VirtualIntersectorStream
selected scene intersector
intersector1 = avx2::BVH4VirtualIntersector1
intersector4 = avx2::BVH4VirtualIntersector4Chunk
intersector8 = avx2::BVH4VirtualIntersector8Chunk
intersectorN = avx2::BVH4VirtualIntersectorStream
Trying to reach you right now :-)
Hello,
We are using Ospray v1.4.3, built with Embree 2.17.2 and TBB 4.4.
We have conducted several scaling tests, and while rendering seems to scale correctly with the number of cores, the BVH construction time after a geometry->commit() seems to stay constant.
Here are the numbers:
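To express measurements like these as scaling, the usual quantities are speedup and parallel efficiency relative to the single-core run. A minimal sketch follows; the function names are hypothetical and no measured values from the tests are assumed.

```cpp
#include <cassert>

// Speedup of an n-core run relative to the single-core run.
double speedup(double t1Ms, double tnMs) {
  assert(t1Ms > 0.0 && tnMs > 0.0);
  return t1Ms / tnMs;
}

// Parallel efficiency: 1.0 means ideal scaling, 1/cores means the time
// did not change at all when cores were added.
double efficiency(double t1Ms, double tnMs, int cores) {
  assert(cores > 0);
  return speedup(t1Ms, tnMs) / cores;
}
```

A build time that stays constant gives a speedup of 1 for every core count, so the efficiency drops as 1/cores.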