RenderKit / embree

Embree ray tracing kernels repository.
Apache License 2.0

Stream traversal inconsistent results #149

Closed mathijs727 closed 7 years ago

mathijs727 commented 7 years ago

I am trying to integrate Embree into a production renderer. One of the desired features is lazy loading, which I can't get to work correctly in Embree's stream traversal mode.

What I noticed during my debugging process is that the stream traverser seems to return slightly different results every time (when running the same executable multiple times in single threaded mode). I was able to confirm this problem by copying the rays, executing rtcIntersect() in a for loop and comparing the results (in our renderer).

The problem seems to be easily reproducible by modifying the stream viewer tutorial. I replaced the following code (around line 160 of viewer_stream_device.cpp):

/* trace stream of rays */
#if USE_INTERFACE == 0
  rtcIntersect1M(g_scene,&context,rays,N,sizeof(RTCRay));
#elif USE_INTERFACE == 1
  for (size_t i=0; i<N; i++)
    rtcIntersect(g_scene,rays[i]);
#else
  for (size_t i=0; i<N; i++)
    rtcIntersect1M(g_scene,&context,&rays[i],1,sizeof(RTCRay));
#endif

With:

rtcIntersect1M(g_scene,&context,rays,N,sizeof(RTCRay));

RTCRay raysCopy[TILE_SIZE_X*TILE_SIZE_Y];
std::memcpy(raysCopy, rays, N * sizeof(RTCRay));
for (size_t i=0; i<N; i++)
  rtcIntersect(g_scene,raysCopy[i]);

if (std::memcmp(rays, raysCopy, N * sizeof(RTCRay)) != 0)
  std::cout << "rtcIntersect1M did not return the same values as rtcIntersect" << std::endl;

I also attached the modified file for your convenience: viewer_stream.tar.gz

When you start looking and moving around, the viewer will start to output "rtcIntersect1M did not return the same values as rtcIntersect" indicating that the stream traverser returned something different than the single ray traverser. Replacing the for loop by another call to rtcIntersect1M actually has the same problem, which shows that multiple calls to rtcIntersect1M on a copy of the data will return different results.

Mathijs

cbenthin commented 7 years ago

Hi Mathijs,

you won't get bit-identical results between rtcIntersect1M and rtcIntersect when you have different primitives exactly at the same position or at the same intersection distance. As rtcIntersect1M will probably use a slightly different traversal order through the BVH it could happen that both calls report a different intersection at the same distance.
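To illustrate the point about equal intersection distances, here is a minimal toy sketch (not Embree code; `ToyPrim`, `ToyHit` and `closestHit` are hypothetical names). It assumes a closest-hit search that accepts a candidate only if it is strictly closer than the current best, so the first primitive visited at the minimum distance wins.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a primitive: the distance at which one fixed ray hits it.
struct ToyPrim {
  std::uint32_t geomID;
  float t;
};

// Result of the toy search; the defaults mean "no hit".
struct ToyHit {
  std::uint32_t geomID = std::numeric_limits<std::uint32_t>::max();
  float tfar = std::numeric_limits<float>::infinity();
};

// Visit the primitives in the given order and keep a candidate only if it is
// strictly closer than the best hit so far (assumed tie-breaking rule).
inline ToyHit closestHit(const std::vector<ToyPrim>& visitOrder) {
  ToyHit hit;
  for (const ToyPrim& p : visitOrder) {
    if (p.t < hit.tfar) {
      hit.tfar = p.t;
      hit.geomID = p.geomID;
    }
  }
  return hit;
}
```

With two primitives at the same distance, visiting them as (1, 2) reports geomID 1 and visiting them as (2, 1) reports geomID 2, while the reported distance is identical in both cases.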

When looking at your code, you copy the rays to "raysCopy" after you have already traversed the "rays". That means your ray.tfar value is already set and the next rtcIntersect call will only report distances less than ray.tfar, so you will end up with different results. Could you do the comparison between "rays" and "raysCopy" when they are both initialized independently?

Thanks.
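The effect of copying after traversal can be shown with a small toy model (again not Embree code; `ToyRay` and `toyIntersect` are hypothetical, and the struct is not the RTCRay layout). It assumes only what the reply above states: a hit is reported only if it is closer than the ray's current tfar.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical, simplified ray record; the defaults describe an untraced ray.
struct ToyRay {
  float tfar = std::numeric_limits<float>::infinity();
  std::uint32_t geomID = std::numeric_limits<std::uint32_t>::max(); // "no hit"
};

// Toy intersection against a single primitive at distance t:
// the hit is accepted only if it is closer than the ray's current tfar.
inline void toyIntersect(ToyRay& ray, float t, std::uint32_t geomID) {
  if (t < ray.tfar) {
    ray.tfar = t;
    ray.geomID = geomID;
  }
}
```

A copy taken before the first trace still has tfar = inf and accepts any hit. A copy taken afterwards inherits the shortened tfar and the old geomID, so a second trace can only change it if it finds something closer. Taking the snapshot before the first call keeps the two runs independent.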

mathijs727 commented 7 years ago

Hi,

Thanks for spotting my mistake. I fixed the initialization and now compare the geometry ID of each ray instead of a memcmp.

RTCRay raysCopy[TILE_SIZE_X*TILE_SIZE_Y];
std::memcpy(raysCopy, rays, N * sizeof(RTCRay));

RTCIntersectContext context;
context.flags = g_iflags_coherent;
rtcIntersect1M(g_scene,&context,rays,N,sizeof(RTCRay));

//rtcIntersect1M(g_scene,&context,raysCopy,N,sizeof(RTCRay));
for (size_t i=0; i<N; i++)
  rtcIntersect(g_scene,raysCopy[i]);

for (int i = 0; i < N; i++)
{
  if (rays[i].geomID != raysCopy[i].geomID)
  {
    std::cout << "Ray hit " << rays[i].geomID << " according to intersect stream" << std::endl;
    std::cout << "Ray hit " << raysCopy[i].geomID << " according to intersect single" << std::endl;
    std::cout << std::endl;
  }
}

This still gives different results between single and stream traversal. Calling the stream traversal on both arrays gives the exact same result (as expected).

I also tried something else: reversing the order of the rays and then using stream traversal on both arrays. Although not frequently (it may take about 30 seconds of flying around), the stream traversal on the reversed array will give a different answer.

RTCRay raysCopy[TILE_SIZE_X*TILE_SIZE_Y];
std::reverse_copy(rays, rays+N, std::begin(raysCopy));

RTCIntersectContext context;
context.flags = g_iflags_coherent;
rtcIntersect1M(g_scene,&context,rays,N,sizeof(RTCRay));
rtcIntersect1M(g_scene,&context,raysCopy,N,sizeof(RTCRay));

for (int i = 0; i < N; i++)
{
  if (rays[i].geomID != raysCopy[N - i - 1].geomID)
  {
    std::cout << "Ray hit " << rays[i].geomID << " according to intersect stream" << std::endl;
    std::cout << "Ray hit " << raysCopy[N - i -1].geomID << " according to inverted intersect stream" << std::endl;
    std::cout << std::endl;
  }
}

Both "problems" (single vs stream and stream vs stream) make debugging in combination with multi-threading hard because it's impossible to determine whether a difference in the final image is caused by the order in which Embree processes rays, or by the user application.

cbenthin commented 7 years ago

Can you quickly try whether in case of different geomIDs the raysCopy.tfar is actually smaller than rays.tfar? If yes, then I would say that's a bug. Which Embree version/OS/compiler do you use? Would it be possible to get some geometry as reproducer?

Final question: what happens if you set context.flags to g_iflags_incoherent (this invokes a different code path)?

Thanks.

mathijs727 commented 7 years ago

Hi,

These are some of the differences that I got:

Ray hit geomID:        3
Ray hit tfar:          5.952744e+02
Copied ray hit geomID: 4294967295
Copied ray hit tfar:   inf

Ray hit geomID:        3
Ray hit tfar:          1.327962e+03
Copied ray hit geomID: 4294967295
Copied ray hit tfar:   inf

Ray hit geomID:        3
Ray hit tfar:          1.112134e+03
Copied ray hit geomID: 4294967295
Copied ray hit tfar:   inf

Ray hit geomID:        3
Ray hit tfar:          9.501797e+02
Copied ray hit geomID: 4294967295
Copied ray hit tfar:   inf

Ray hit geomID:        2
Ray hit tfar:          5.671211e+02
Copied ray hit geomID: 3
Copied ray hit tfar:   5.671212e+02

Ray hit geomID:        2
Ray hit tfar:          5.581539e+02
Copied ray hit geomID: 1
Copied ray hit tfar:   5.581539e+02

Ray hit geomID:        2
Ray hit tfar:          5.780020e+02
Copied ray hit geomID: 1
Copied ray hit tfar:   5.780020e+02

Ray hit geomID:        2
Ray hit tfar:          6.021117e+02
Copied ray hit geomID: 3
Copied ray hit tfar:   6.021116e+02

This is with the Embree stream viewer from the Embree tutorials (comparing streams vs reversed streams) using the Cornell scene that is supplied with the Embree tutorials: viewer_stream.tar.gz

This problem does not seem to occur when I switch to incoherent mode as you suggested.

I'm using Embree 2.16.5 compiled with GCC 4.8.5 on RedHat 4.8.5.

Linux version 3.10.0-514.26.2.el7.x86_64 (mockbuild@x86-040.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Jun 30 05:26:04 UTC 2017

This is running on a dual socket Xeon E5-2687W v4 with Hyperthreading disabled.
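The mismatches listed above fall into two groups: pairs where one side reports no hit (geomID 4294967295 with tfar = inf) and pairs where both sides hit at nearly the same distance but report different IDs. A small helper along these lines could separate the two when logging. It is only a sketch: `classify`, `Mismatch` and the `relTol` threshold are assumptions made for illustration, not part of Embree.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// 4294967295 in the log; Embree defines RTC_INVALID_GEOMETRY_ID as ((unsigned)-1).
constexpr std::uint32_t kInvalidGeomID = 0xFFFFFFFFu;

enum class Mismatch { None, TieAtSameDistance, HitVersusMiss, DifferentDistance };

// Classify the two results obtained for the same ray from two traversal runs.
// relTol is an assumed relative tolerance for "same distance"; tune it per scene.
inline Mismatch classify(std::uint32_t idA, float tfarA,
                         std::uint32_t idB, float tfarB,
                         float relTol = 1e-5f) {
  if (idA == idB) return Mismatch::None;
  if (idA == kInvalidGeomID || idB == kInvalidGeomID)
    return Mismatch::HitVersusMiss;
  const float scale = std::max(std::fabs(tfarA), std::fabs(tfarB));
  if (std::fabs(tfarA - tfarB) <= relTol * scale)
    return Mismatch::TieAtSameDistance;
  return Mismatch::DifferentDistance;
}
```

For example, `classify(2, 5.671211e+02f, 3, 5.671212e+02f)` yields `TieAtSameDistance`, which matches the equal-distance explanation given earlier in the thread. A pair such as geomID 3 at 5.952744e+02 versus 4294967295 at inf yields `HitVersusMiss`, which that explanation does not cover.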

cbenthin commented 7 years ago

Thanks a lot. I can reproduce the issue (after moving the camera like crazy) which totally baffles me. Until I've figured out what's going on, please set context.flags = RTC_INTERSECT_INCOHERENT. This avoids the ray order dependence at least on my end.

cbenthin commented 7 years ago

OK, I think I found the issue. The stream code path (with the coherent flag enabled) had a traversal order dependence based on the ordering of the rays within the stream. I've checked the fix into the release branch. Could you give it a try?

Thanks.

mathijs727 commented 7 years ago

Thanks for the help.

I downloaded the release branch from GitHub and used the same modified tutorial code to test. Unfortunately, the problem still seems to occur. I tested with both GCC 6.3.1 and ICC 18.0.0.

For me the bug is most easily reproducible by flying around the outside of the Cornell box and looking at the edges and corners.

EDIT: it seems like adding the RTC_SCENE_ROBUST flag fixes the problem in both the new version of Embree and the original release of 2.16.5.

By the way: what is the difference (in terms of algorithms) between RTC_INTERSECT_COHERENT and RTC_INTERSECT_INCOHERENT? And what is the interaction between the scene flag and the intersection context flag for coherency?

cbenthin commented 7 years ago

I cannot reproduce the issue on my end, regardless of how much I zoom or turn the Cornell box. Could you add the following to line 748 in tutorial.cpp: PRINT(camera.str()); That will print out the viewer/camera settings per frame. If you get a difference, please provide me the camera settings for this frame (including your entire command line). That makes it easier to reproduce it on my end.

RTC_SCENE_ROBUST will cause Embree to use a different and slower code path for both traversal and primitive intersection.

RTC_INTERSECT_COHERENT assumes that the rays in the stream are as coherent as primary rays and will therefore use a special ray frustum-based traversal kernel which is the fastest for highly coherent rays.

RTC_INTERSECT_INCOHERENT flags the ray stream to be treated as incoherent rays, which internally falls back to standard single ray-based stream traversal kernel.
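The difference between the two modes can be pictured with a generic sketch of group-level culling versus per-ray testing. This is not Embree's kernel: all names are hypothetical, and it assumes every ray in the group has positive direction components. The group test uses interval bounds over all rays and is conservative: it may report a possible hit when no ray hits, but never rejects a box that some ray hits.

```cpp
#include <algorithm>
#include <array>
#include <limits>
#include <vector>

struct Ray {                      // hypothetical minimal ray
  std::array<float, 3> org;
  std::array<float, 3> rdir;      // 1 / direction, assumed > 0 on every axis
  float tfar;
};

struct Box { std::array<float, 3> lo, hi; };

// Exact slab test for one ray (the kind of test a single-ray path performs).
inline bool hitsBox(const Ray& r, const Box& b) {
  float tn = 0.0f, tf = r.tfar;
  for (int a = 0; a < 3; ++a) {
    tn = std::max(tn, (b.lo[a] - r.org[a]) * r.rdir[a]);
    tf = std::min(tf, (b.hi[a] - r.org[a]) * r.rdir[a]);
  }
  return tn <= tf;
}

// Interval bounds over a whole group of rays.
struct Frustum {
  std::array<float, 3> minOrg, maxOrg, minRdir, maxRdir;
  float maxTfar;
};

inline Frustum makeFrustum(const std::vector<Ray>& rays) {
  const float inf = std::numeric_limits<float>::infinity();
  Frustum f{{{inf, inf, inf}}, {{-inf, -inf, -inf}},
            {{inf, inf, inf}}, {{-inf, -inf, -inf}}, 0.0f};
  for (const Ray& r : rays) {
    for (int a = 0; a < 3; ++a) {
      f.minOrg[a] = std::min(f.minOrg[a], r.org[a]);
      f.maxOrg[a] = std::max(f.maxOrg[a], r.org[a]);
      f.minRdir[a] = std::min(f.minRdir[a], r.rdir[a]);
      f.maxRdir[a] = std::max(f.maxRdir[a], r.rdir[a]);
    }
    f.maxTfar = std::max(f.maxTfar, r.tfar);
  }
  return f;
}

// Conservative group test: lower-bound each entry distance and upper-bound
// each exit distance over all rays before comparing them.
inline bool frustumMayHit(const Frustum& f, const Box& b) {
  float tn = 0.0f, tf = f.maxTfar;
  for (int a = 0; a < 3; ++a) {
    const float dn = b.lo[a] - f.maxOrg[a];  // smallest possible (lo - org)
    const float df = b.hi[a] - f.minOrg[a];  // largest possible (hi - org)
    tn = std::max(tn, dn >= 0.0f ? dn * f.minRdir[a] : dn * f.maxRdir[a]);
    tf = std::min(tf, df >= 0.0f ? df * f.maxRdir[a] : df * f.minRdir[a]);
  }
  return tn <= tf;
}
```

A traversal built this way would call `frustumMayHit` once per node for the whole group and fall back to `hitsBox` per ray only where the group test passes. This saves work when rays are coherent, and it means the group-level decision depends on the set of rays taken together, not on each ray alone.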

The question is why RTC_INTERSECT_COHERENT still shows the ray-order dependent behaviour on your end...

mathijs727 commented 7 years ago

Hi,

Thanks for the response. Here is some of the output where there is still a difference:

camera.str() = -vp 697.4320679 579.8771362 571.1641846 -vi -81.09967041 507.9751587 401.7946167 -vu 0 1 0 -fov 37
camera.str() = -vp 697.2194824 579.8771362 572.1413574 -vi -81.31224823 507.9751587 402.771759 -vu 0 1 0 -fov 37
Ray hit geomID:        3
Ray hit tfar:          8.117651e+02
Copied ray hit geomID: 4294967295
Copied ray hit tfar:   inf

camera.str() = -vp 697.006897 579.8771362 573.1185303 -vi -81.52482605 507.9751587 403.7489014 -vu 0 1 0 -fov 37
camera.str() = -vp 696.7943115 579.8771362 574.0957031 -vi -81.73740387 507.9751587 404.7260437 -vu 0 1 0 -fov 37
camera.str() = -vp 718.265686 459.2365417 593.3762817 -vi 70.64819336 880.7341309 386.4896851 -vu 0 1 0 -fov 37
camera.str() = -vp 717.4611206 459.7719421 593.1192627 -vi 73.8910141 888.046936 387.5256348 -vu 0 1 0 -fov 37
Ray hit geomID:        4
Ray hit tfar:          1.943637e+02
Copied ray hit geomID: 4294967295
Copied ray hit tfar:   inf

camera.str() = -vp 716.6565552 460.3073425 592.8622437 -vi 73.08647156 888.5823364 387.2686157 -vu 0 1 0 -fov 37
camera.str() = -vp 715.8519897 460.8427429 592.6052246 -vi 72.28192902 889.1177368 387.0115967 -vu 0 1 0 -fov 37
camera.str() = -vp 638.2092896 500.8135071 19.99304008 -vi -35.34186172 852.3326416 270.2886963 -vu 0 1 0 -fov 37
camera.str() = -vp 637.0189209 501.2529602 19.36856651 -vi -36.53220749 852.7720947 269.6642151 -vu 0 1 0 -fov 37
Ray hit geomID:        4
Ray hit tfar:          9.349952e+01
Copied ray hit geomID: 4294967295
Copied ray hit tfar:   inf

camera.str() = -vp 635.8285522 501.6924133 18.74409294 -vi -37.72255325 853.2115479 269.0397339 -vu 0 1 0 -fov 37
camera.str() = -vp 634.6381836 502.1318665 18.11961937 -vi -38.91289902 853.651001 268.4152527 -vu 0 1 0 -fov 37

cbenthin commented 7 years ago

Thanks for the camera settings, but unfortunately I still cannot reproduce the issue. Just to double-check, could you run: ./viewer_stream -c ../tutorials/models/cornell_box.ecs -vp 697.2194824 579.8771362 572.1413574 -vi -81.31224823 507.9751587 402.771759 -vu 0 1 0 -fov 37 on your end, and verify whether it still produces the issue?

Would it be possible to send an email to embree_support@intel.com so that we can start an email thread on this issue? That also makes sending binaries back and forth easier.

Thanks for your help.

mathijs727 commented 7 years ago

Hi,

I tried running it at that position and it does not give me an error. Actually, most of the start positions that I tried (which gave an error when looking around) work just fine. This start position does consistently cause a problem: ./viewer_stream -c ../tutorials/models/cornell_box.ecs -vp 996.1184692 452.4086914 135.0840454 -vi 526.1708984 392.4828796 779.6512451 -vu 0 1 0 -fov 37

The code is compiled with GCC 6.3.1.

cbenthin commented 7 years ago

Excellent, I can reproduce an error on my end, at least with clang but not with icc. But that's enough and I'm looking into it...

Thanks again.

cbenthin commented 7 years ago

BTW: It seems to be related to the AVX2 code path because when I add -rtcore isa=avx to the command line it seems to work. Interesting...

mathijs727 commented 7 years ago

With GCC 6.3.1 the problem does not occur either when limited to AVX (-DEMBREE_MAX_ISA=AVX).

cbenthin commented 7 years ago

Hmm, I think I've found something. It's actually a slight algorithmic problem of the frustum algorithm we use. Need to investigate a bit more but as a workaround could you replace line 742 in bvh_intersector_stream.h with: size_t m_node = m_node_hit; and try again. Thanks.

cbenthin commented 7 years ago

I've checked in a better bug fix into the release branch. Please give it a try. Thanks.

mathijs727 commented 7 years ago

The fix seems to work!

Thanks for the help.