RenderKit / embree

Embree ray tracing kernels repository.
Apache License 2.0

`rtcOccluded1M` is 3x slower than `rtcIntersect1` #419

Closed: Beastmaster closed this issue 1 year ago

Beastmaster commented 1 year ago

I'm comparing two ways of ray tracing, and I found that rtcOccluded1M is 3x slower than rtcIntersect1.

rtcIntersect1 code

autox::Grid<RTCRayHit> laser_beams_;

#pragma omp parallel for
  for (size_t i = 0; i < laser_streams_.size(); ++i) {
    const auto& laser = laser_beams_[i];
    RTCRay* ray = &laser_streams_[i].ray;
    RTCHit* hit = &laser_streams_[i].hit;
    ray->org_x = origin(0);
    ray->org_y = origin(1);
    ray->org_z = origin(2);
    ray->dir_x = direction(0);
    ray->dir_y = direction(1);
    ray->dir_z = direction(2);
    ray->tnear = min_distance_;
    ray->tfar = max_distance_;
    ray->time = kMotionTime;
    // reset hit. Following official guide
    hit->primID = RTC_INVALID_GEOMETRY_ID;
    hit->geomID = RTC_INVALID_GEOMETRY_ID;
    hit->instID[0] = RTC_INVALID_GEOMETRY_ID;
    RTCIntersectContext context;
    rtcInitIntersectContext(&context);
    rtcIntersect1(scene.GetScene(), &context, &laser_streams_[i]);
  }

rtcOccluded1M code

  // #pragma omp parallel for
  std::unordered_map<int, int> stream_range_id_map;
  int N = 0;
  for (size_t i = 0; i < laser_streams_.size(); ++i) {
    RTCRay* ray = &laser_streams_[N].ray;
    RTCHit* hit = &laser_streams_[N].hit;
    ray->org_x = origin(0);
    ray->org_y = origin(1);
    ray->org_z = origin(2);
    ray->dir_x = direction(0);
    ray->dir_y = direction(1);
    ray->dir_z = direction(2);
    ray->tnear = min_distance_;
    ray->tfar = max_distance_;
    ray->time = kMotionTime;
    // reset hit. Following official guide
    hit->primID = RTC_INVALID_GEOMETRY_ID;
    hit->geomID = RTC_INVALID_GEOMETRY_ID;
    hit->instID[0] = RTC_INVALID_GEOMETRY_ID;
    stream_range_id_map[i] = N;
    N++;
  }

  auto trace_start = std::chrono::steady_clock::now();
  // ray tracing in stream mode
  RTCIntersectContext context;
  rtcInitIntersectContext(&context);
  context.flags = RTC_INTERSECT_CONTEXT_FLAG_COHERENT;
  rtcIntersect1M(scene.GetScene(), &context, (RTCRayHit*)laser_streams_.data(),
                 N, sizeof(RTCRayHit));

Did I miss anything?

Thank you!

svenwoop commented 1 year ago

You should set the RTC_INTERSECT_CONTEXT_FLAG_COHERENT flag only when the rays are actually coherent, which is typically the case for primary rays. Can you try removing the RTC_INTERSECT_CONTEXT_FLAG_COHERENT flag from the rtcIntersect1M invocation?
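
As a rough illustration of what "coherent" could mean in practice, here is a minimal sketch of a heuristic that scores how closely a batch of ray directions agree. It is an assumption for illustration only, not part of the Embree API: the `Dir` struct and `directionCoherence` function are hypothetical names, and the score only looks at directions (it ignores origins, which is reasonable only when all rays start near the same point, as with a single laser origin).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal direction type; not an Embree type.
struct Dir { float x, y, z; };

// Returns the length of the mean of the normalized directions, in [0, 1].
// A value close to 1 means the rays point nearly the same way;
// a value close to 0 means the directions are scattered.
float directionCoherence(const std::vector<Dir>& dirs) {
  double sx = 0.0, sy = 0.0, sz = 0.0;
  std::size_t used = 0;
  for (const Dir& d : dirs) {
    const double len = std::sqrt(double(d.x) * d.x + double(d.y) * d.y +
                                 double(d.z) * d.z);
    if (len == 0.0) continue;  // skip degenerate (zero-length) directions
    sx += d.x / len;
    sy += d.y / len;
    sz += d.z / len;
    ++used;
  }
  if (used == 0) return 0.0f;  // empty or fully degenerate input
  return static_cast<float>(std::sqrt(sx * sx + sy * sy + sz * sz) / used);
}
```

A batch that scores low on a measure like this is probably a poor candidate for the coherent flag; where exactly to draw the line would have to be found by measurement.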

Beastmaster commented 1 year ago

I removed RTC_INTERSECT_CONTEXT_FLAG_COHERENT, but the latency was similar. I may have misunderstood the meaning of "coherent", because my rays are scattered. I tried grouping coherent rays and did see some speed-up. Thanks very much!
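
One possible way to group rays before submitting them as a stream is to order them by direction octant, so that neighbouring rays in the buffer point into the same part of space. The sketch below is an assumption about how such grouping might be done, not the method used in this thread. `RayDir` and `orderByOctant` are hypothetical names; the struct only mirrors the `dir_x`, `dir_y`, `dir_z` field names of `RTCRay` so the example compiles without Embree.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the direction fields of RTCRay.
struct RayDir { float dir_x, dir_y, dir_z; };

// Builds a 3-bit key from the signs of the direction components (0..7).
inline unsigned octantOf(const RayDir& r) {
  return (r.dir_x < 0.0f ? 1u : 0u) |
         (r.dir_y < 0.0f ? 2u : 0u) |
         (r.dir_z < 0.0f ? 4u : 0u);
}

// Returns a permutation of ray indices sorted by octant key.
// The sort is stable, so rays within one octant keep their original order.
std::vector<std::size_t> orderByOctant(const std::vector<RayDir>& rays) {
  std::vector<std::size_t> order(rays.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&rays](std::size_t a, std::size_t b) {
                     return octantOf(rays[a]) < octantOf(rays[b]);
                   });
  return order;
}
```

The returned permutation can be used to copy rays into a reordered buffer before the stream call, and to map each result back to its original index afterwards, in the same spirit as the `stream_range_id_map` in the question. Octants are a coarse grouping; a finer key may be needed for widely scattered rays.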