LuxCoreRender / BlendLuxCore

Blender Integration for LuxCore
GNU General Public License v3.0

GPU mode is almost unusable #527

Open kimagin opened 4 years ago

kimagin commented 4 years ago

When I try to switch the device from CPU to GPU, the renderer becomes extremely sluggish: it is hard to even move a single object or zoom the camera. I have a GTX 1080 Ti and I've tried switching to both OpenCL and CUDA in the LuxCore preferences. When I switch the device back to CPU, the problem immediately goes away and the interface becomes very responsive.

Thanks for the amazing renderer engine.

kimagin commented 4 years ago

Now I'm receiving this error while I'm using CPU mode: OpenCL driver API error (code: -5, file: D\a\1\Luxcore\src\luxrays\devices\ocldevice.cpp, line: 118): CL_OUT_OF_RESOURCES

The viewport render preview no longer works.

Dade916 commented 4 years ago

Does it happen in all scenes? Even with the initial Blender cube?

What LuxCore version are you using?

kimagin commented 4 years ago

With a simple cube it works fine. As soon as I add more geometry, for example a Suzanne with 2 levels of subdivision, it begins lagging. When I add materials (Disney material) and try to change the colors, it lags a lot more. The moment I switch to CPU, it becomes extremely responsive again. I am using Blender 2.83 with LuxCore version 2.4 (the latest regular release, which you recently published).

Theverat commented 4 years ago

It might help track this down if you upload an example scene that is sluggish for you. A video recording of the problem would also be very helpful.

kimagin commented 4 years ago

I've prepared a very simple scene containing 4 Suzannes with 2 subdivision surface levels each and a simple HDRI. There are no materials and almost everything is at default. As you can see in the video, when I'm on GPU it takes long intervals to manipulate the HDRI angle, clicking an object takes almost 500 ms to select it, and moving objects is not fluid. For comparison, I've done the same on the CPU and everything is extremely responsive.

Here is the link to the video : https://drive.google.com/file/d/1HZyJqMzy_OP7yGyKO97S-mClSQiTBhat/view?usp=sharing

Here is the link to the scene : https://drive.google.com/file/d/1IZWxbYK9oXyhdSNrHjGK-SXszT5U_ARx/view?usp=sharing

I'm attaching the test scene. The HDRI which I've packed into it is from hdrihaven.com.

P.S. I love LuxCoreRender and I do a great deal of rendering in my studio with this fantastic renderer. Thank you for your great work.

Theverat commented 4 years ago

I opened the scene in latest v2.5 (41b2f0112afe679e6b5469092e8069d541c3b0fe), disabled viewport denoising, selected 3 monkeys, and moved them around with G. Checking the console, I get the following logs:

CPU, Ryzen 7 2700x:

view_update(): checking for changes took 0.0 ms
[Exporter] Update because of: OBJECT | REQUIRES_SCENE_EDIT
[SDL][63.047] Scene objects count: 3
[LuxRays][63.047] Preprocessing DataSet
[LuxRays][63.047] Total vertex count: 755896
[LuxRays][63.047] Total triangle count: 252094
[LuxRays][63.047] Preprocessing DataSet done
[LuxRays][63.047] Adding DataSet accelerator: EMBREE
[LuxRays][63.047] Total vertex count: 755896
[LuxRays][63.047] Total triangle count: 252094
[LuxRays][63.062] EmbreeAccel build time: 14ms
view_update(): applying changes took 15.8 ms
view_update() took 15.8 ms

GPU, RTX 2080, CUDA backend:

view_update(): checking for changes took 0.0 ms
[Exporter] Update because of: OBJECT | REQUIRES_SCENE_EDIT
[SDL][342.781] Scene objects count: 3
[LuxRays][342.781] Preprocessing DataSet
[LuxRays][342.781] Total vertex count: 755896
[LuxRays][342.781] Total triangle count: 252094
[LuxRays][342.781] Preprocessing DataSet done
[LuxRays][342.781] Adding DataSet accelerator: MBVH
[LuxRays][342.781] Total vertex count: 755896
[LuxRays][342.781] Total triangle count: 252094
[LuxRays][342.781] Building Multilevel Bounding Volume Hierarchy: 6 leafs
[LuxRays][342.781] BVH Dataset preprocessing time: 0ms
[LuxRays][342.781] BVH builder: EMBREE_BINNED_SAH
[LuxRays][342.797] BVH build hierarchy time: 16ms
[LuxRays][342.797] BVH total build time: 16ms
[LuxRays][342.797] Total BVH memory usage: 8Kbytes
[LuxRays][342.797] BVH Dataset preprocessing time: 0ms
[LuxRays][342.797] BVH builder: EMBREE_BINNED_SAH
[LuxRays][342.797] BVH build hierarchy time: 0ms
[LuxRays][342.797] BVH total build time: 0ms
[LuxRays][342.797] Total BVH memory usage: 0Kbytes
[LuxRays][342.797] BVH Dataset preprocessing time: 0ms
[LuxRays][342.797] BVH builder: EMBREE_BINNED_SAH
[LuxRays][342.812] BVH build hierarchy time: 14ms
[LuxRays][342.812] BVH total build time: 14ms
[LuxRays][342.812] Total BVH memory usage: 2948Kbytes
[LuxRays][342.812] BVH Dataset preprocessing time: 0ms
[LuxRays][342.812] BVH builder: EMBREE_BINNED_SAH
[LuxRays][342.828] BVH build hierarchy time: 16ms
[LuxRays][342.828] BVH total build time: 16ms
[LuxRays][342.828] Total BVH memory usage: 2948Kbytes
[LuxRays][342.828] BVH Dataset preprocessing time: 0ms
[LuxRays][342.828] BVH builder: EMBREE_BINNED_SAH
[LuxRays][342.844] BVH build hierarchy time: 16ms
[LuxRays][342.844] BVH total build time: 16ms
[LuxRays][342.844] Total BVH memory usage: 2948Kbytes
[LuxRays][342.844] BVH Dataset preprocessing time: 0ms
[LuxRays][342.844] BVH builder: EMBREE_BINNED_SAH
[LuxRays][342.859] BVH build hierarchy time: 15ms
[LuxRays][342.859] BVH total build time: 15ms
[LuxRays][342.859] Total BVH memory usage: 2948Kbytes
[LuxRays][342.859] Building Multilevel Bounding Volume Hierarchy root tree
[LuxRays][342.859] MBVH root tree builder: EMBREE_BINNED_SAH
[LuxRays][342.859] MBVH build time: 78ms
[LuxRays][342.859] Total Multilevel BVH memory usage: 11802Kbytes
[LuxRays][342.859] [Device GeForce RTX 2080 CUDAIntersect] MBVH mesh vertices buffer size: 8858Kbytes
[LuxRays][342.875] [Device GeForce RTX 2080 CUDAIntersect] MBVH nodes buffer size: 11802Kbytes
[LuxRays][342.875] [Device GeForce RTX 2080 CUDAIntersect] MBVH leaf transformations buffer size: 384bytes
[LuxRays][342.875] [MBVHKernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D PARAM_RAY_EPSILON_MIN=1e-05f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXRAYS_CUDA_DEVICE -D LUXRAYS_OS_WINDOWS --use_fast_math
[LuxRays][342.875] [MBVHKernel] Compiling kernels
[LuxRays][342.875] [MBVHKernel] Program cached
[LuxRays][342.875] Adding DataSet accelerator: EMBREE
[LuxRays][342.875] Total vertex count: 755896
[LuxRays][342.875] Total triangle count: 252094
[LuxRays][342.890] EmbreeAccel build time: 15ms
[LuxCore][342.890] Compile Geometry
[LuxCore][342.890] Scene geometry compilation time: 0ms
[LuxCore][342.890] Compile Lights
[LuxCore][342.969] Lights compilation time: 78ms
view_update(): applying changes took 209.6 ms
view_update() took 209.6 ms

Some observations:

- For some reason, the scene update for the GPU recompiles the lights (78 ms!), which is not done for the CPU.

Side note: switching to OptiX brings the total update time down to around 140 ms, due to faster BVH building. Still not great, though.

Dade916 commented 4 years ago

> For some reason, the scene update for the GPU recompiles the lights (78 ms!), which is not done for the CPU

Any geometry edit requires a re-compilation of lights because of mesh lights (i.e. the area of a triangle could have changed, which would affect the emitted light, etc.).
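To illustrate why (a toy sketch, not LuxCore's actual code; `emitted_power` and `triangle_area` are hypothetical helpers): the power a diffuse triangular mesh light contributes is proportional to its world-space area, so any edit that changes triangle areas invalidates the precomputed light-sampling data.

```python
import math

# Toy example (not LuxCore code): the power of a diffuse triangle
# emitter is radiance * area * pi, so editing geometry changes the
# light-sampling distribution and forces a recompilation.

def triangle_area(a, b, c):
    """Area of a 3D triangle via the cross product."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def emitted_power(verts, radiance):
    return radiance * triangle_area(*verts) * math.pi

tri = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]              # area 0.5
scaled = [(2 * x, 2 * y, 2 * z) for x, y, z in tri]  # area 2.0

print(emitted_power(tri, 10.0))     # ~15.7
print(emitted_power(scaled, 10.0))  # ~62.8, 4x the power
```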

GPU editing will always be slower than CPU editing because the data must be converted and transferred from CPU to GPU through the relatively slow PCIe bus. This is exactly why we used only CPU viewport rendering in the past.
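For scale (an assumed bandwidth figure, not a measurement): raw PCIe transfer of buffers the size of the ones in the log above is only on the order of a millisecond each, so most of the per-edit cost is the conversion ("compilation") work rather than the bus itself.

```python
# Back-of-the-envelope estimate with an ASSUMED ~12 GB/s practical
# PCIe 3.0 x16 host-to-device throughput (driver overhead ignored).

PCIE3_X16_GBS = 12.0  # GB/s, assumed figure

def transfer_ms(megabytes, gb_per_s=PCIE3_X16_GBS):
    return megabytes / 1024.0 / gb_per_s * 1000.0

# Buffer sizes taken from the log above (in Kbytes)
for label, kbytes in [("MBVH nodes", 11802), ("mesh vertices", 8858)]:
    mb = kbytes / 1024.0
    print(f"{label}: {mb:.1f} MB -> ~{transfer_ms(mb):.2f} ms")
```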

What is the size of the HDR? It could heavily affect the result of this test if it is several MBs.

Theverat commented 4 years ago

> Any geometry edit requires a re-compilation of lights because of mesh lights (i.e. the area of a triangle could have been changed and it would affect emitted light, etc.).

I wonder why it doesn't happen on the CPU, then.

> GPU editing will always be slower than CPU editing because the data must be converted and transferred from CPU to GPU through the relatively slow PCIe bus. This is exactly why we used only CPU viewport rendering in the past.

Some questions:

- 78 ms building several BVHs with EMBREE_BINNED_SAH - this looks to me like it's done before sending any data to the GPU, is that correct?

> What is the size of the HDR? It could heavily affect the result of this test if it is several MBs.

4096 x 2048. Removing the HDR saves the 78 ms for light compilation, so there's still a difference of ~16 ms CPU vs. ~130 ms GPU.
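One plausible reason the HDR costs so much (a generic sketch of environment-map importance sampling, not LuxCore's actual code): GPU engines typically precompute a luminance distribution (per-row CDFs plus a marginal CDF) over the whole map, and for 4096 x 2048 that means a pass over ~8.4 million pixels on every light recompilation.

```python
from itertools import accumulate

# Generic sketch: build per-row CDFs and a marginal CDF from pixel
# luminance, as used for environment-map importance sampling.
# (Illustrative only; real implementations differ in detail.)

def build_env_distribution(luminance, width, height):
    row_cdfs, row_sums = [], []
    for y in range(height):
        row = luminance[y * width:(y + 1) * width]
        cdf = list(accumulate(row))
        total = cdf[-1]
        row_cdfs.append([c / total for c in cdf])
        row_sums.append(total)
    marginal = list(accumulate(row_sums))
    grand_total = marginal[-1]
    marginal = [m / grand_total for m in marginal]
    return marginal, row_cdfs

# Tiny 4x2 "environment map"; the one bright pixel skews the CDFs
marginal, rows = build_env_distribution(
    [1, 1, 1, 1,
     1, 1, 1, 5], width=4, height=2)
print(marginal)  # [0.333..., 1.0]: the second row holds 2/3 of the energy
```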

By the way, when I do the same test with Cycles (including the HDRI), the response is extremely fluid, even on the GPU. Objects lag behind when moved around, obviously, but the whole interface stays responsive the whole time. I wonder how they do this. It could be that the updates are done asynchronously (though I doubt it a bit). Or maybe they have a BVH that doesn't need rebuilding when objects are moved? There are probably other reasons as well.

Dade916 commented 4 years ago

> I wonder why it doesn't happen on the CPU, then.

The PATHCPU engine does literally nothing here. PATHOCL executes:

https://github.com/LuxCoreRender/LuxCore/blob/41b2f0112afe679e6b5469092e8069d541c3b0fe/src/slg/engines/pathoclbase/compilelights.cpp#L298

https://github.com/LuxCoreRender/LuxCore/blob/41b2f0112afe679e6b5469092e8069d541c3b0fe/src/slg/engines/pathoclbase/compilegeometry.cpp#L38

You seem to assume CPU and GPU use the same data: the CPU uses several structures and pointers to structures, while GPU data cannot use pointers at all; everything is packed together (i.e. "compiled") and transferred.
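A minimal sketch of what "compiled" means here (a hypothetical layout, not LuxCore's actual format): the pointer-based CPU scene is flattened into contiguous arrays with integer offsets, because GPU kernels cannot chase host pointers, and this flattening must be redone whenever the referenced data changes.

```python
# Hypothetical sketch: flattening a pointer-based scene into the
# pointer-free layout a GPU kernel can consume.

class Mesh:                       # CPU side: objects and references
    def __init__(self, vertices):
        self.vertices = vertices  # list of (x, y, z) tuples

def compile_scene(meshes):
    """Pack all meshes into one vertex buffer plus integer offsets."""
    flat_vertices, offsets = [], []
    for mesh in meshes:
        offsets.append(len(flat_vertices))  # an index, not a pointer
        flat_vertices.extend(mesh.vertices)
    return flat_vertices, offsets

scene = [Mesh([(0, 0, 0), (1, 0, 0)]), Mesh([(5, 5, 5)])]
flat, offsets = compile_scene(scene)
print(offsets)  # [0, 2]: the second mesh starts at index 2
```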

> 78 ms building several BVHs with EMBREE_BINNED_SAH - this looks to me like it's done before sending any data to the GPU, is that correct?

An MBVH is a BVH of BVHs: a BVH is built for each instance, and then a BVH over all those BVHs is built as well.
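That two-level idea can be sketched like this (simplified to plain AABBs; the real builders use SAH via Embree): per-instance bottom-level trees can be reused, and only the small top level over the instances' world bounds needs rebuilding when objects move.

```python
# Simplified two-level BVH sketch: one bottom-level bound per mesh,
# plus a top level over the instances' world-space bounds.
# (Illustrative; LuxRays' MBVH uses SAH builders via Embree.)

def aabb_union(a, b):
    (ax0, ay0, az0), (ax1, ay1, az1) = a
    (bx0, by0, bz0), (bx1, by1, bz1) = b
    return ((min(ax0, bx0), min(ay0, by0), min(az0, bz0)),
            (max(ax1, bx1), max(ay1, by1), max(az1, bz1)))

def bounds_of(points):
    xs, ys, zs = zip(*points)
    return ((min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs)))

def translate(bounds, t):
    (x0, y0, z0), (x1, y1, z1) = bounds
    return ((x0 + t[0], y0 + t[1], z0 + t[2]),
            (x1 + t[0], y1 + t[1], z1 + t[2]))

# Bottom level: built once per unique mesh
mesh_bounds = bounds_of([(0, 0, 0), (1, 1, 1)])

# Top level: only this gets rebuilt when instances move
instances = [(0, 0, 0), (10, 0, 0), (0, 10, 0)]
root = translate(mesh_bounds, instances[0])
for t in instances[1:]:
    root = aabb_union(root, translate(mesh_bounds, t))
print(root)  # ((0, 0, 0), (11, 11, 1))
```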

> 4096 x 2048. Removing the HDR saves the 78 ms for light compilation, so there's still a difference of ~16 ms CPU vs. ~130 ms GPU.

It is nearly 50% of the time. Again, you seem to assume CPU and GPU require the same setup work, while they are worlds apart.

Theverat commented 4 years ago

Thanks for these explanations and insights. I would be interested in your thoughts about how Cycles achieves its very fluid scene updates on the GPU.

jiabaoyu commented 4 years ago

I'm also curious how Cycles achieves that fluidity. If I set the pixel size to 2X in the viewport render settings, then however big the viewport is, it's just as responsive as Cycles. Here is a demo: https://streamable.com/bzhtcv