POV-Ray / povray

The Persistence of Vision Raytracer: http://www.povray.org/
GNU Affero General Public License v3.0

Feature request: Render_Pattern to maximize multicore utilization with animations #389

Open mlsomers opened 4 years ago

mlsomers commented 4 years ago

Summary

When rendering a scene, some parts of the image take much longer than others. The Render_Pattern option can help order these spots to some extent, but a significant amount of time can still pass while the last block is rendered by a single thread, leaving the other threads dormant.

Workaround

The Render_Block_Size can be made smaller to reduce the time spent rendering on a single thread at the end of each frame, but this hurts overall performance because of the extra overhead in the job queue and in the messages between back-end and front-end (UI).

Optimizing the block size by experimenting and benchmarking can be quite tedious, and the optimal setting will usually change over the course of an animation.

Suggested Solution

For animations, introduce a new Render_Pattern option that keeps track of how long each block takes to render. Then, when rendering the next frame, order the work so that the slowest blocks are processed first and the fastest last. That way all cores stay in use for a longer period. Keep doing the same throughout the whole animation, since the hot-spots are likely to move gradually from frame to frame. For still images, the same feature could make use of a file similar to a photon-map, but I think the main use will be for animations, especially when they take weeks to render.
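To illustrate the idea, here is a minimal sketch in Python (not POV-Ray code). It assumes the per-block render times of the previous frame are available as a plain list, and it models the worker threads as pulling blocks from a shared queue. The names `longest_first` and `frame_wall_time` are hypothetical and only serve the example.

```python
import heapq


def longest_first(previous_times):
    """Return block indices ordered from slowest to fastest.

    previous_times[i] is the measured render time of block i in the
    previous frame (hypothetical bookkeeping, not an existing POV-Ray API).
    """
    return sorted(range(len(previous_times)),
                  key=lambda i: previous_times[i], reverse=True)


def frame_wall_time(block_times, order, num_threads):
    """Simulate threads pulling blocks from a shared queue in `order`.

    Returns the time at which the last thread finishes.
    """
    idle_at = [0.0] * num_threads       # min-heap: when each thread is next idle
    for block in order:
        start = heapq.heappop(idle_at)  # earliest idle thread takes the block
        heapq.heappush(idle_at, start + block_times[block])
    return max(idle_at)


# Toy frame: eight cheap blocks and one expensive block at the end.
times = [1, 1, 1, 1, 1, 1, 1, 1, 8]
print(frame_wall_time(times, range(len(times)), 4))     # 10.0
print(frame_wall_time(times, longest_first(times), 4))  # 8.0
```

In this toy case the expensive block starts last in the default order, so three of the four threads sit idle while it finishes. Starting it first lets the cheap blocks fill the remaining threads. In a real animation the ordering would come from the previous frame's timings, so the gain depends on how slowly the hot-spots move.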

Alternative (additional) Solution

Decrease the Render_Block_Size dynamically when threads become dormant near the end of a frame-render. I guess this will be much more difficult to implement and would involve extra communication or static flag checking in the render thread after each pixel.

A more "dumb" variation could be to halve/divide the block size on the last [number of threads] blocks, but that would not be effective when the easier (faster) blocks are among the last to be rendered. It would help for scenes with a more uniform distribution of render difficulty though.
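A rough sketch of that "dumb" variation, again in Python and not POV-Ray code. It assumes blocks are rectangles stored as `(x, y, width, height)` tuples in queue order, and that halving the block size means halving the edge length, so each of the last blocks becomes up to four sub-blocks. The function names are hypothetical.

```python
def quarter(block):
    """Split one (x, y, w, h) block into up to four sub-blocks."""
    x, y, w, h = block
    w1, h1 = w // 2, h // 2
    parts = [
        (x,      y,      w1,     h1),
        (x + w1, y,      w - w1, h1),
        (x,      y + h1, w1,     h - h1),
        (x + w1, y + h1, w - w1, h - h1),
    ]
    # Drop empty pieces, which appear when a side is only one pixel long.
    return [p for p in parts if p[2] > 0 and p[3] > 0]


def split_tail(blocks, num_threads):
    """Return a new queue in which the last `num_threads` blocks are quartered."""
    if num_threads <= 0:
        return list(blocks)
    head, tail = blocks[:-num_threads], blocks[-num_threads:]
    result = list(head)
    for block in tail:
        result.extend(quarter(block))
    return result


queue = [(0, 0, 32, 32), (32, 0, 32, 32), (64, 0, 32, 32)]
print(len(split_tail(queue, 2)))  # 9: one untouched block plus 2 * 4 sub-blocks
```

The total pixel area is unchanged; only the granularity of the final blocks differs, so the extra queue overhead is limited to the end of the frame.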

c-lipka commented 3 years ago

The radiosity pretraces could also be co-opted to gather performance data. They already usually come in multiple passes with increasing resolution.

I'd have to look at how the regular mosaic pretrace is implemented, but it might also be suited for that job.