ProjectPhysX / FluidX3D

The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL. Free for non-commercial use.
https://youtube.com/@ProjectPhysX

Initialization sends unnecessary amounts of data to device #126

Closed by jansol 9 months ago

jansol commented 9 months ago

I noticed there is a lot of host->device traffic during init. This is likely generated by Memory constructors that initialize buffers to a constant value.

FluidX3D startup could be noticeably faster if these buffers were filled with clEnqueueFillBuffer whenever the size of a buffer element is one of { 1, 2, 4, 8, 16, 32, 64, 128 } bytes (the legal sizes for the pattern passed to that function). Similarly, for the host copy, std::fill may be faster than the for loop currently used, since the compiler can potentially lower it to optimized intrinsics.
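A minimal sketch of the idea, not the actual OpenCL-Wrapper code: the helper names `is_legal_fill_pattern_size` and `fill_host` are hypothetical and only illustrate the two suggestions. The device-side call is shown as a comment because it needs OpenCL 1.2 headers plus a live command queue and buffer (`queue`, `buffer`, `host` and `n` are assumed to exist there).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if `bytes` is a pattern size accepted by
// clEnqueueFillBuffer, i.e. a power of two between 1 and 128.
constexpr bool is_legal_fill_pattern_size(std::size_t bytes) {
    return bytes >= 1 && bytes <= 128 && (bytes & (bytes - 1)) == 0;
}

// Hypothetical host-side initializer: std::fill instead of a hand-written loop.
template <typename T>
void fill_host(T* data, std::size_t n, const T& value) {
    assert(data != nullptr || n == 0);
    std::fill(data, data + n, value);
}

// Device side, assuming an existing cl_command_queue `queue` and cl_mem `buffer`:
//
//   if (is_legal_fill_pattern_size(sizeof(T))) {
//       // Only the pattern (sizeof(T) bytes) is handed over, not the whole array.
//       clEnqueueFillBuffer(queue, buffer, &value, sizeof(T),
//                           0, n * sizeof(T), 0, nullptr, nullptr);
//   } else {
//       // Fallback: copy the already-filled host array as before.
//       clEnqueueWriteBuffer(queue, buffer, CL_TRUE, 0, n * sizeof(T),
//                            host, 0, nullptr, nullptr);
//   }
```

With this split, element types such as float (4 bytes) would take the fill path, while a 12-byte struct would fall back to the regular host->device copy.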

ProjectPhysX commented 9 months ago

Hi Jan,

thank you very much for these suggestions!

I tried a simple case, 3x 2GB allocation:

Overall 2.4x faster. Amazing!

In FluidX3D, where the largest buffer allocations are GPU-only and there are additional initialization and sanity-check loops, the overall speedup is only on the order of 8%.

I have committed this to the OpenCL-Wrapper and FluidX3D.

Kind regards, Moritz