NVIDIAGameWorks / FleX

NvFlexGetNeighbors - optimizations #112

Open VidveiLarsen opened 3 years ago

After trying NvFlexGetNeighbors, I quickly saw that mapping the neighbors buffer (NvFlexBuffer* neighbors) is quite expensive. I have lowered maxParticleNeighbors to around 32 for fluids, below which I start seeing instabilities, but the amount of data that needs to be transferred is still a bit too high. The access pattern of this buffer seems to be neighbors[c*maxParticles + offset], so if I accept accessing only the first N neighbours, I only need the contiguous range from 0 to N*maxParticles. There is therefore a large potential for optimization if N is much lower than maxParticleNeighbors.
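To make the indexing argument concrete, here is a minimal, FleX-independent sketch. It assumes the strided layout neighbors[c*maxParticles + offset] described above, and the helper names (neighborSlot, prefixElements, firstNeighbors) are made up for illustration; they are not part of the FleX API. It only shows that the first N neighbour slots of every particle live in the first N*maxParticles elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: position of the c-th neighbour slot of the particle
// at internal index `offset`, assuming the layout neighbors[c*maxParticles + offset].
inline std::size_t neighborSlot(int c, int offset, int maxParticles) {
    return static_cast<std::size_t>(c) * maxParticles + offset;
}

// Number of ints in the contiguous prefix that holds the first n neighbour
// slots of every particle (multiply by sizeof(int) for the byte count).
inline std::size_t prefixElements(int n, int maxParticles) {
    return static_cast<std::size_t>(n) * maxParticles;
}

// Collect at most n neighbours of one particle while reading only the prefix.
// `counts[offset]` is the particle's real neighbour count, which may exceed n.
std::vector<int> firstNeighbors(const std::vector<int>& prefix,
                                const std::vector<int>& counts,
                                int offset, int n, int maxParticles) {
    std::vector<int> out;
    const int count = counts[offset] < n ? counts[offset] : n;
    for (int c = 0; c < count; ++c) {
        const std::size_t slot = neighborSlot(c, offset, maxParticles);
        assert(slot < prefix.size());  // never reads past N*maxParticles
        out.push_back(prefix[slot]);
    }
    return out;
}
```

Under this assumption, the transfer shrinks by a factor of N / maxParticleNeighbors compared with copying the whole buffer. The indices returned are still internal indices, so they would need to go through the internal-to-API mapping as usual.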

There is no NvFlexCopyDesc for this function, so I have tried the undocumented NvFlexCopyDeviceToHost, as well as the D3D11 context functions NvFlex::Context->copyToHost and NvFlex::Context->download, without any success. My best guess is to call these functions in the solver loop in the following manner:

NvFlexUpdateSolver(solver);
NvFlexExtPullFromDevice(container);

NvFlexGetNeighbors(solver, NeighborsBufferHost.buffer, CountsBuffer.buffer, ApiToInternalBuffer.buffer, InternalToApiBuffer.buffer);

//NvFlexCopyDeviceToHost(solver, NeighborsBufferDevice.buffer, NeighborsBufferHost.buffer, MaxParticles*NeighborsCopiedToHost, 1);

d3Context->download(reinterpret_cast<NvFlex::Buffer*>(NeighborsBufferDevice.buffer), 0, MaxParticles * 4*NeighborsCopiedToHost);
d3Context->copyToHost(reinterpret_cast<NvFlex::Buffer*>(NeighborsBufferHost.buffer), 0, reinterpret_cast<NvFlex::Buffer*>(NeighborsBufferDevice.buffer), 0, MaxParticles * 4*NeighborsCopiedToHost);

where the buffers are NvFlexVector types:

NeighborsBufferDevice(flexLib,0, eNvFlexBufferDevice),
NeighborsBufferHost(flexLib),

and d3Context is

void** context = new void*;
void** device = new void*;
NvFlexGetDeviceAndContext(flexLib, device, context);
d3Context = reinterpret_cast<NvFlex::Context*>(*context);

Then I map both the device and host buffers and access the host buffer. Unfortunately this does not work: all the elements in the host buffer are 0. Any tips or hints on what I'm doing wrong?