After trying NvFlexGetNeighbors, I quickly found that mapping the neighbors buffer (NvFlexBuffer* neighbors) is quite expensive. For fluids I have lowered maxParticleNeighbors to around 32, which is as low as I can go before seeing instabilities, but the amount of data that needs to be transferred is still too high. The access pattern of this buffer appears to be neighbors[c*maxParticles + offset], so if I accept reading only the first N neighbours of each particle, I only need the contiguous range from 0 to N*maxParticles. There is therefore a large potential for optimization if N is much lower than maxParticleNeighbors.
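To make the layout I am assuming concrete, here is a minimal sketch in plain C++ with no Flex calls at all. The helper names (neighborIndex, prefixElementCount, firstNeighbors) are hypothetical and only illustrate the indexing. counts stands in for the per-particle neighbour counts, and prefix for a host copy that holds only the first N*maxParticles entries of the neighbors buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: position of the c-th neighbour of the particle at
// internal index `offset`, assuming the buffer is strided by maxParticles.
inline std::size_t neighborIndex(int c, int offset, int maxParticles)
{
    return static_cast<std::size_t>(c) * maxParticles + offset;
}

// Hypothetical helper: number of leading ints that hold neighbour slots
// 0..n-1 of every particle, i.e. the only range that has to be copied.
inline std::size_t prefixElementCount(int n, int maxParticles)
{
    return static_cast<std::size_t>(n) * maxParticles;
}

// Hypothetical helper: read at most n neighbours of one particle from a
// buffer that contains only the first n * maxParticles entries.
inline std::vector<int> firstNeighbors(const std::vector<int>& prefix,
                                       const std::vector<int>& counts,
                                       int offset, int n, int maxParticles)
{
    assert(offset >= 0 && offset < maxParticles);
    assert(prefix.size() >= prefixElementCount(n, maxParticles));

    std::vector<int> result;
    const int count = std::min(counts[offset], n); // never read past slot n-1
    for (int c = 0; c < count; ++c)
        result.push_back(prefix[neighborIndex(c, offset, maxParticles)]);
    return result;
}
```

As an illustration with made-up numbers: for maxParticles = 65536 and maxParticleNeighbors = 32 the full buffer is 32 * 65536 ints (8 MiB), while N = 8 would need only 8 * 65536 ints (2 MiB), provided the layout really is strided as assumed here.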
There is no NvFlexCopyDesc for this function, so I have tried the undocumented NvFlexCopyDeviceToHost, and also the D3D11 context functions NvFlex::Context->copyToHost and NvFlex::Context->download, without any success. My best guess is to call these functions in the solver loop, with buffers of type NvFlexVector and with d3Context obtained as follows:
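// Query the native device and context handles that flexLib runs on.
void* device = nullptr;
void* context = nullptr;
NvFlexGetDeviceAndContext(flexLib, &device, &context);
// Reinterpret the returned context handle as the internal NvFlex::Context.
d3Context = reinterpret_cast<NvFlex::Context*>(context);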
Then I map both the device and the host buffer and read from the host buffer. Unfortunately this does not work: all the elements in the host buffer are 0. Any tips or hints on what I'm doing wrong?