Closed h3mosphere closed 4 months ago
Hi @h3mosphere,
you can do this quite easily: once you have copied the data from VRAM to RAM with `read_from_device()`, you can spawn a new detached thread on the CPU side and immediately continue the simulation on the GPU. This detached thread can then take its time to make the memory copy and do other processing on the data. You only have to wait for it to finish the memory copy (use `std::atomic_int` variables here) before the next `read_from_device()` call happens and overwrites the original CPU data.
This is very similar to how I'm already doing the `.png` image export: there the `read_from_device()` + one CPU copy happen sequentially (the pause is really short, as the data is only a few MB), and then I spawn a detached thread for `.png` compression, which takes much longer. If you export a lot of 4K images in rapid succession, you'll notice CPU load going to 100% when all cores are busy doing `.png` compression, each on one image.
Kind regards, Moritz
Hello,
First up, thank you for this fantastic software! It is quite amazing what it is capable of.
To my problem: I am looking to quickly copy the memory buffers for the various simulation values into CPU memory, for subsequent processing in a different thread. Currently I have the following to export `TYPE_F` as particles (for further processing).
This however adds a fair pause to the simulation.
The basic problem is to get the relevant data out of LBM as quickly as possible, so it can continue with its simulation.
I first wondered if it is possible to copy the internal memory buffer (or get a reference to it); however, it also occurred to me that it may make more sense to:
a) have a function to transfer the data directly from the domain devices into a memory location that is not contained within the `LBM`/`Memory_Container` objects, or
b) have the ability to 'detach' the memory buffers/`Memory_Container`s and return them directly (one per domain) for further processing, and freeing, in another thread.
This has a somewhat tricky problem, however: looking at the `Memory_Container` `operator[]` and `reference()` functions, this has to take into account multiple `LBM_Domain`s, when used, and interleave them appropriately. This could be mitigated by stacking the domains in only one direction (Z?), so the memory is naturally ordered linearly.
I hope this makes sense; your thoughts are much appreciated.