ptheywood opened this issue 4 years ago
If this were to be implemented, it would probably be worth completely redesigning CUDAAgentStateList and CUDAMessageList to better facilitate shared resources (disabled agents and how scans are handled). Currently the necessary handling of this (primarily for agents, rather than messages) is a bit messy, and it is easy to introduce bugs when extending it. Furthermore, submodels introduce a need for similar sharing of ownership. I don't have a clear idea of how it could be better restructured at this point, though.
On Wed, 22 Apr 2020 at 13:36, Peter Heywood notifications@github.com wrote:
One issue with large FLAME GPU 1 models is memory usage (agent data and message data), which limits the scale of models achievable.
FLAME GPU 2 improves on this with dynamic buffer sizes, but could be further improved.
It should be possible to reduce the memory footprint of (some) models by re-using the same memory for multiple message lists, where messages do not exist concurrently.
I.e. in a single agent system with two message lists a and b:
    |
    V
  output_a --> a
    |
    V
  input_a <--- a
    |
    V
  output_b --> b
    |
    V
  input_b <--- b
    |
    V
In this case (if/where messages do not persist between iterations), the same global memory could be used for both message lists, with the total allocation sized to the larger of the two lists.
This would allow larger models to run on a single device than would otherwise be possible (for the cases where this applies).
It would, however, require solid dependency analysis (if that is actually achievable with user-provided agent functions).
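To make the idea concrete, here is a minimal CUDA/C++ sketch of the aliasing scheme described above; the message structs and capacities are illustrative assumptions, not FLAME GPU 2 API. If the messages in a and b never exist at the same time, one device allocation sized to the larger list can back both.

```cpp
// Sketch only: two non-concurrent message lists backed by one allocation.
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>

struct MessageA { float x, y; };      // hypothetical message types
struct MessageB { float x, y, z; };

int main() {
    const size_t maxA = 1u << 20;     // assumed capacities for lists a and b
    const size_t maxB = 1u << 19;

    // Single buffer sized to the larger of the two lists' requirements.
    const size_t bytes = std::max(maxA * sizeof(MessageA),
                                  maxB * sizeof(MessageB));
    void *shared = nullptr;
    cudaMalloc(&shared, bytes);

    // Within one iteration:
    //   output_a writes MessageA records here, input_a reads them,
    //   then output_b reuses the same memory for MessageB records.
    MessageA *listA = static_cast<MessageA *>(shared);
    MessageB *listB = static_cast<MessageB *>(shared);
    (void)listA; (void)listB;

    cudaFree(shared);
    return 0;
}
```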
With the submodels refactor of CUDAAgent, sub-agents all share the same buffer space for agent birth (as do states).
As discussed in #242, messages don't need swap buffers, so they could all share output buffers in a similar way to the agent buffers. We could even merge agent birth buffers with new message buffers, so that they are all shared. Doing this at CUDAAgentModel scope would also let us allocate these buffers more sensibly (so that the largest need gets the largest buffer, rather than needlessly resizing the smaller ones due to the order they're handed out).
Another 'fake' singleton for buffers would suffice; the code could be stripped out of the refactored CUDAAgent without too much trouble.
Extending this to message lists which don't persist should be trivial.
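A rough sketch of what such a model-scope 'fake' singleton could look like; the name SharedScratchBuffer and its interface are assumptions for illustration, not the refactored CUDAAgent / CUDAAgentModel code. The grow-only resize means the largest request ends up owning the allocation, and smaller users never trigger further reallocations.

```cpp
// Sketch: one model-scope scratch buffer shared by agent birth and
// new-message output, resized only when a larger requirement appears.
#include <cuda_runtime.h>
#include <cstddef>

class SharedScratchBuffer {
    void *d_ptr = nullptr;
    size_t capacity = 0;
public:
    // Grow-only: reallocate only if the request exceeds current capacity.
    void *require(size_t bytes) {
        if (bytes > capacity) {
            if (d_ptr) cudaFree(d_ptr);
            cudaMalloc(&d_ptr, bytes);
            capacity = bytes;
        }
        return d_ptr;
    }
    ~SharedScratchBuffer() { if (d_ptr) cudaFree(d_ptr); }
};

// Usage idea: one instance owned at model scope, passed to each agent /
// message container instead of each owning a private birth/output buffer.
```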
This will only work with non-persistent message lists (i.e. message lists whose contents do not persist to the next iteration).
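As a sketch of the dependency check this implies (hypothetical types, not the actual FLAME GPU 2 dependency analyser): two non-persistent message lists can alias the same buffer only if their live ranges within an iteration, from the layer that outputs them to the last layer that reads them, do not overlap.

```cpp
// Sketch: interval-overlap test deciding whether two message lists may share memory.
struct MessageLiveRange {
    int firstOutputLayer;   // layer index where messages are first written
    int lastInputLayer;     // layer index where messages are last read
    bool persistent;        // do the messages persist to the next iteration?
};

bool canShareBuffer(const MessageLiveRange &a, const MessageLiveRange &b) {
    if (a.persistent || b.persistent)
        return false;       // persistent lists are live across the whole iteration
    // Non-overlapping live ranges: one list is fully consumed before the
    // other is produced.
    return a.lastInputLayer < b.firstOutputLayer ||
           b.lastInputLayer < a.firstOutputLayer;
}
```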