ptheywood opened this issue 4 years ago
If this were to be implemented, it would probably be worth completely redesigning CUDAAgentStateList and CUDAMessageList to better facilitate shared resources (disabled agents and how scans are handled). Currently the necessary handling of this (primarily for agents, rather than messages) is a bit messy, and it is easy to introduce bugs when extending it. Furthermore, submodels introduce a need for similar sharing of ownership. I don't have a clear idea of how it could be better restructured at this point, though.
On Wed, 22 Apr 2020 at 13:36, Peter Heywood notifications@github.com wrote:
One issue with large FLAME GPU 1 models is memory usage (agent data and message data), which limits the scale of models achievable.
FLAME GPU 2 improves on this with dynamic buffer sizes, but could be further improved.
It should be possible to reduce the memory footprint of (some) models by re-using the same memory for multiple message lists, where messages do not exist concurrently.
I.e. in a single agent system with two message lists a and b:
    |
    V
  output_a --> a
    |
    V
  input_a <--- a
    |
    V
  output_b --> b
    |
    V
  input_b <--- b
    |
    V
In this case (if/where messages do not persist between iterations), the same global memory could be used for both message lists, with the total allocation sized to the larger of the two lists.
This would allow larger models to run on a single device than would otherwise be possible (for the cases where this applies).
It would, however, require solid dependency analysis (if that is actually achievable with user-provided agent functions).
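To make the idea concrete, here is a minimal CUDA/C++ sketch of the aliasing scheme described above; the message structs and capacities are illustrative assumptions, not FLAME GPU 2 API. If the messages in a and b never exist at the same time, one device allocation sized to the larger list can back both.

```cpp
// Sketch only: two non-concurrent message lists backed by one allocation.
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>

struct MessageA { float x, y; };      // hypothetical message types
struct MessageB { float x, y, z; };

int main() {
    const size_t maxA = 1u << 20;     // assumed capacities for lists a and b
    const size_t maxB = 1u << 19;

    // Single buffer sized to the larger of the two lists' requirements.
    const size_t bytes = std::max(maxA * sizeof(MessageA),
                                  maxB * sizeof(MessageB));
    void *shared = nullptr;
    cudaMalloc(&shared, bytes);

    // Within one iteration:
    //   output_a writes MessageA records here, input_a reads them,
    //   then output_b reuses the same memory for MessageB records.
    MessageA *listA = static_cast<MessageA *>(shared);
    MessageB *listB = static_cast<MessageB *>(shared);
    (void)listA; (void)listB;

    cudaFree(shared);
    return 0;
}
```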
With the submodels refactor of CUDAAgent, sub-agents all share the same buffer space for agent birth (as do states).
As discussed in #242, messages don't need swap buffers, so they could all share output buffers in a similar way to the agent buffers. We could even merge agent birth buffers with new message buffers, so that they are all shared. Doing this at CUDAAgentModel scope would also let us allocate these buffers more sensibly (so that the largest need gets the largest buffer, rather than needlessly resizing the smaller ones due to the order they're handed out).
Another 'fake' singleton for buffers would suffice; the code could be stripped out of the refactored CUDAAgent without too much trouble.
Extending this to message lists which don't persist should be trivial.
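A rough sketch of what such a model-scope 'fake' singleton could look like; the name SharedScratchBuffer and its interface are assumptions for illustration, not the refactored CUDAAgent / CUDAAgentModel code. The grow-only resize means the largest request ends up owning the allocation, and smaller users never trigger further reallocations.

```cpp
// Sketch: one model-scope scratch buffer shared by agent birth and
// new-message output, resized only when a larger requirement appears.
#include <cuda_runtime.h>
#include <cstddef>

class SharedScratchBuffer {
    void *d_ptr = nullptr;
    size_t capacity = 0;
public:
    // Grow-only: reallocate only if the request exceeds current capacity.
    void *require(size_t bytes) {
        if (bytes > capacity) {
            if (d_ptr) cudaFree(d_ptr);
            cudaMalloc(&d_ptr, bytes);
            capacity = bytes;
        }
        return d_ptr;
    }
    ~SharedScratchBuffer() { if (d_ptr) cudaFree(d_ptr); }
};

// Usage idea: one instance owned at model scope, passed to each agent /
// message container instead of each owning a private birth/output buffer.
```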
This will only work with non-persistent message lists (i.e. message lists whose contents do not persist to the next iteration).
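As a sketch of the dependency check this implies (hypothetical types, not the actual FLAME GPU 2 dependency analyser): two non-persistent message lists can alias the same buffer only if their live ranges within an iteration, from the layer that outputs them to the last layer that reads them, do not overlap.

```cpp
// Sketch: interval-overlap test deciding whether two message lists may share memory.
struct MessageLiveRange {
    int firstOutputLayer;   // layer index where messages are first written
    int lastInputLayer;     // layer index where messages are last read
    bool persistent;        // do the messages persist to the next iteration?
};

bool canShareBuffer(const MessageLiveRange &a, const MessageLiveRange &b) {
    if (a.persistent || b.persistent)
        return false;       // persistent lists are live across the whole iteration
    // Non-overlapping live ranges: one list is fully consumed before the
    // other is produced.
    return a.lastInputLayer < b.firstOutputLayer ||
           b.lastInputLayer < a.firstOutputLayer;
}
```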