TornadoVM fully manages device memory, in a way similar to Java memory management. TornadoVM sets a hard limit on the maximum amount of device memory it can use, and the runtime can allocate as many buffers as fit within that region. Thus, memory usage grows until the maximum limit is reached.
In addition, TornadoVM maintains a list of free and used buffers. When an execution plan finishes, its device buffers are marked as free but never released (e.g., via clReleaseMemObject in OpenCL); instead, they stay allocated so that other task-graphs can reuse the same areas. If compaction is needed, TornadoVM deallocates the buffers and allocates a new contiguous region.
This whole process is fully transparent for the programmer.
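The reuse scheme described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not TornadoVM's actual implementation: the class name DeviceBufferPool, its methods, and the policy of reusing a free buffer only when the size matches exactly are all hypothetical, and device allocations are simulated with a byte counter. Compaction is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a device buffer pool; not TornadoVM's real code. */
final class DeviceBufferPool {
    private final long maxBytes;   // hard limit on (simulated) device memory
    private long allocatedBytes;   // bytes currently held on the device
    private long nextId;           // next fresh buffer id
    private final Map<Long, Long> used = new HashMap<>();        // buffer id -> size
    private final Map<Long, Deque<Long>> free = new HashMap<>(); // size -> free buffer ids

    DeviceBufferPool(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Returns a buffer id, reusing a free buffer of the same size when one exists. */
    long acquire(long size) {
        Deque<Long> candidates = free.get(size);
        if (candidates != null && !candidates.isEmpty()) {
            long id = candidates.pop();
            used.put(id, size);
            return id;
        }
        if (allocatedBytes + size > maxBytes) {
            throw new IllegalStateException("device memory limit reached");
        }
        allocatedBytes += size; // stands in for a real device allocation
        long id = nextId++;
        used.put(id, size);
        return id;
    }

    /** Marks a buffer as free without releasing its device memory. */
    void markFree(long id) {
        Long size = used.remove(id);
        if (size == null) {
            throw new IllegalArgumentException("unknown buffer " + id);
        }
        free.computeIfAbsent(size, k -> new ArrayDeque<>()).push(id);
    }

    long allocatedBytes() {
        return allocatedBytes;
    }
}
```

In this model, a buffer that is marked free and then requested again with the same size comes back with the same id, and allocatedBytes() only grows, which mirrors the "expand until the limit is reached" behaviour.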
However, there might be cases in which programmers would like the TornadoVM runtime to free all resources after an execution plan has finished. This PR adds support for this feature.
If the flag -Dtornado.reuse.device.buffers=false is set, TornadoVM allocates and deallocates device buffers every time an execution plan is launched. By default, the flag is set to true (to reuse buffers as much as possible).
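As a rough illustration of how such a flag behaves, the sketch below reads the property as a standard Java boolean system property with a default of true. The helper class ReuseBuffersFlag is hypothetical, and how TornadoVM itself parses the option is an assumption here, not something this PR description confirms.

```java
/** Hypothetical helper; how TornadoVM itself parses the option is not shown here. */
final class ReuseBuffersFlag {
    static final String KEY = "tornado.reuse.device.buffers";

    /** Returns true when the property is unset or set to "true" (ignoring case). */
    static boolean isEnabled() {
        // Boolean.parseBoolean ignores case, so any capitalization of "false" disables reuse.
        return Boolean.parseBoolean(System.getProperty(KEY, "true"));
    }

    public static void main(String[] args) {
        System.out.println(KEY + " = " + isEnabled());
    }
}
```

Under this assumption, launching the JVM without the flag keeps buffer reuse enabled, and passing -Dtornado.reuse.device.buffers=false switches to allocating and deallocating on every launch.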
Behaviour
To check all JNI calls, including allocations and deallocations, enable the LOG_JNI macro.
Run any test with the flag -Dtornado.reuse.device.buffers=false, for example:
$ tornado-test --printKernel --jvm="-Dtornado.reuse.device.buffers=false" -V uk.ac.manchester.tornado.unittests.foundation.TestFloats#testVectorFloatAdd
## All unit tests also pass
make tests
Problem description
n/a.
Backend/s tested
Mark the backends affected by this PR.
OS tested
Mark the OS where this PR is tested.
Did you check on FPGAs?
If it is applicable, check your changes on FPGAs.
How to test the new patch?
Run any test with the flag -Dtornado.reuse.device.buffers=false.