beehive-lab / TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
https://www.tornadovm.org
Apache License 2.0

Fix for Device Events (OpenCL and SPIR-V events) when running Multi-threaded Execution plans #387

Closed jjfumero closed 2 months ago

jjfumero commented 2 months ago

Description

This PR fixes the issue of sharing the eventPool associated with a backend context within the TornadoVM runtime.

This PR also updates one of the methods used to reset all internal state associated with a backend.

- driver.getDefaultDevice().reset();
+ driver.getDefaultDevice().clean();

This resets the code cache and all events associated with a command queue. This method can also be reached from the TornadoExecutionPlan:

executionPlan.resetDevice();

Problem description

When running Tornado execution plans from many different Java threads, an internal data structure could be accessed from several threads at once. This data structure holds the pointers to the associated low-level events in a list, and the list is maintained by the <Backend>DeviceContext, which must be shared by all threads running on the same device.

This PR extends the previous work to achieve a single command queue (or stream) per Java thread.
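To illustrate the idea, here is a minimal sketch in plain Java of a device context that is shared by all threads but keeps a separate event list per Java thread. It is not the TornadoVM implementation: the class `DeviceContextSketch`, its methods, and the use of plain `long` values as event handles are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one context shared by all threads, one event list per thread.
// None of these names come from the TornadoVM code base.
final class DeviceContextSketch {

    // The map itself is thread-safe; each inner list is only mutated by its owner thread.
    private final Map<Thread, List<Long>> eventsPerThread = new ConcurrentHashMap<>();

    // Records a (fake) low-level event handle in the calling thread's own list.
    void registerEvent(long eventHandle) {
        eventsPerThread
                .computeIfAbsent(Thread.currentThread(), t -> new ArrayList<>())
                .add(eventHandle);
    }

    // Sums the events of all threads. Call this only after the worker threads have been joined.
    int totalEvents() {
        int total = 0;
        for (List<Long> events : eventsPerThread.values()) {
            total += events.size();
        }
        return total;
    }

    // Drops all recorded events, loosely mirroring the intent of clean() in this PR.
    void clean() {
        eventsPerThread.clear();
    }

    // Starts threadCount threads that each register eventsPerThreadCount events,
    // waits for them to finish, and returns the total number of recorded events.
    static int runThreads(DeviceContextSketch context, int threadCount, int eventsPerThreadCount) {
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            Thread thread = new Thread(() -> {
                for (int i = 0; i < eventsPerThreadCount; i++) {
                    context.registerEvent(i);
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for worker threads", e);
            }
        }
        return context.totalEvents();
    }

    public static void main(String[] args) {
        DeviceContextSketch context = new DeviceContextSketch();
        System.out.println("Recorded events: " + runThreads(context, 4, 1000));
    }
}
```

With a single unsynchronized `ArrayList` shared by all threads, concurrent `add` calls could lose events or throw. Keeping one list per thread avoids that without locking on the hot path, assuming each thread only ever touches its own list.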

Backend/s tested

Mark the backends affected by this PR.

OS tested

Mark the OS where this PR is tested.

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

How to test the new patch?

$ make
$ make tests

jjfumero commented 2 months ago

In OpenCL, I get the following error:

tornado -ea  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.multithreaded.TestMultiThreadedExecutionPlans"
WARNING: Using incubator modules: jdk.incubator.vector
Running thread t0Running thread t1Exception in thread "Thread-206" Exception in thread "Thread-205" uk.ac.manchester.tornado.api.exceptions.TornadoOutOfMemoryException: Unable to allocate 265814040 bytes of memory.
  at tornado.drivers.common@1.0.4-dev/uk.ac.manchester.tornado.drivers.common.TornadoBufferProvider.freeUnusedNativeBufferAndAssignRegion(TornadoBufferProvider.java:126)
  at tornado.drivers.common@1.0.4-dev/uk.ac.manchester.tornado.drivers.common.TornadoBufferProvider.getOrAllocateBufferWithSize(TornadoBufferProvider.java:153)
  at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.mm.OCLMemorySegmentWrapper.allocate(OCLMemorySegmentWrapper.java:184)
  at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.newDeviceBufferAllocation(OCLTornadoDevice.java:577)
  at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.allocate(OCLTornadoDevice.java:590)
  at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.allocateObjects(OCLTornadoDevice.java:567)
  at tornado.runtime@1.0.4-dev/uk.ac.manchester.tornado.runtime.interpreter.TornadoVMInterpreter.executeAlloc(TornadoVMInterpreter.java:414)
  at tornado.runtime@1.0.4-dev/uk.ac.manchester.tornado.runtime.interpreter.TornadoVMInterpreter.execute(TornadoVMInterpreter.java:279)
  at tornado.runtime@1.0.4-dev/uk.ac.manchester.tornado.runtime.interpreter.TornadoVMInterpreter.execute(TornadoVMInterpreter.java:855)
  at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024)
  at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
  at tornado.runtime@1.0.4-dev/uk.ac.manchester.tornado.runtime.TornadoVM.executeInterpreterSingleThreaded(TornadoVM.java:125)
  at tornado.runtime@1.0.4-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:112)
  at tornado.runtime@1.0.4-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph.scheduleInner(TornadoTaskGraph.java:859)
  at tornado.runtime@1.0.4-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph.execute(TornadoTaskGraph.java:1366)
  at tornado.runtime@1.0.4-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph.execute(TornadoTaskGraph.java:1378)
  at tornado.api@1.0.4-dev/uk.ac.manchester.tornado.api.TaskGraph.execute(TaskGraph.java:777)
  at tornado.api@1.0.4-dev/uk.ac.manchester.tornado.api.ImmutableTaskGraph.execute(ImmutableTaskGraph.java:49)
  at tornado.api@1.0.4-dev/uk.ac.manchester.tornado.api.TornadoExecutionPlan$TornadoExecutor.lambda$execute$0(TornadoExecutionPlan.java:406)
  at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
  at tornado.api@1.0.4-dev/uk.ac.manchester.tornado.api.TornadoExecutionPlan$TornadoExecutor.execute(TornadoExecutionPlan.java:406)
  at tornado.api@1.0.4-dev/uk.ac.manchester.tornado.api.TornadoExecutionPlan.execute(TornadoExecutionPlan.java:117)
  at tornado.unittests@1.0.4-dev/uk.ac.manchester.tornado.unittests.multithreaded.TestMultiThreadedExecutionPlans.compute(TestMultiThreadedExecutionPlans.java:158)
  at tornado.unittests@1.0.4-dev/uk.ac.manchester.tornado.unittests.multithreaded.TestMultiThreadedExecutionPlans.lambda$test04$7(TestMultiThreadedExecutionPlans.java:192)
  at java.base/java.lang.Thread.run(Thread.java:1583)

The tornado-test script needs to pass the following flag as a parameter for this test: -Dtornado.device.memory=4GB.

If I run:

tornado -ea  --jvm "-Xmx6g -Dtornado.device.memory=4GB -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.multithreaded.TestMultiThreadedExecutionPlans"
WARNING: Using incubator modules: jdk.incubator.vector
Running thread t0Running thread t1Test: class uk.ac.manchester.tornado.unittests.multithreaded.TestMultiThreadedExecutionPlans
  Running test: test01                     ................  [PASS] 
  Running test: test02                     ................  [PASS] 
  Running test: test03                     ................  [PASS] 
  Running test: test04                     ................  [PASS]

Yes, this is expected. This test needs a larger device heap size, which is already controlled in the tornado-test script.

jjfumero commented 2 months ago

I should mention, this PR is connected to #389.

jjfumero commented 2 months ago

All comments addressed.