beehive-lab / TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
https://www.tornadovm.org
Apache License 2.0

Support lazy copy-out for batch processing #376

Closed: mairooni closed this PR 5 months ago

mairooni commented 5 months ago

Description

This PR adds support for lazy copy-out in batch processing. The fix relates to issue #333.

Problem description

When the copy-out mode was set to UNDER_DEMAND on a TaskGraph that uses batches, transferToHost copied only the first batch of the output data back to the host. With this PR, the stream-out is invoked with the correct offset for each batch, so all of the output data is written to the host buffer.
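The offset arithmetic behind the fix can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not TornadoVM's implementation: the class `BatchCopyOutSketch`, the record `BatchChunk`, and the methods `planBatches`, `copyOutFirstBatchOnly`, `copyOutAllBatches`, and `fakeDeviceResults` are invented for illustration, and plain `byte[]` arrays stand in for device and host buffers. It contrasts the buggy behaviour (only batch 0 reaches the host) with the fixed one (each batch is copied at its own host offset), using an uneven 4 + 4 + 2 byte split in the spirit of the `testBatchNotEven2Lazy` case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of batched lazy copy-out; NOT TornadoVM's real code path.
class BatchCopyOutSketch {

    // One batch: where it lands in the host buffer and how many bytes it covers.
    record BatchChunk(int hostOffset, int length) {}

    // Split totalBytes into batches of at most batchBytes; the last batch may be shorter.
    static List<BatchChunk> planBatches(int totalBytes, int batchBytes) {
        if (batchBytes <= 0) {
            throw new IllegalArgumentException("batchBytes must be positive");
        }
        List<BatchChunk> chunks = new ArrayList<>();
        for (int offset = 0; offset < totalBytes; offset += batchBytes) {
            chunks.add(new BatchChunk(offset, Math.min(batchBytes, totalBytes - offset)));
        }
        return chunks;
    }

    // Models the reported bug: only the first batch is copied back to the host.
    static void copyOutFirstBatchOnly(byte[][] deviceResults, byte[] host, List<BatchChunk> chunks) {
        BatchChunk first = chunks.get(0);
        System.arraycopy(deviceResults[0], 0, host, first.hostOffset(), first.length());
    }

    // Models the fix: every batch is copied back at its own host offset.
    static void copyOutAllBatches(byte[][] deviceResults, byte[] host, List<BatchChunk> chunks) {
        for (int i = 0; i < chunks.size(); i++) {
            BatchChunk c = chunks.get(i);
            System.arraycopy(deviceResults[i], 0, host, c.hostOffset(), c.length());
        }
    }

    // Stand-in for device output: every byte produced by batch i has the value i + 1.
    static byte[][] fakeDeviceResults(List<BatchChunk> chunks) {
        byte[][] results = new byte[chunks.size()][];
        for (int i = 0; i < chunks.size(); i++) {
            results[i] = new byte[chunks.get(i).length()];
            Arrays.fill(results[i], (byte) (i + 1));
        }
        return results;
    }

    public static void main(String[] args) {
        List<BatchChunk> chunks = planBatches(10, 4); // uneven split: 4 + 4 + 2 bytes
        byte[][] device = fakeDeviceResults(chunks);

        byte[] buggy = new byte[10];
        copyOutFirstBatchOnly(device, buggy, chunks);
        byte[] fixed = new byte[10];
        copyOutAllBatches(device, fixed, chunks);

        System.out.println("buggy: " + Arrays.toString(buggy)); // [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
        System.out.println("fixed: " + Arrays.toString(fixed)); // [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
    }
}
```

In the buggy variant the host buffer keeps its initial zeros beyond the first batch, which matches the symptom described above; the fixed variant fills the whole buffer, including the shorter final batch.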

Backend/s tested

Mark the backends affected by this PR.

OS tested

Mark the OS where this PR is tested.

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

How to test the new patch?

Run the following unit tests:

tornado-test -V --fast uk.ac.manchester.tornado.unittests.batches.TestBatches#test100MBSmallLazy
tornado-test -V --fast uk.ac.manchester.tornado.unittests.batches.TestBatches#test100MBLazy
tornado-test -V --fast uk.ac.manchester.tornado.unittests.batches.TestBatches#test300MBLazy
tornado-test -V --fast uk.ac.manchester.tornado.unittests.batches.TestBatches#test512MBLazy
tornado-test -V --fast uk.ac.manchester.tornado.unittests.batches.TestBatches#testBatchNotEven2Lazy

jjfumero commented 5 months ago

I get segfaults with the PTX backend. Can you reproduce it?

tornado-test -V --fast uk.ac.manchester.tornado.unittests.batches.TestBatches
tornado --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.batches.TestBatches"
WARNING: Using incubator modules: jdk.incubator.vector
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fb630f8d2c0, pid=14114, tid=14115
#
# JRE version: Java(TM) SE Runtime Environment (21.0.2+13) (build 21.0.2+13-LTS-58)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0.2+13-LTS-58, mixed mode, tiered, jvmci, parallel gc, linux-amd64)
# Problematic frame:
# C  [libcuda.so.1+0x18d2c0]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /home/juan/tornadovm/TornadoVM/core.14114)
#
# An error report file with more information is saved as:
# /home/juan/tornadovm/TornadoVM/hs_err_pid14114.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
jjfumero commented 5 months ago

This is the test that breaks:

tornado-test -V --fast uk.ac.manchester.tornado.unittests.batches.TestBatches#test300MBLazy
jjfumero commented 5 months ago

OpenCL and SPIR-V are passing.

mairooni commented 5 months ago

There was a bug in the PTXMemorySegmentWrapper class; I just pushed a fix. Please check again.

jjfumero commented 5 months ago

Thanks. It works now.