beehive-lab / TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
https://www.tornadovm.org
Apache License 2.0
1.18k stars 112 forks

Example code producing memory access fault for array size 1024*8, works for 1023*8 #510

Open chopikus opened 1 month ago

chopikus commented 1 month ago

Describe the bug

Running the example program produces an error for big enough arrays.

Program:

import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.common.TornadoDevice;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.types.collections.VectorFloat8;
import uk.ac.manchester.tornado.api.types.vectors.Float8;

public class App
{
    public static void parallelInitialization(VectorFloat8 data) {
        for (@Parallel int i = 0; i < data.size(); i++) {
            int j = i * 8;
            data.set(i, new Float8(j, j + 1, j + 2, j + 3, j + 4 , j + 5 , j + 6, j + 7));
        }
    }

    public static void computeSquare(VectorFloat8 data) {
        for (@Parallel int i = 0; i < data.size(); i++) {
            Float8 item = data.get(i);
            Float8 result = Float8.mult(item, item);
            data.set(i, result);
        }
    }

    public static void main( String[] args ) {
        VectorFloat8 array = new VectorFloat8(1024 * 8);
        TaskGraph taskGraph = new TaskGraph("s0")
                .transferToDevice(DataTransferMode.EVERY_EXECUTION, array)
                .task("t0", App::parallelInitialization, array)
                .task("t1", App::computeSquare, array)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, array);

        TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(taskGraph.snapshot());

        // Obtain a device from the list
        TornadoDevice device = TornadoExecutionPlan.getDevice(0, 0);
        executionPlan.withDevice(device);

        // Put in a loop to analyze hotspots with Intel VTune (as a demo)
        for (int i = 0; i < 1000; i++ ) {
            // Execute the application
            executionPlan.execute();
        }
    }
}

Running mvn package and tornado -jar [JARFILE] produces an error:

Memory access fault by GPU node-1 (Agent handle: 0x7fe80076f230) on address 0x7fe614c00000. Reason: Page not present or supervisor privilege.

However, if I change the size of the array to 1023*8 instead of 1024*8, the error goes away.
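In scalar terms, the two kernels above fill each flat element with its own index and then square every element. A host-only Java sketch (plain arrays, no TornadoVM; the class and method names here are hypothetical) can serve as a CPU baseline when validating device output:

```java
// CPU reference for the two kernels: data[i*8 + k] = i*8 + k, then x -> x*x.
// A plain float[] stands in for VectorFloat8.
public class HostReference {

    // Number of Float8 vectors -> flat float array of length vectors * 8.
    static float[] run(int vectors) {
        float[] data = new float[vectors * 8];
        // Equivalent of parallelInitialization: each lane holds its flat index.
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // Equivalent of computeSquare: element-wise square.
        for (int i = 0; i < data.length; i++) {
            data[i] = data[i] * data[i];
        }
        return data;
    }

    public static void main(String[] args) {
        float[] out = run(1024 * 8); // same numeric size that triggers the fault
        System.out.println(out[3]);  // 3 squared
    }
}
```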

How To Reproduce

I put my code into a repository: https://github.com/chopikus/my-tornado-app.

Steps:

  1. git clone https://github.com/chopikus/my-tornado-app.git
  2. cd my-tornado-app
  3. ./run.sh

Expected behavior

No errors should be produced

Computing system setup (please complete the following information):

Additional context

tornado --devices:

WARNING: Using incubator modules: jdk.incubator.vector

Number of Tornado drivers: 1
Driver: OpenCL
  Total number of OpenCL devices  : 1
  Tornado device=0:0  (DEFAULT)
    OPENCL --  [AMD Accelerated Parallel Processing] -- gfx1035
        Global Memory Size: 4.0 GB
        Local Memory Size: 64.0 KB
        Workgroup Dimensions: 3
        Total Number of Block Threads: [256]
        Max WorkGroup Configuration: [1024, 1024, 1024]
        Device OpenCL C version: OpenCL C 2.0
jjfumero commented 1 month ago

Hi @chopikus . Thank you for the detailed report.

I can also reproduce the error on NVIDIA GPUs. The problem occurs in clBuildProgram, once the code has been generated. However, running on Intel GPUs with OpenCL works, and the SPIR-V and PTX backends work as well. We will take a look and analyze why NVIDIA and AMD report a clBuildProgram failure.
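When investigating a clBuildProgram failure like this, it can help to inspect the OpenCL C code that TornadoVM generates before the driver compiles it. A sketch using TornadoVM's launcher flags (the jar name is a placeholder; flag names as documented for recent TornadoVM releases, so adjust to your version):

```shell
# Print the generated OpenCL C source for each task before it is compiled.
tornado --printKernel -jar target/my-tornado-app.jar

# Enable verbose runtime output, including compilation information.
tornado --debug -jar target/my-tornado-app.jar
```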

jjfumero commented 1 month ago

For reference, I added this test in this branch: https://github.com/beehive-lab/TornadoVM/commit/87e6080ad9c2e23ecb3b19479275b51c0ae61738

jjfumero commented 1 month ago

In general, the most tested platforms for TornadoVM are NVIDIA discrete GPUs and Intel GPUs (ARC and HD Graphics). The best-supported backend is OpenCL. However, we are pushing for SPIR-V, and we hope it will become the default backend in the future.

Regarding FPGAs, we have tested on Intel Altera FPGAs and AMD/Xilinx FPGAs. My colleague Thanos can give you more details about the current support.

On Fri, 26 Jul 2024 at 15:21, Igor Chovpan @.***> wrote:

Thank you for a quick response!

@jjfumero https://github.com/jjfumero Which platform is the most stable for TornadoVM? Does this example work on FPGAs or on some hosted environment?

I really like this project and want to try it out more.


andrii0lomakin commented 1 month ago

Good day.

I hope you'll excuse me for adding my five cents with a question.

Recently, AMD released a new NPU, which will be supported by the Xilinx RT and, in turn, will work over OpenCL. If I understood correctly, oneAPI supports OpenCL too, so wouldn't making SPIR-V the default narrow down the usage possibilities of TornadoVM?

jjfumero commented 1 month ago

Hi @andrii0lomakin, OpenCL >= 2.1 can dispatch SPIR-V binary kernels. In fact, TornadoVM currently dispatches SPIR-V through both the OpenCL runtime and the Level Zero API from oneAPI. We hope vendors adopt SPIR-V more widely in the future. From my point of view, the way to go is SPIR-V and PTX. However, debugging the compiler gets increasingly complex.
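For illustration, dispatching a SPIR-V binary through the OpenCL runtime (available since OpenCL 2.1) goes through clCreateProgramWithIL rather than clCreateProgramWithSource. A minimal, untested C sketch, assuming a context and device already exist and that kernel.spv is a valid SPIR-V module:

```c
#define CL_TARGET_OPENCL_VERSION 210
#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

/* Load a SPIR-V module from disk and build it with the OpenCL runtime.
 * Assumes `ctx` and `dev` were obtained via the usual platform/device calls. */
cl_program build_spirv(cl_context ctx, cl_device_id dev, const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    rewind(f);
    void *il = malloc(size);
    if (fread(il, 1, size, f) != (size_t) size) { fclose(f); free(il); return NULL; }
    fclose(f);

    cl_int err;
    /* clCreateProgramWithIL consumes the SPIR-V intermediate language directly;
     * no OpenCL C front end is involved. */
    cl_program prog = clCreateProgramWithIL(ctx, il, (size_t) size, &err);
    free(il);
    if (err != CL_SUCCESS) return NULL;

    err = clBuildProgram(prog, 1, &dev, "", NULL, NULL);
    return (err == CL_SUCCESS) ? prog : NULL;
}
```

This is the path a runtime like TornadoVM can take on drivers that report OpenCL 2.1 or newer; on older drivers (e.g., the 1.x versions common on FPGAs), only OpenCL C source compilation is available.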

Given the current vendor/accelerator landscape, it is difficult to deprecate our OpenCL C backend. Just a few examples: FPGA vendors support OpenCL 1.0 - 1.2, and Apple supports OpenCL 1.2. Thus, if TornadoVM wants to run on those platforms as well, the OpenCL C backend is still needed. Unless, of course, new backends are added (e.g., targeting VHDL directly, Apple Metal, etc.).