Syncleus / aparapi

The New Official Aparapi: a framework for executing native Java and Scala code on the GPU.
http://aparapi.com
Apache License 2.0

The JVM crashes irregularly. #143

Closed Xianguang-Zhou closed 5 years ago

Xianguang-Zhou commented 5 years ago

I am writing a neural network library with Aparapi, but when I test the code, the JVM crashes irregularly.

Here is the error message:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f90dcf4e343, pid=5495, tid=0x00007f9136b78700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [pipe_r600.so+0xf3343]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/xxx/light_neural_network/java/hs_err_pid5495.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

Here is the test code:

import static org.zxg.ai.lnn.tensor.Array.a;

import org.zxg.ai.lnn.autograd.Variable;
import org.zxg.ai.lnn.tensor.Tensor;

public class DotGradTest {

    void println(Object o) {
        System.out.println(o);
    }

    void testTensor(Tensor t1, Tensor t2) {
        Variable v1 = new Variable(t1);
        Variable v2 = new Variable(t2);
        Variable y = v1.dot(v2);
        y.backward();
        println(v1.gradient());
        println("-----");
        println(v2.gradient());
        println("-----------------------");
    }

    void testShape(int[] s1, int[] s2) {
        Tensor t1 = new Tensor(s1);
        t1.arange(1, t1.size() + 1);
        Tensor t2 = new Tensor(s2);
        t2.arange(1, t2.size() + 1);
        testTensor(t1, t2);
    }

    void test1() {
        testShape(a(9), a(9));
    }

    void test2() {
        testShape(a(2, 9), a(9, 2));
    }

    void test3() {
        testShape(a(2, 3, 9), a(9, 2, 3));
    }

    void test4() {
        testShape(a(2, 3, 4, 9), a(9, 2, 4, 3));
    }

    void test5() {
        testShape(a(2, 4, 3, 5, 9), a(9, 5, 2, 4, 3));
    }

    void test6() {
        testShape(a(3, 5, 2, 6, 2, 8), a(8, 6, 4, 2, 3, 2));
    }

    void run() {
        test1();
        test2();
        test3();
        test4();
        test5();
        test6();
    }

    public static void main(String[] args) {
        new DotGradTest().run();
    }
}

pep-pig commented 5 years ago

I have run into the same problem. Have you found a way to solve it yet?

CoreRasurae commented 5 years ago

@Xianguang-Zhou Are you running your code from inside an IDE or from the command line? Can you try recompiling aparapi, aparapi-native, and aparapi-jni from source? At least try recompiling aparapi. The code you posted does not include the Aparapi kernel, so I cannot tell whether the problem lies there.

Xianguang-Zhou commented 5 years ago

@vonlippmann, I am trying to use JOCL instead of Aparapi.

Xianguang-Zhou commented 5 years ago

Hello @CoreRasurae, the Aparapi kernels are here: https://github.com/Xianguang-Zhou/light_neural_network/tree/master/java/src/main/java/org/zxg/ai/lnn/tensor

The test code is for testing my neural network library, which uses Aparapi to accelerate tensor computations. The library is here: https://github.com/Xianguang-Zhou/light_neural_network/tree/master/java

CoreRasurae commented 5 years ago

Hi @Xianguang-Zhou, I am unable to reproduce the instability you're experiencing. I ran your test code 15 times in a row, and it ran to completion every time. I ran it from the command line like this:

java -cp light-neural-network-0.0.1-SNAPSHOT.jar:aparapi-jni-1.4.1.jar:aparapi-1.10.0.jar:bcel-6.3.jar:. DotGradTest

Tested with NVIDIA proprietary driver on Linux with a GTX 1050Ti and openjdk 11.

Please check whether the problem is in your AMD driver. Can you try a different driver or GPU?

pep-pig commented 5 years ago

@CoreRasurae, I have only installed CUDA 10. Do I still need to install an OpenCL SDK? It seems the OpenCL SDK for NVIDIA is included in CUDA.

CoreRasurae commented 5 years ago

@vonlippmann For OpenCL on Windows, installing the NVIDIA driver itself is normally enough. You can look for the clinfo command-line utility, which can be downloaded from the internet. You can also try GPU-Z (https://www.techpowerup.com/gpuz/). Either utility can tell you whether OpenCL is working.
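
As a rough sketch of the clinfo check, the snippet below launches the tool from Java and reports whether it started and exited with status 0. The class name ClinfoCheck and the method runsCleanly are hypothetical names chosen for illustration, and the code assumes clinfo is on the PATH. A zero exit status only shows that the tool ran; read its output to see which OpenCL platforms and devices it lists.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class ClinfoCheck {

    /**
     * Hypothetical helper: runs the given command and reports whether it
     * started, finished within 30 seconds, and exited with status 0.
     */
    static boolean runsCleanly(String... command) {
        try {
            // inheritIO() sends the tool's output straight to this console.
            Process p = new ProcessBuilder(command).inheritIO().start();
            if (!p.waitFor(30, TimeUnit.SECONDS)) {
                p.destroyForcibly(); // took too long; give up
                return false;
            }
            return p.exitValue() == 0;
        } catch (IOException e) {
            return false; // command not installed or not on the PATH
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumes "clinfo" is the executable name on this system.
        System.out.println("clinfo ran cleanly: " + runsCleanly("clinfo"));
    }
}
```

If this prints false, the tool is either missing or failing, which is worth sorting out before debugging any Aparapi code.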

Xianguang-Zhou commented 5 years ago

@CoreRasurae, thank you for your help. I have replaced Aparapi with JOCL, and the JVM no longer crashes.

grfrost commented 4 years ago

I know you have closed this, but the reason Aparapi was not working is:

Tensor t1 = new Tensor(s1);

Aparapi cannot allocate in OpenCL.
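
To illustrate the allocation point, here is a minimal plain-Java sketch, not the library's actual code, of the usual pattern: every buffer is allocated once on the host, and the per-work-item body only reads and writes primitive arrays. The class PreallocatedDot and its methods are hypothetical names, and the loop in dot stands in for the GPU range. In Aparapi, the workItem body would sit inside Kernel.run(), with getGlobalId() supplying gid. Whether allocation was the actual cause of the crash reported above is not confirmed in this thread.

```java
public class PreallocatedDot {

    /**
     * Per-work-item body: computes one element of the (rows x cols) result.
     * It only indexes preallocated primitive arrays and never uses "new".
     */
    static void workItem(int gid, float[] a, float[] b, float[] out,
                         int inner, int cols) {
        int row = gid / cols;
        int col = gid % cols;
        float sum = 0f;
        for (int k = 0; k < inner; k++) {
            sum += a[row * inner + k] * b[k * cols + col];
        }
        out[gid] = sum;
    }

    /** Multiplies a (rows x inner) by b (inner x cols), both row-major. */
    static float[] dot(float[] a, float[] b, int rows, int inner, int cols) {
        float[] out = new float[rows * cols]; // allocated once, on the host
        for (int gid = 0; gid < out.length; gid++) { // stand-in for the GPU range
            workItem(gid, a, b, out, inner, cols);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5, 6}; // 2 x 3
        float[] b = {1, 2, 3, 4, 5, 6}; // 3 x 2
        System.out.println(java.util.Arrays.toString(dot(a, b, 2, 3, 2)));
    }
}
```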