beehive-lab / TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
https://www.tornadovm.org
Apache License 2.0

[hotfix] Resolving issue with missing CUDA NVML library for OpenCL #396

Closed: stratika closed this pull request 5 months ago

stratika commented 5 months ago

Description

This hotfix resolves a problem that occurs when the NVIDIA OpenCL driver is installed but CUDA is not installed in the default paths. In that case, the JNI functions that call into NVML (the NVIDIA Management Library) cannot be linked, and the following exception is thrown:

Exception in thread "main" java.lang.UnsatisfiedLinkError: 'long uk.ac.manchester.tornado.drivers.opencl.power.OCLNvidiaPowerMetric.clNvmlInit()'
    at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.power.OCLNvidiaPowerMetric.clNvmlInit(Native Method)
    at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.power.OCLNvidiaPowerMetric.initializePowerLibrary(OCLNvidiaPowerMetric.java:48)
    at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.power.OCLNvidiaPowerMetric.<init>(OCLNvidiaPowerMetric.java:36)
    at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.OCLDeviceContext.<init>(OCLDeviceContext.java:76)
    at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.OCLContext.createDeviceContext(OCLContext.java:209)
    at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.OCLContext.createDeviceContext(OCLContext.java:42)
    at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.graal.OCLHotSpotBackendFactory.createJITCompiler(OCLHotSpotBackendFactory.java:95)
    at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.OCLBackendImpl.createOCLJITCompiler(OCLBackendImpl.java:204)
    at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.OCLBackendImpl.installDevices(OCLBackendImpl.java:218)
    at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.OCLBackendImpl.lambda$discoverDevices$4(OCLBackendImpl.java:225)
    at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
    at java.base/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:617)
    at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.OCLBackendImpl.discoverDevices(OCLBackendImpl.java:223)
    at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.OCLBackendImpl.<init>(OCLBackendImpl.java:76)
    at tornado.drivers.opencl@1.0.4-dev/uk.ac.manchester.tornado.drivers.opencl.OCLTornadoDriverProvider.createBackend(OCLTornadoDriverProvider.java:48)
    at tornado.runtime@1.0.4-dev/uk.ac.manchester.tornado.runtime.TornadoCoreRuntime.loadBackends(TornadoCoreRuntime.java:167)
    at tornado.runtime@1.0.4-dev/uk.ac.manchester.tornado.runtime.TornadoCoreRuntime.<init>(TornadoCoreRuntime.java:105)
    at tornado.runtime@1.0.4-dev/uk.ac.manchester.tornado.runtime.TornadoCoreRuntime.<clinit>(TornadoCoreRuntime.java:79)
    at tornado.drivers.common@1.0.4-dev/uk.ac.manchester.tornado.drivers.TornadoDeviceQuery.main(TornadoDeviceQuery.java:74)
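To illustrate the failure mode, here is a minimal Java sketch of a defensive pattern: a native NVML call guarded so that a missing library disables power metrics instead of aborting device discovery. The class `GuardedNvmlMetric` and its methods are hypothetical. They are not TornadoVM's `OCLNvidiaPowerMetric` API, and the sketch does not claim to show how this hotfix was implemented. The native method is deliberately left unbound (no `System.loadLibrary` call), so invoking it raises the same kind of `UnsatisfiedLinkError` as in the trace above.

```java
// Hypothetical sketch: names are illustrative, not TornadoVM's actual API.
final class GuardedNvmlMetric {

    // Declared native but never bound: no library is loaded, so the first call
    // throws UnsatisfiedLinkError, mirroring the missing-NVML case above.
    private static native long nvmlInitNative();

    // Hypothetical native reader; only reached if initialization succeeded.
    private static native long nvmlPowerUsageNative();

    private final boolean available;

    GuardedNvmlMetric() {
        this.available = tryInitialize();
    }

    private static boolean tryInitialize() {
        try {
            // Assumption: the native call returns 0 on success,
            // mirroring NVML's NVML_SUCCESS return code.
            return nvmlInitNative() == 0;
        } catch (UnsatisfiedLinkError e) {
            // Library or symbol missing: report it and keep going without metrics.
            System.err.println("NVML unavailable, power metrics disabled: " + e.getMessage());
            return false;
        }
    }

    boolean isAvailable() {
        return available;
    }

    // Returns -1 when power metrics are disabled instead of touching native code.
    long readPowerUsage() {
        return available ? nvmlPowerUsageNative() : -1L;
    }

    public static void main(String[] args) {
        GuardedNvmlMetric metric = new GuardedNvmlMetric();
        System.out.println("Power metrics available: " + metric.isAvailable());
        System.out.println("Power usage: " + metric.readPowerUsage());
    }
}
```

Running `main` prints `Power metrics available: false` and `Power usage: -1`, because the native symbol is never resolved. An alternative design would be to probe for the library before loading it. Catching the error at the call site keeps the check next to the native call that can fail.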

Problem description

If the patch fixes a bug, please describe what the issue was and how to reproduce it.

Backend/s tested

Mark the backends affected by this PR.

OS tested

Mark the OSes on which this PR was tested.

Did you check on FPGAs?

If applicable, test your changes on FPGAs.

How to test the new patch?

make BACKEND=opencl
tornado --enableProfiler console -m tornado.examples/uk.ac.manchester.tornado.examples.VectorAddInt --params="100000"

make BACKEND=ptx
tornado --enableProfiler console -m tornado.examples/uk.ac.manchester.tornado.examples.VectorAddInt --params="100000"

jjfumero commented 5 months ago

I confirm that this patch works on the cluster.