deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0
4.07k stars 648 forks

Provide a means to override the auto-detected CUDA version #1229

Closed jimfcarroll closed 2 years ago

jimfcarroll commented 2 years ago

Description

PyTorch built for CUDA 11.1 runs fine on 11.2. However, there is no way (as far as I could tell from the code) to override the auto-detection of the CUDA version inside the PyTorch engine. There should be a way to do this, either via a system property or by explicitly providing the CUDA build version of PyTorch you want.

I can add this enhancement to the CUDA query code if you would accept a PR.
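A minimal sketch of the kind of override being proposed (an illustration only, not DJL's actual API; the property name `ai.djl.pytorch.cuda_version` is hypothetical):

```java
// Sketch: resolve the CUDA version from a system-property override if present,
// otherwise fall back to whatever was auto-detected.
// The property name "ai.djl.pytorch.cuda_version" is hypothetical.
public final class CudaVersionResolver {

    private CudaVersionResolver() {}

    /** Returns the override if set, otherwise the auto-detected version. */
    public static String resolve(String autoDetected) {
        String override = System.getProperty("ai.djl.pytorch.cuda_version");
        if (override != null && !override.isEmpty()) {
            return override; // e.g. force "cu111" on a CUDA 11.2 machine
        }
        return autoDetected;
    }

    public static void main(String[] args) {
        System.setProperty("ai.djl.pytorch.cuda_version", "cu111");
        System.out.println(CudaVersionResolver.resolve("cu112")); // prints cu111
    }
}
```

The point is simply that the override, when set, wins over detection; the detection code itself stays untouched.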

frankfliu commented 2 years ago

@jimfcarroll Thanks for offering to help. This is definitely a useful enhancement. Please go ahead and raise a PR.

frankfliu commented 2 years ago

I created a PR that can partially work around this issue: https://github.com/deepjavalibrary/djl/pull/1233

pytorch-native-auto still won't be able to download the CUDA 11.1 package, but if you use the pytorch-native-cu111:1.9.0:linux-x86_64 package explicitly, it should be able to run on a cu113 machine.
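For reference, pinning the native package explicitly in a Gradle build might look like the following (the native-package coordinates are the ones from the comment above; the pytorch-engine version shown is illustrative):

```groovy
dependencies {
    // Illustrative engine version; use the DJL release matching your setup
    implementation "ai.djl.pytorch:pytorch-engine:0.14.0"
    // Explicit CUDA 11.1 native build instead of pytorch-native-auto
    runtimeOnly "ai.djl.pytorch:pytorch-native-cu111:1.9.0:linux-x86_64"
}
```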

frankfliu commented 2 years ago

With DJL 0.15.0, PyTorch can now be loaded with any minor version of CUDA.

We also removed pytorch-native-auto; DJL now auto-detects a compatible version of PyTorch.
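One way to read "any minor version" is a relaxed match where the CUDA major version must agree and the installed minor version must be at least the build's minor version. The sketch below illustrates that idea only; it is an assumption, not DJL's actual matching code:

```java
// Sketch (assumption, not DJL's implementation): treat a cuNNN PyTorch build as
// compatible when the CUDA major versions match and the installed minor version
// is at least the build's minor version, e.g. a cu111 build on CUDA 11.2.
public final class CudaCompat {

    private CudaCompat() {}

    /**
     * @param build     CUDA flavor of the PyTorch build, e.g. "cu111"
     * @param installed installed CUDA version, e.g. "11.2"
     */
    public static boolean isCompatible(String build, String installed) {
        // "cu111" -> major 11, minor 1 (last digit is the minor version)
        int buildMajor = Integer.parseInt(build.substring(2, build.length() - 1));
        int buildMinor = Integer.parseInt(build.substring(build.length() - 1));
        String[] parts = installed.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        return major == buildMajor && minor >= buildMinor;
    }
}
```

Under this rule a cu111 build loads on an 11.2 (or 11.3) machine, which matches the behavior described above.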

jimfcarroll commented 2 years ago

Sorry I never got back to this. At the time I didn't realize that PyTorch models are not compatible with libtorch models. I wrote a bridge from Java to Python in order to run PyTorch; it's pretty raw but working. If anyone is interested, it's here: https://github.com/KognitionAI/pilecv4j (the native-python sub-project).

frankfliu commented 2 years ago

@jimfcarroll

In most cases, you can load PyTorch models with libtorch. In DJL, we have a Python engine that allows you to run Python code. I'm very interested in your native-python sub-project. Are you open to having a chat? You can find me on our Slack channel if you want.