Closed CensorKo closed 2 years ago
Another question: it seems that libneuron_op.so was removed in torch-neuron==1.8.1 because neuron-rtd was removed. So how do we use DJL with aws-neuron-dkms on Neuron Runtime 2.x (libnrt.so)? Do you have any samples for it?
Confusing...
@CensorKao a few things you need to check:
- The example currently only works with DJL 0.12.0 and torch-neuron 1.8.1.
- You have to use the PyTorch precxx11 build: https://github.com/deepjavalibrary/djl-demo/blob/master/aws/inferentia/build.gradle#L21
- You have to install Neuron SDK <= 1.15 and use the old Neuron runtime.
We are working on 0.14.0 to make DJL work with Neuron SDK 1.16.0. If you want, you can try our 0.14.0-SNAPSHOT version. The documentation is still a work in progress.
Thanks, but how do I check the Neuron SDK version? I have gone through all the documentation and can only find the neuron-rtd version: 1.5.0.0. Should I assume that neuron-rtd 1.5.0.0 corresponds to Neuron SDK 1.15? https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-runtime/v1/nrt_start.html
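One way to start answering the version question is to list the Neuron-related Python packages in the active environment. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_neuron_packages` is made up for illustration. It only sees pip/conda Python distributions such as torch-neuron, not system packages like neuron-rtd or aws-neuron-dkms, and it does not map package versions to a Neuron SDK release number, so the result still has to be compared against the Neuron release notes.

```python
from importlib import metadata


def installed_neuron_packages():
    """Return {distribution name: version} for installed Python
    distributions whose name contains 'neuron' (hypothetical helper)."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with missing metadata.
        if name and "neuron" in name.lower():
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # Print in pip-style "name==version" form for easy comparison
    # with the versions listed in the release notes.
    for name, version in sorted(installed_neuron_packages().items()):
        print(f"{name}=={version}")
```

Run it inside the same conda environment that DJL points at (for example the one containing torch_neuron), since each environment can hold different versions.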
@frankfliu
Our model was trained with YOLOv5 and has already been exported to a TorchScript .pt file. Now we want to run it on an Inferentia instance. How do we trace our YOLOv5 TorchScript model using trace.py, or do we need to trace it at all before running it?
@CensorKao I just created a demo for a Hugging Face model: https://github.com/deepjavalibrary/djl-demo/pull/184
You have to trace it using neuron-cc; a regular TorchScript model won't work with Inferentia.
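As a rough illustration of what tracing with neuron-cc looks like from Python, here is a minimal sketch using `torch.neuron.trace` from the torch-neuron package. It makes several assumptions that are not stated in this thread: that the original eager `torch.nn.Module` is available (rather than only the exported TorchScript file), that the model takes a single 1x3x640x640 float tensor, and that torch-neuron and neuron-cc are installed. The function name `trace_for_inferentia` is hypothetical.

```python
def trace_for_inferentia(model, output_path, image_size=640):
    """Compile an eager PyTorch model for Inferentia and save the result.

    Hypothetical helper: the input shape is an assumption, not something
    taken from the YOLOv5 model discussed here.
    """
    # Imported lazily so this file can be loaded on machines that do not
    # have the Neuron packages installed.
    import torch
    import torch_neuron  # noqa: F401  (registers the torch.neuron namespace)

    model.eval()
    # Example input used for tracing: batch size 1, RGB, square image.
    example = torch.zeros(1, 3, image_size, image_size)
    # torch.neuron.trace invokes the neuron-cc compiler on supported operators.
    neuron_model = torch.neuron.trace(model, example_inputs=[example])
    # The saved file is still TorchScript, but it contains Neuron operators.
    neuron_model.save(output_path)
    return neuron_model
```

The compiled model expects the same input shape that was used for tracing, so the example tensor should match what will be sent at inference time.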
@zachgk @frankfliu @lanking520 @stu1130 @roywei
We deployed a YOLOv5 TorchScript model on an AWS Inferentia instance, but DJL can't load the libneuron_op.so file on startup.
libneuron_op.so exists on the system, and the PYTORCH_EXTRA_LIBRARY_PATH environment variable is set.
Caused by: java.lang.UnsatisfiedLinkError: /home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/lib/libneuron_op.so: /home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/lib/libneuron_op.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1934)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1817)
    at java.lang.Runtime.load0(Runtime.java:810)
    at java.lang.System.load(System.java:1088)
    at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:72)
    ... 44 more