deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0

ai.djl.engine.EngineException Unknown type name '__torch__.torch.classes.neuron.Model' #1178

Closed yingmuting closed 3 years ago

yingmuting commented 3 years ago

Description

I created an EKS cluster on inf1.2xlarge, built the DJL app into an ECR image, and created a deployment YAML. The pod has two containers: the DJL app container and neuron-rtd, which uses the image 790709498068.dkr.ecr.us-west-2.amazonaws.com/neuron-rtd:1.0.6905.0. After deploying to EKS, the app starts successfully, but calling the DJL inference API in a controller (named InferencePointController) throws an exception: ai.djl.engine.EngineException Unknown type name '__torch__.torch.classes.neuron.Model'

Do I need to add 'torch_neuron/lib/libneuron_op.so' to the DJL app?

Error Message

ai.djl.engine.EngineException Unknown type name '__torch__.torch.classes.neuron.Model'

Environment Info

DJL 0.12.0, ResNet-50 on PyTorch. I have compiled the model to a Neuron model.

frankfliu commented 3 years ago

@yingmuting Did you set the environment variable PYTORCH_EXTRA_LIBRARY_PATH?

export PYTORCH_EXTRA_LIBRARY_PATH=$(python -m site | grep $VIRTUAL_ENV | awk -F"'" '{print $2}')/torch_neuron/lib/libneuron_op.so

See: https://github.com/deepjavalibrary/djl-demo/tree/master/aws/inferentia

If you would rather not use an environment variable, you can set the system property PYTORCH_EXTRA_LIBRARY_PATH instead, but you need to make sure it is set before the PyTorch engine is loaded.
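For the system-property route, a minimal sketch of the ordering might look like the following. It uses only the JDK so that it runs on its own. The class name NeuronLibraryConfig, the configure helper, and the default path /opt/neuron/libneuron_op.so are hypothetical names chosen for illustration. The DJL call appears only in a comment, on the assumption that the engine is first touched somewhere after this point in your application.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: sets PYTORCH_EXTRA_LIBRARY_PATH as a system property
// before any code touches the PyTorch engine.
class NeuronLibraryConfig {
    static final String KEY = "PYTORCH_EXTRA_LIBRARY_PATH";

    /**
     * Sets the system property if libraryPath points at an existing file.
     * Returns false (and leaves the property untouched) otherwise.
     */
    public static boolean configure(String libraryPath) {
        Path lib = Paths.get(libraryPath);
        if (!Files.isRegularFile(lib)) {
            return false;
        }
        System.setProperty(KEY, lib.toAbsolutePath().toString());
        return true;
    }

    public static void main(String[] args) {
        // Assumed location of the library inside the container; adjust as needed.
        String path = args.length > 0 ? args[0] : "/opt/neuron/libneuron_op.so";
        if (!configure(path)) {
            System.err.println("libneuron_op.so not found at " + path);
            return;
        }
        System.out.println(KEY + "=" + System.getProperty(KEY));
        // Only after this point should the application load the engine,
        // e.g. ai.djl.engine.Engine.getEngine("PyTorch") or loading a model.
    }
}
```

In a Spring-style app this would need to run before any bean that loads a model, for example at the very top of main. Failing fast when the file is missing makes a wrong path easier to spot than the later "Unknown type name" error.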

yingmuting commented 3 years ago

Thanks. After setting ENV PYTORCH_EXTRA_LIBRARY_PATH in the Dockerfile, it works now. My steps were:

1. On the inf1 instance: pip install torchvision==0.9.1 torch-neuron==1.8.1.1.4.1.0 'neuron-cc[tensorflow]==1.4.1.0' --extra-index-url=https://pip.repos.neuron.amazonaws.com
2. Download libneuron_op.so, copy the file into the image in the Dockerfile, set ENV PYTORCH_EXTRA_LIBRARY_PATH, and build the image.