Closed jestiny0 closed 1 year ago
Hi @jestiny0,
We are aware of this issue and a fix will be issued in an upcoming release.
In the meantime, could you try passing -Xss2m
to the JVM as a workaround and see if it fixes the issue? If you are unsure how to customize DJL's behavior, I would suggest consulting their docs or filing a ticket here: https://github.com/deepjavalibrary/djl.
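For anyone else hitting this, here is a rough sketch of how -Xss2m (a 2 MB per-thread stack size) can be passed to the JVM. The exact mechanism depends on how your application is launched; the jar name and environment variable below are placeholders, so check your launcher's docs for the actual hook it honors.

```shell
# 1. Plain java invocation (app.jar is a placeholder):
java -Xss2m -jar app.jar

# 2. Via a generic JVM options environment variable, if your launcher reads one
#    (the variable name varies by launcher; JAVA_OPTS is a common convention):
export JAVA_OPTS="-Xss2m"
```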
@aws-stdun
It worked when I passed -Xss2m
to the JVM! Thanks a lot for your helpful idea.
By the way, I'd like to know when you will release the fixed version?
It should be shipped in our next release, although I can't give a specific date for that. Glad you got it working.
I ran into some errors when I deployed my application to a Docker container and ran inference (the model loaded successfully).
Environment:
Main steps in my Dockerfile (modified with reference to: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/inference/Dockerfile-libmode.html#libmode-dockerfile):
Run with this docker command:
--device /dev/neuron0 --device /dev/neuron1 --device /dev/neuron2 --device /dev/neuron3
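For context, the device flags above would typically sit inside a docker run invocation along these lines. This is a hedged sketch only; the image name and port mapping are placeholders, not the actual command from this deployment.

```shell
# Expose the Inferentia devices to the container (image/port are placeholders):
docker run -it \
  --device /dev/neuron0 \
  --device /dev/neuron1 \
  --device /dev/neuron2 \
  --device /dev/neuron3 \
  -p 8080:8080 \
  my-inference-image
```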
and I can see the Neuron devices in my container. I have configured environment variables so that DJL (the serving framework) can load
libtorchneuron.so
successfully, and I can see that the Neuron-traced model was loaded successfully, but it failed during warmup (running some test cases through inference). I used gdb to debug the core dump file and got:
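In case it helps others reproduce the analysis, a typical gdb session for inspecting a JVM core dump looks like this (both paths below are placeholders; the actual gdb output from this crash is what was pasted above):

```shell
# Open the core dump against the java binary that produced it:
gdb /path/to/java /path/to/core

# Useful commands once inside gdb:
#   (gdb) bt                   # backtrace of the faulting thread
#   (gdb) info threads         # list all threads in the dump
#   (gdb) thread apply all bt  # backtraces for every thread
```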
Key points
I have loaded my model and run test cases successfully with Python code, and I can guarantee that my dummy test requests are identical (both shapes and dtypes) between the Java and Python code.
So I'd like to know if you have any ideas about this error? Thanks.