Open Jimmy-Newtron opened 5 months ago
$ sudo dpkg -i djl-bench_0.26.0-1_all.deb
....
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
$ nvidia-smi
Thu Jan 25 15:01:50 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2070 ... On | 00000000:01:00.0 On | N/A |
| N/A 58C P8 8W / 90W | 46MiB / 8192MiB | 3% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 3732 G /usr/lib/xorg/Xorg 45MiB |
+---------------------------------------------------------------------------------------+
$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This seems like it might be an issue with PyTorch tracing itself (it seems similar to this issue https://github.com/pytorch/pytorch/issues/114035).
Also, can you confirm the model you are attempting to use?
elastic/multilingual-e5-small-optimized
sentence-camembert-base
Both of those models, as well as the one mentioned in the PyTorch issue, are variants of BERT. There might be an issue with tracing those models.
I will try to reproduce the issue once you confirm which model you are facing issues with.
@siddvenk you spotted it right: I have been testing multiple BERT model variants to compare them.
Here is the list of failing models:
All of them are failing
Solution: use PyTorch 2.0.1, like this:
PYTORCH_VERSION=2.0.1 djl-bench -e PyTorch -w 10 -c 1000 -s "(32,32)l,(32,32)l" -g 1 -p /home/ubuntu/models/model/nlp/text_embedding/ai/djl/huggingface/pytorch/elastic/multilingual-e5-small-optimized/0.0.1/multilingual-e5-small-optimized.zip
Unfortunately, this seems to be an issue with PyTorch 2.1.x, which is the default PyTorch version for DJL 0.26.0. See this related PyTorch issue: https://github.com/pytorch/pytorch/issues/107503. TorchScript is in maintenance mode, so this issue will likely never be fixed. Until there is support for serializing compiled models so that we can load torch.compile'd models, you might have to stick with PyTorch 2.0.1.
I can reproduce your issue:
(.hfdjlvenv) ubuntu@xxxxxxxx:~$ djl-bench -e PyTorch -w 10 -c 1000 -s "(32,32)l,(32,32)l" -g 1 -p /home/ubuntu/models/model/nlp/text_embedding/ai/djl/huggingface/pytorch/elastic/multilingual-e5-small-optimized/0.0.1/multilingual-e5-small-optimized.zip
[INFO ] - DJL will collect telemetry to help us better understand our users’ needs, diagnose issues, and deliver additional features. If you would like to learn more or opt-out please go to: https://docs.djl.ai/docs/telemetry.html for more information.
[INFO ] - PyTorch graph executor optimizer is enabled, this may impact your inference latency and throughput. See: https://docs.djl.ai/docs/development/inference_performance_optimization.html#graph-executor-optimization
[INFO ] - Number of inter-op threads is 24
[INFO ] - Number of intra-op threads is 24
[INFO ] - Load PyTorch (2.1.1) in 0.033 ms.
[INFO ] - Running Benchmark on: gpu(0).
Downloading: 100% |████████████████████████████████████████|
Loading: 100% |████████████████████████████████████████|
[INFO ] - Model multilingual-e5-small-optimized loaded in: 5199.294 ms.
[INFO ] - Warmup with 10 iteration ...
[ERROR] - Unexpected error
ai.djl.translate.TranslateException: ai.djl.engine.EngineException: default_program(22): error: extra text after expected end of number
aten_mul[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = v * -3.402823466385289e+38.f;
^
default_program(25): error: extra text after expected end of number
aten_add[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = v_1 / 5.656854152679443f + v_2 * -3.402823466385289e+38.f;
^
2 errors detected in the compilation of "default_program".
nvrtc compilation failed:
#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)
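For anyone reading the nvrtc output above: the compiler is objecting to the literal `-3.402823466385289e+38.f`, where a `.` follows the exponent. A minimal Python sketch of why that spelling is rejected is below. The regular expression is a simplified approximation of the decimal floating-literal grammar in C/CUDA (it ignores hex floats and treats the leading sign as part of the literal), and the `repaired` spelling is only an assumption about what a well-formed literal would look like, not what PyTorch emits in any version.

```python
import re

# Simplified (assumed) grammar for a decimal C/CUDA floating literal:
# either a significand containing a decimal point with an optional exponent,
# or plain digits with a mandatory exponent, followed by an optional suffix.
FLOAT_LITERAL = re.compile(
    r"""
    [+-]?                                  # sign (really a unary operator in C)
    (?:
        (?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?  # 1.5, .5, 1., 1.5e+38
        |
        \d+[eE][+-]?\d+                      # 1e38
    )
    [fFlL]?                                # optional type suffix
    """,
    re.VERBOSE,
)


def is_valid_float_literal(text: str) -> bool:
    """Return True if the whole string is one well-formed floating literal."""
    return FLOAT_LITERAL.fullmatch(text) is not None


generated = "-3.402823466385289e+38.f"  # copied from the nvrtc error above
repaired = "-3.402823466385289e+38f"    # hypothetical well-formed spelling

print(is_valid_float_literal(generated))             # False: '.' after the exponent
print(is_valid_float_literal(repaired))              # True
print(is_valid_float_literal("5.656854152679443f"))  # True: the other literal on line 25
```

The other constant in the same kernel, `5.656854152679443f`, is well formed, which is consistent with nvrtc flagging only the `e+38.f` literals.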
The good news is that PyTorch 2.0.1 seems to work just fine.
(.hfdjlvenv) ubuntu@xxxxx:~$ PYTORCH_VERSION=2.0.1 djl-bench -e PyTorch -w 10 -c 1000 -s "(32,32)l,(32,32)l" -g 1 -p /home/ubuntu/models/model/nlp/text_embedding/ai/djl/huggingface/pytorch/elastic/multilingual-e5-small-optimized/0.0.1/multilingual-e5-small-optimized.zip
[INFO ] - DJL will collect telemetry to help us better understand our users’ needs, diagnose issues, and deliver additional features. If you would like to learn more or opt-out please go to: https://docs.djl.ai/docs/telemetry.html for more information.
[WARN ] - Override PyTorch version: 2.0.1.
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libc10_cuda.so.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libcublas-3b81d170.so.11.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libnvfuser_codegen.so.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libc10.so.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libnvrtc-builtins-2dc4bf68.so.11.8.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libtorch_cpu.so.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libcaffe2_nvrtc.so.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libtorch.so.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libtorch_cuda_linalg.so.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libnvrtc-672ee683.so.11.2.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libtorch_cuda.so.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libnvToolsExt-847d78f2.so.1.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libgomp-52f2fd74.so.1.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libcublasLt-b6d14a74.so.11.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch/2.0.1/cu118/linux-x86_64/native/lib/libcudart-d0da41ae.so.11.0.gz ...
[INFO ] - Downloading jni https://publish.djl.ai/pytorch/2.0.1/jnilib/0.26.0/linux-x86_64/cu118/libdjl_torch.so to cache ...
[INFO ] - PyTorch graph executor optimizer is enabled, this may impact your inference latency and throughput. See: https://docs.djl.ai/docs/development/inference_performance_optimization.html#graph-executor-optimization
[INFO ] - Number of inter-op threads is 24
[INFO ] - Number of intra-op threads is 24
[INFO ] - Load PyTorch (2.0.1) in 0.031 ms.
[INFO ] - Running Benchmark on: gpu(0).
Loading: 100% |████████████████████████████████████████|
[INFO ] - Model multilingual-e5-small-optimized loaded in: 531.493 ms.
[INFO ] - Warmup with 10 iteration ...
[INFO ] - Warmup latency, min: 6.199 ms, max: 2030.892 ms
Iteration: 100% |████████████████████████████████████████|
[INFO ] - Inference result: [0.012903332, 0.637843, 0.35279134 ...]
[INFO ] - Throughput: 164.85, completed 1000 iteration in 6066 ms.
[INFO ] - Model loading time: 531.493 ms.
[INFO ] - total P50: 6.031 ms, P90: 6.062 ms, P99: 6.121 ms
[INFO ] - inference P50: 3.668 ms, P90: 3.710 ms, P99: 3.788 ms
[INFO ] - preprocess P50: 0.040 ms, P90: 0.049 ms, P99: 0.069 ms
[INFO ] - postprocess P50: 2.316 ms, P90: 2.340 ms, P99: 2.370 ms
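As a quick sanity check on the numbers above, the reported throughput matches iterations divided by elapsed wall-clock seconds. This is a small sketch under that assumption; it is inferred from the logged values, not taken from the djl-bench source.

```python
# Values copied from the "Throughput" log line above.
iterations = 1000
elapsed_ms = 6066

# Assumed definition: completed iterations per second of wall-clock time.
throughput = iterations * 1000 / elapsed_ms

print(f"{throughput:.2f}")  # 164.85, matching the logged throughput
```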
Thanks for the investigation. I see that PyTorch is working on a 2.2 release, and I wonder whether it will fix this issue. I hope to see, in the coming months, a working DJL version that supports the PyTorch engine with CUDA 12.1+.
@siddvenk - note that this workaround no longer works, since DJL as of 0.27.0 no longer supports PyTorch 2.0.1: https://github.com/deepjavalibrary/djl/blob/master/engines/pytorch/pytorch-engine/README.md.
Description
I want to run a benchmark of a model on GPU, and it fails due to an error in the PyTorch engine.
Expected Behavior
Successful benchmark
Error Message
How to Reproduce?
djl-bench -e PyTorch -w 10 -c 1000 -s "(32,32)l,(32,32)l" -g 1 -p ./models/model/nlp/text_embedding/ai/djl/huggingface/pytorch/elastic/multilingual-e5-small-optimized/0.0.1/multilingual-e5-small-optimized
Execution logs