Use PyTorch with TorchDynamo to perform Vicuna end-to-end inference.
Environments
Run on Ubuntu 22.04.1 LTS
CPU: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz
GPU: NVIDIA GeForce RTX 3090
CUDA: 12.0
Python: 3.9
PyTorch: 2.0.0+cu118
Anaconda: Miniconda3
Benchmark Time
CPU time per round of inference:
PyTorch average time per round of inference: 982.44 ms
PyTorch with TorchDynamo average time per round of inference: 977.57 ms
GPU time per round of inference:
PyTorch average time per round of inference: 25.34 ms
PyTorch with TorchDynamo average time per round of inference: 19.13 ms
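The per-round averages above can be reproduced with a warmup-then-average loop. This sketch times a stand-in workload (an assumption; the real benchmark would call one round of model inference, and GPU measurements would additionally need `torch.cuda.synchronize()` so kernel launches are not timed asynchronously):

```python
import time

def bench(fn, warmup=3, rounds=10):
    """Return average wall-clock time per round, in milliseconds."""
    for _ in range(warmup):           # warmup rounds: exclude compilation
        fn()                          # and cache-population effects
    start = time.perf_counter()
    for _ in range(rounds):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / rounds * 1000.0  # ms per round

# Stand-in workload; the real benchmark would run model inference here.
avg_ms = bench(lambda: sum(i * i for i in range(10_000)))
print(f"average time per round: {avg_ms:.2f} ms")
```

Warming up before timing matters especially for the TorchDynamo runs, since the first call pays the one-time compilation cost.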