Use PyTorch with TorchDynamo to perform Vicuna end-to-end inference.
Environments
Run on Ubuntu 22.04.1 LTS
CPU: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz
GPU: NVIDIA GeForce RTX 3090
CUDA: 12.0
Python: 3.9
PyTorch: 2.0.0+cu118
Anaconda: Miniconda3
Benchmark Time
CPU time per round of inference:
PyTorch average time per round of inference: 982.44 ms
PyTorch with TorchDynamo average time per round of inference: 977.57 ms
GPU time per round of inference:
PyTorch average time per round of inference: 25.34 ms
PyTorch with TorchDynamo average time per round of inference: 19.13 ms
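The per-round averages above can be reproduced with a warmup-then-average loop. This sketch times a stand-in workload (an assumption; the real benchmark would call one round of model inference, and GPU measurements would additionally need `torch.cuda.synchronize()` so kernel launches are not timed asynchronously):

```python
import time

def bench(fn, warmup=3, rounds=10):
    """Return average wall-clock time per round, in milliseconds."""
    for _ in range(warmup):           # warmup rounds: exclude compilation
        fn()                          # and cache-population effects
    start = time.perf_counter()
    for _ in range(rounds):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / rounds * 1000.0  # ms per round

# Stand-in workload; the real benchmark would run model inference here.
avg_ms = bench(lambda: sum(i * i for i in range(10_000)))
print(f"average time per round: {avg_ms:.2f} ms")
```

Warming up before timing matters especially for the TorchDynamo runs, since the first call pays the one-time compilation cost.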