buddy-compiler / buddy-benchmark

Benchmark Framework for Buddy Projects
Apache License 2.0

Llama benchmark #112

Open gxsoar opened 7 months ago

gxsoar commented 7 months ago

Llama Benchmark

Use PyTorch with TorchDynamo to perform Vicuna end-to-end inference.

Environments

Run on Ubuntu 22.04.1 LTS

- CPU: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz
- GPU: NVIDIA GeForce RTX 3090
- CUDA: 12.0
- Python: 3.9
- PyTorch: 2.0.0+cu118
- Anaconda: Miniconda3

Benchmark Time

CPU time per round of inference:

- PyTorch: 982.4394 ms
- PyTorch with TorchDynamo: 977.5693 ms

GPU time per round of inference:

- PyTorch: 25.3370 ms
- PyTorch with TorchDynamo: 19.1307 ms
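The "average time per round" numbers above can be gathered with a simple wall-clock harness. Below is a minimal stdlib-only sketch; the `benchmark` helper and its parameters are illustrative and not from the repository. For the eager path, `run_inference` would wrap a call like `model(input_ids)`; for the TorchDynamo path, a call through `torch.compile(model)`. For GPU timing, a `torch.cuda.synchronize()` before each clock read would also be needed so the measurement covers the full kernel execution.

```python
import statistics
import time


def benchmark(run_inference, rounds=10, warmup=2):
    """Return the average wall-clock time per round of inference, in ms.

    `run_inference` is any zero-argument callable. Warmup rounds absorb
    one-time costs (e.g. TorchDynamo compilation on the first call) so
    they do not skew the reported per-round average.
    """
    for _ in range(warmup):
        run_inference()
    times_ms = []
    for _ in range(rounds):
        start = time.perf_counter()
        run_inference()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(times_ms)


# Usage with a stand-in workload in place of a real model call:
avg_ms = benchmark(lambda: sum(i * i for i in range(10_000)), rounds=5)
print(f"average time per round of inference: {avg_ms} ms")
```

Separating warmup from measured rounds matters here: with `torch.compile`, the first invocation triggers graph capture and compilation, which would otherwise dominate a small-`rounds` average.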

notion-workspace[bot] commented 7 months ago

Pre-Task: Initial LLaMA Benchmark