Open anijain2305 opened 1 year ago
To measure speedup, compilation latency, and peak memory footprint reduction, we exclude the models that fail accuracy checks. The pass rate table counts all models.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 90%, 55/61 | 100%, 46/46 | 98%, 60/61 |
| aot_eager | 87%, 53/61 | 100%, 46/46 | 98%, 60/61 |
| inductor | 84%, 51/61 | 100%, 46/46 | 97%, 59/61 |
| inductor_no_cudagraphs | 85%, 52/61 | 100%, 46/46 | 97%, 59/61 |
+------------------------+------------+-------------+-------------+
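Each pass rate cell pairs a rounded percentage with the raw `passed/total` count. A minimal sketch of that formatting is below; the function name `format_passrate` is hypothetical and is not taken from the benchmark scripts.

```python
def format_passrate(passed: int, total: int) -> str:
    """Format a pass rate cell as '<percent>%, <passed>/<total>'.

    Hypothetical helper for illustration only; the percentage is rounded
    to the nearest whole number.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    percent = 100 * passed / total
    return f"{percent:.0f}%, {passed}/{total}"


if __name__ == "__main__":
    # Counts taken from the torchbench column of the table above.
    for compiler, passed in [("eager", 55), ("aot_eager", 53), ("inductor", 51)]:
        print(compiler, format_passrate(passed, 61))
```

Keeping the raw counts next to the percentage makes it easy to see when two reports use a different number of models, for example 61 versus 60 for torchbench.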
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.00x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| inductor | 1.41x | 1.34x | 1.35x |
| inductor_no_cudagraphs | 1.32x | 1.33x | 1.34x |
+------------------------+------------+-------------+-------------+
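For reference, a geometric mean speedup over only the models that pass accuracy can be sketched as follows. The input layout (a list of `(speedup, passed_accuracy)` pairs) and the function name are assumptions made for this example, not the dashboard's actual data format.

```python
import math
from typing import Iterable, Tuple


def geomean_speedup(results: Iterable[Tuple[float, bool]]) -> float:
    """Geometric mean of per-model speedups, skipping accuracy failures.

    `results` holds hypothetical (speedup, passed_accuracy) pairs, where
    speedup is eager latency divided by compiled latency.
    """
    kept = [speedup for speedup, passed in results if passed]
    if not kept:
        raise ValueError("no models passed the accuracy check")
    # Average in log space, then exponentiate; this is the geometric mean.
    return math.exp(sum(math.log(s) for s in kept) / len(kept))


if __name__ == "__main__":
    # Made-up numbers: the third model fails accuracy and is ignored.
    sample = [(2.0, True), (8.0, True), (100.0, False)]
    print(f"{geomean_speedup(sample):.2f}x")
```

A geometric mean is the usual choice for ratios such as speedups because a 2x gain on one model and a 0.5x loss on another average out to 1.00x rather than 1.25x.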
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 4.02 | 2.53 | 1.79 |
| aot_eager | 2.89 | 4.62 | 3.86 |
| inductor | 7.37 | 14.30 | 12.08 |
| inductor_no_cudagraphs | 7.15 | 12.33 | 11.95 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.05x | 1.03x | 1.18x |
| aot_eager | 1.05x | 1.03x | 1.16x |
| inductor | 1.05x | 1.25x | 1.12x |
| inductor_no_cudagraphs | 1.11x | 1.31x | 1.18x |
+------------------------+------------+-------------+-------------+
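The "higher is better" label suggests the compression ratio is the eager peak memory divided by the compiled peak memory. The sketch below assumes that definition and assumes per-model ratios are combined with an arithmetic mean; both points are assumptions, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def compression_ratio(eager_peak: float, compiled_peak: float) -> float:
    """Assumed definition: eager peak memory / compiled peak memory.

    A value above 1.0 means the compiled run used less peak memory.
    """
    if compiled_peak <= 0:
        raise ValueError("compiled_peak must be positive")
    return eager_peak / compiled_peak


def mean_compression_ratio(peaks: Iterable[Tuple[float, float]]) -> float:
    """Arithmetic mean of per-model ratios (the aggregation is an assumption)."""
    ratios = [compression_ratio(eager, compiled) for eager, compiled in peaks]
    if not ratios:
        raise ValueError("no models to aggregate")
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up peak memory values in GB: (eager, compiled).
    sample = [(10.0, 8.0), (6.0, 6.0)]
    print(f"{mean_compression_ratio(sample):.2f}x")
```

Under this reading, a value below 1.00x means the compiled run peaked higher than eager.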
To measure speedup, compilation latency, and peak memory footprint reduction, we exclude the models that fail accuracy checks. The pass rate table counts all models.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 95%, 57/60 | 100%, 45/45 | 100%, 59/59 |
| aot_eager | 92%, 55/60 | 100%, 45/45 | 100%, 59/59 |
| inductor | 90%, 54/60 | 100%, 45/45 | 95%, 56/59 |
| inductor_no_cudagraphs | 92%, 55/60 | 100%, 45/45 | 95%, 56/59 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.01x | 1.01x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| inductor | 1.50x | 1.48x | 1.41x |
| inductor_no_cudagraphs | 1.37x | 1.37x | 1.38x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 4.07 | 2.98 | 1.84 |
| aot_eager | 3.11 | 5.66 | 4.07 |
| inductor | 8.70 | 16.24 | 12.75 |
| inductor_no_cudagraphs | 8.06 | 14.24 | 12.53 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.03x | 1.02x | 1.16x |
| aot_eager | 1.02x | 1.02x | 1.12x |
| inductor | 0.98x | 1.14x | 1.06x |
| inductor_no_cudagraphs | 1.05x | 1.22x | 1.13x |
+------------------------+------------+-------------+-------------+
Testing the inference numbers