anijain2305 / torchdynamo_dashboard

TorchDynamo Dashboard Infra

Test - Inference Dashboard #2

Open anijain2305 opened 1 year ago

anijain2305 commented 1 year ago

Testing the inference numbers

anijain2305 commented 1 year ago

Inference Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. All experiments run on A100 GPUs, and this is an inference run. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report mean compilation latency and the peak memory footprint compression ratio.
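The two per-model measurements described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the dashboard's harness: flat lists of floats stand in for output tensors, and the helper names `outputs_match` and `speedup` and the `rtol`/`atol` tolerance values are hypothetical.

```python
import math
from typing import Sequence


def outputs_match(
    reference: Sequence[float],
    candidate: Sequence[float],
    rtol: float = 1e-4,  # hypothetical tolerance, not taken from the dashboard
    atol: float = 1e-4,  # hypothetical tolerance, not taken from the dashboard
) -> bool:
    """Accuracy check: compare a backend's forward outputs with native PyTorch's."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(c, r, rel_tol=rtol, abs_tol=atol)
        for r, c in zip(reference, candidate)
    )


def speedup(native_latency_ms: float, backend_latency_ms: float) -> float:
    """Speedup normalized against native PyTorch; above 1.0 means the backend is faster."""
    if backend_latency_ms <= 0:
        raise ValueError("backend latency must be positive")
    return native_latency_ms / backend_latency_ms


if __name__ == "__main__":
    print(outputs_match([1.0, 2.0], [1.00001, 2.00001]))  # True
    print(speedup(10.0, 8.0))  # 1.25
```

A check on real tensors would more likely use something like `torch.allclose`, whose tolerance formula differs slightly from `math.isclose`.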

When measuring performance, compilation latency, and memory footprint reduction, we exclude models that fail the accuracy checks.
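A sketch of that exclusion step, assuming per-model results are stored in dicts keyed by model name. Counting both `pass` and `pass_due_to_skip` as passing is an assumption based on the status strings in the per-model accuracy tables; `PASSING_STATUSES` and `filter_accurate` are hypothetical names. The example values are the inductor entries for three torchbench models.

```python
from typing import Dict

# Assumption: these accuracy statuses count as passing.
PASSING_STATUSES = {"pass", "pass_due_to_skip"}


def filter_accurate(
    metric: Dict[str, float], accuracy: Dict[str, str]
) -> Dict[str, float]:
    """Keep only models whose accuracy status counts as passing.

    Models with no accuracy result are dropped as well.
    """
    return {
        name: value
        for name, value in metric.items()
        if accuracy.get(name) in PASSING_STATUSES
    }


if __name__ == "__main__":
    speedups = {"hf_Bert": 1.3142, "moco": 0.0, "hf_T5_large": 2.4551}
    statuses = {
        "hf_Bert": "pass",
        "moco": "fail_to_run",
        "hf_T5_large": "pass_due_to_skip",
    }
    print(filter_accurate(speedups, statuses))
    # {'hf_Bert': 1.3142, 'hf_T5_large': 2.4551}
```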

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 90%, 55/61 | 100%, 46/46 | 98%, 60/61  |
|       aot_eager        | 87%, 53/61 | 100%, 46/46 | 98%, 60/61  |
|        inductor        | 84%, 51/61 | 100%, 46/46 | 97%, 59/61  |
| inductor_no_cudagraphs | 85%, 52/61 | 100%, 46/46 | 97%, 59/61  |
+------------------------+------------+-------------+-------------+
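Each pass-rate cell pairs a rounded percentage with the raw count of passing models. A minimal sketch of that formatting, assuming rounding to the nearest whole percent (a rule that reproduces every cell of the Passrate table); `format_passrate` is a hypothetical helper.

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass-rate cell such as '90%, 55/61'."""
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("need 0 <= passed <= total and total > 0")
    percent = 100 * passed / total
    # Rounded to the nearest whole percent (assumed rounding rule).
    return f"{percent:.0f}%, {passed}/{total}"


if __name__ == "__main__":
    # The torchbench column of the Passrate table.
    for passed in (55, 53, 51, 52):
        print(format_passrate(passed, 61))
```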

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.41x    |    1.34x    |    1.35x    |
| inductor_no_cudagraphs |   1.32x    |    1.33x    |    1.34x    |
+------------------------+------------+-------------+-------------+
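The geometric mean suits ratios such as speedups: a 2x speedup on one model and a 0.5x slowdown on another combine to 1.0x rather than the 1.25x an arithmetic mean would report. A sketch, assuming the aggregate is a plain geometric mean over the per-model speedups that survive the accuracy filter; `geomean_speedup` is a hypothetical helper, and the sample values are three inductor speedups from the per-model torchbench table.

```python
import math
import statistics
from typing import Iterable


def geomean_speedup(speedups: Iterable[float]) -> float:
    """Geometric mean: the n-th root of the product of n speedups."""
    values = list(speedups)
    if not values or any(v <= 0 for v in values):
        # A failed run recorded as 0.0 would make the product zero, so reject it.
        raise ValueError("speedups must be a non-empty list of positive numbers")
    # Averaging logarithms avoids overflow/underflow on long lists.
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Inductor speedups for hf_Bert, hf_Bert_large and hf_DistilBert.
    sample = [1.3142, 1.3169, 1.2765]
    print(round(geomean_speedup(sample), 4))
    # Cross-check against the standard library (Python 3.8+).
    print(math.isclose(geomean_speedup(sample), statistics.geometric_mean(sample)))
```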

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.02    |    2.53     |    1.79     |
|       aot_eager        |    2.89    |    4.62     |    3.86     |
|        inductor        |    7.37    |    14.30    |    12.08    |
| inductor_no_cudagraphs |    7.15    |    12.33    |    11.95    |
+------------------------+------------+-------------+-------------+
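The per-model compilation-latency tables record `nan` for runs that did not complete (for example, `hf_BigBird` under inductor), so an arithmetic mean has to skip those entries or the result becomes `nan`. A sketch assuming the summary averages only finite values; `mean_compilation_time` is a hypothetical helper, and this is not confirmed to be how the numbers above were computed.

```python
import math
from typing import Iterable


def mean_compilation_time(latencies_s: Iterable[float]) -> float:
    """Arithmetic mean of per-model compilation latencies, ignoring NaN entries."""
    finite = [t for t in latencies_s if not math.isnan(t)]
    if not finite:
        return math.nan  # no model produced a usable measurement
    return sum(finite) / len(finite)


if __name__ == "__main__":
    # Inductor compile times (s) for hf_Bert, hf_Bert_large and hf_BigBird (nan).
    print(mean_compilation_time([5.8942, 10.8633, float("nan")]))
```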

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.05x    |    1.03x    |    1.18x    |
|       aot_eager        |   1.05x    |    1.03x    |    1.16x    |
|        inductor        |   1.05x    |    1.25x    |    1.12x    |
| inductor_no_cudagraphs |   1.11x    |    1.31x    |    1.18x    |
+------------------------+------------+-------------+-------------+
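A sketch of the ratio, assuming it is the native-PyTorch peak memory divided by the backend's peak memory, so values above 1.0 mean the backend used less memory (consistent with "higher is better"). In PyTorch, CUDA peak readings of this kind are commonly taken with `torch.cuda.reset_peak_memory_stats()` and `torch.cuda.max_memory_allocated()`, but the dashboard's measurement method is not stated here; `compression_ratio` is a hypothetical helper and the readings in the example are made up.

```python
def compression_ratio(native_peak_bytes: int, backend_peak_bytes: int) -> float:
    """Peak memory compression ratio; above 1.0 means the backend needs less memory."""
    if backend_peak_bytes <= 0:
        raise ValueError("backend peak memory must be positive")
    # Assumed definition: baseline (native PyTorch) peak divided by backend peak.
    return native_peak_bytes / backend_peak_bytes


if __name__ == "__main__":
    # Hypothetical peak-memory readings, not taken from the tables.
    native, compiled = 12 * 2**30, 10 * 2**30
    print(f"{compression_ratio(native, compiled):.2f}x")  # 1.20x
```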

torchbench suite with float32 precision

Performance speedup

~~~
+-----------------------------------+------+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
| drq | 1 | 1.0248 | 1.0061 | 3.1668 | 1.2794 |
| hf_T5_large | 1 | 1.092 | 0.8994 | 2.4551 | 2.1993 |
| soft_actor_critic | 256 | 0.998 | 0.9378 | 2.2791 | 1.177 |
| hf_T5_base | 1 | 1.0021 | 0.9981 | 2.1787 | 2.1532 |
| dlrm | 1 | 0.9848 | 1.0381 | 1.9782 | 1.1422 |
| lennard_jones | 1000 | 0.8456 | 0.8697 | 1.9276 | 0.8307 |
| hf_Albert | 16 | 1.001 | 1.0011 | 1.8985 | 1.872 |
| hf_T5 | 4 | 1.0009 | 1.0005 | 1.8525 | 1.8416 |
| phlippe_densenet | 128 | 1.0064 | 1.1276 | 1.7133 | 1.3969 |
| hf_Reformer | 8 | 0.9982 | 1.0029 | 1.7116 | 1.7139 |
| hf_GPT2 | 16 | 0.9999 | 0.9996 | 1.711 | 1.7076 |
| timm_nfnet | 128 | 0.9992 | 0.9997 | 1.7102 | 1.6956 |
| hf_GPT2_large | 1 | 1.0014 | 0.9993 | 1.6171 | 1.5953 |
| hf_Longformer | 4 | 0.9999 | 0.9995 | 1.5795 | 1.5745 |
| squeezenet1_1 | 256 | 0.9993 | 0.9987 | 1.5639 | 1.5717 |
| shufflenet_v2_x1_0 | 128 | 0.997 | 0.9974 | 1.5 | 1.5035 |
| densenet121 | 64 | 1.0 | 1.0 | 1.4986 | 1.4819 |
| timm_resnest | 256 | 0.9986 | 1.0002 | 1.484 | 1.4853 |
| resnet50 | 64 | 0.9989 | 0.9988 | 1.459 | 1.4515 |
| pytorch_unet | 4 | 0.9996 | 0.9998 | 1.4324 | 1.4317 |
| fastNLP_Bert | 16 | 0.9974 | 0.9947 | 1.4274 | 1.4248 |
| pytorch_CycleGAN_and_pix2pix | 1 | 0.9939 | 0.9955 | 1.4214 | 1.3167 |
| resnet152 | 64 | 0.9995 | 0.9994 | 1.4182 | 1.4057 |
| mobilenet_v2 | 128 | 0.9994 | 0.999 | 1.4158 | 1.4159 |
| resnext50_32x4d | 64 | 0.9992 | 0.9991 | 1.4127 | 1.4 |
| timm_regnet | 32 | 0.9991 | 0.9972 | 1.4116 | 1.3319 |
| Background_Matting | 1 | 0.9985 | 0.9983 | 1.3682 | 1.3576 |
| mobilenet_v3_large | 128 | 0.9987 | 0.9993 | 1.3506 | 1.3526 |
| functorch_dp_cifar10 | 512 | 0.9903 | 0.9807 | 1.3401 | 1.2551 |
| BERT_pytorch | 32 | 0.9956 | 0.9881 | 1.3312 | 1.3096 |
| mnasnet1_0 | 128 | 0.9988 | 0.9994 | 1.3236 | 1.3244 |
| phlippe_resnet | 256 | 0.9931 | 0.9932 | 1.3221 | 1.2526 |
| speech_transformer | 1 | 0.9847 | 0.8724 | 1.3211 | 1.3313 |
| hf_Bert_large | 4 | 1.002 | 0.9983 | 1.3169 | 1.2937 |
| timm_efficientnet | 128 | 0.9989 | 0.9993 | 1.3158 | 1.3137 |
| hf_Bert | 8 | 1.0025 | 0.9989 | 1.3142 | 1.2965 |
| doctr_det_predictor | 4 | 1.0016 | 0.9909 | 1.3026 | 1.3014 |
| LearningToPaint | 256 | 0.9973 | 0.9983 | 1.2932 | 1.3189 |
| yolov3 | 8 | 0.9977 | 0.9981 | 1.2837 | 1.2649 |
| hf_DistilBert | 16 | 1.0001 | 0.9994 | 1.2765 | 1.2712 |
| resnet18 | 256 | 0.9994 | 0.9991 | 1.2672 | 1.2763 |
| timm_vovnet | 128 | 0.9993 | 0.9996 | 1.2624 | 1.2581 |
| vgg16 | 8 | 0.9958 | 0.9947 | 1.1891 | 1.1677 |
| Super_SloMo | 8 | 0.9998 | 0.9995 | 1.182 | 1.175 |
| alexnet | 1024 | 0.9994 | 0.9993 | 1.1534 | 1.1938 |
| vision_maskrcnn | 4 | 0.9426 | 0.9268 | 1.1437 | 1.1797 |
| timm_vision_transformer | 128 | 0.9974 | 0.998 | 1.1168 | 1.1111 |
| attention_is_all_you_need_pytorch | 256 | 0.9986 | 0.9973 | 1.1138 | 1.0949 |
| pytorch_stargan | 16 | 0.9969 | 0.9961 | 1.1131 | 1.1136 |
| hf_Bart | 8 | 1.0029 | 0.9958 | 1.1012 | 1.1011 |
| dcgan | 1024 | 0.9981 | 0.998 | 1.0577 | 1.0596 |
| timm_vision_transformer_large | 8 | 1.0005 | 1.0008 | 1.0494 | 1.0384 |
| demucs | 32 | 1.0001 | 0.9996 | 0.9998 | 0.9997 |
| doctr_reco_predictor | 64 | 0.9938 | 0.9975 | 0.9946 | 0.9935 |
| tts_angular | 512 | 0.9963 | 0.9954 | 0.9924 | 0.9982 |
| nvidia_deeprecommender | 512 | 0.9963 | 0.9946 | 0.8845 | 0.9918 |
| hf_BigBird | 4 | 0.9967 | 0.993 | 0.0 | 1.2566 |
| tacotron2 | 128 | 1.0966 | 0.0 | 0.0 | 0.0 |
| moco | 64 | 0.9958 | 0.0 | 0.0 | 0.0 |
| DALLE2_pytorch | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| detectron2_fcos_r_50_fpn | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
+-----------------------------------+------+--------+-----------+----------+------------------------+
~~~

Accuracy

~~~
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
| hf_T5_large | 4 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| timm_vision_transformer_large | 4 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| mobilenet_v3_large | 4 | pass | pass | pass | pass |
| phlippe_densenet | 4 | pass | pass | pass | pass |
| pytorch_CycleGAN_and_pix2pix | 1 | pass | pass | pass | pass |
| pytorch_stargan | 16 | pass | pass | pass | pass |
| pytorch_unet | 2 | pass | pass | pass | pass |
| resnet152 | 4 | pass | pass | pass | pass |
| resnet18 | 4 | pass | pass | pass | pass |
| resnet50 | 4 | pass | pass | pass | pass |
| resnext50_32x4d | 4 | pass | pass | pass | pass |
| shufflenet_v2_x1_0 | 4 | pass | pass | pass | pass |
| soft_actor_critic | 256 | pass | pass | pass | pass |
| speech_transformer | 4 | pass | pass | pass | pass |
| squeezenet1_1 | 4 | pass | pass | pass | pass |
| timm_efficientdet | 4 | pass | pass | pass | pass |
| timm_efficientnet | 4 | pass | pass | pass | pass |
| timm_nfnet | 4 | pass | pass | pass | pass |
| timm_regnet | 4 | pass | pass | pass | pass |
| timm_resnest | 4 | pass | pass | pass | pass |
| timm_vision_transformer | 4 | pass | pass | pass | pass |
| timm_vovnet | 4 | pass | pass | pass | pass |
| tts_angular | 4 | pass | pass | pass | pass |
| vgg16 | 4 | pass | pass | pass | pass |
| yolov3 | 4 | pass | pass | pass | pass |
| nvidia_deeprecommender | 4 | pass | pass | pass | pass |
| phlippe_resnet | 4 | pass | pass | pass | pass |
| mobilenet_v2 | 4 | pass | pass | pass | pass |
| fastNLP_Bert | 4 | pass | pass | pass | pass |
| BERT_pytorch | 4 | pass | pass | pass | pass |
| Background_Matting | 1 | pass | pass | pass | pass |
| LearningToPaint | 4 | pass | pass | pass | pass |
| Super_SloMo | 4 | pass | pass | pass | pass |
| alexnet | 4 | pass | pass | pass | pass |
| attention_is_all_you_need_pytorch | 4 | pass | pass | pass | pass |
| dcgan | 4 | pass | pass | pass | pass |
| densenet121 | 4 | pass | pass | pass | pass |
| mnasnet1_0 | 4 | pass | pass | pass | pass |
| doctr_reco_predictor | 4 | pass | pass | pass | pass |
| drq | 1 | pass | pass | pass | pass |
| dlrm | 4 | pass | pass | pass | pass |
| functorch_dp_cifar10 | 4 | pass | pass | pass | pass |
| hf_GPT2 | 2 | pass | pass | pass | pass |
| lennard_jones | 4 | pass | pass | pass | pass |
| hf_Albert | 4 | pass | pass | pass | pass |
| hf_Reformer | 4 | pass | pass | pass | pass |
| hf_Longformer | 4 | pass | pass | pass | pass |
| hf_T5 | 4 | pass | pass | pass | pass |
| hf_DistilBert | 4 | pass | pass | pass | pass |
| hf_Bert_large | 4 | pass | pass | pass | pass |
| hf_Bert | 4 | pass | pass | pass | pass |
| hf_Bart | 4 | pass | pass | pass | pass |
| hf_BigBird | 4 | pass | pass | fail_to_run | pass |
| moco | 4 | pass | fail_to_run | fail_to_run | fail_to_run |
| DALLE2_pytorch | 4 | fail_to_run | fail_to_run | fail_to_run | fail_to_run |
| vision_maskrcnn | 4 | pass | pass | 0.0000 | 0.0000 |
| tacotron2 | 4 | pass | fail_to_run | 0.0000 | 0.0000 |
| demucs | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| detectron2_fcos_r_50_fpn | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| doctr_det_predictor | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
~~~

Compilation latency (sec)

~~~
+-----------------------------------+------+----------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
| vision_maskrcnn | 4 | 8.241 | 15.7649 | 66.0949 | 42.4555 |
| hf_Longformer | 4 | 4.8308 | 9.1081 | 33.8975 | 32.2896 |
| hf_T5_large | 1 | 11.2894 | 17.4488 | 30.9691 | 30.4491 |
| yolov3 | 8 | 1.6802 | 3.6443 | 17.8457 | 17.3166 |
| hf_GPT2_large | 1 | 4.7435 | 8.421 | 17.3706 | 17.0395 |
| attention_is_all_you_need_pytorch | 256 | 1.2609 | 2.7682 | 17.1504 | 15.7586 |
| timm_vision_transformer_large | 8 | 2.6453 | 6.1895 | 16.314 | 15.8107 |
| speech_transformer | 1 | 1.6536 | 3.5834 | 16.2521 | 15.0244 |
| hf_T5_base | 1 | 5.0749 | 8.1023 | 16.0494 | 15.3537 |
| densenet121 | 64 | 2.1336 | 5.2594 | 15.2494 | 15.0479 |
| hf_Reformer | 8 | 1.4987 | 2.6034 | 12.8986 | 12.2529 |
| hf_Bart | 8 | 1.7111 | 3.3115 | 12.6101 | 12.5953 |
| resnet152 | 64 | 2.4461 | 6.2734 | 12.5083 | 12.0015 |
| hf_T5 | 4 | 2.4876 | 4.0756 | 12.1149 | 11.1377 |
| hf_Bert_large | 4 | 3.0993 | 5.7654 | 10.8633 | 10.6527 |
| Super_SloMo | 8 | 1.204 | 2.925 | 10.0844 | 9.8696 |
| timm_nfnet | 128 | 2.1054 | 3.8291 | 8.8616 | 8.7343 |
| fastNLP_Bert | 16 | 1.5065 | 2.9006 | 8.2846 | 7.6391 |
| hf_GPT2 | 16 | 1.55 | 2.7929 | 7.7324 | 7.5908 |
| BERT_pytorch | 32 | 1.4507 | 2.9121 | 7.4947 | 7.2631 |
| timm_regnet | 32 | 1.9834 | 3.722 | 7.1953 | 7.0248 |
| doctr_det_predictor | 4 | 1.261 | 3.0938 | 7.0458 | 6.6775 |
| timm_efficientnet | 128 | 1.5313 | 3.036 | 6.6266 | 6.4773 |
| timm_vision_transformer | 128 | 0.8927 | 1.9043 | 6.3691 | 6.0912 |
| timm_resnest | 256 | 0.6306 | 1.2882 | 6.2985 | 6.0496 |
| shufflenet_v2_x1_0 | 128 | 1.0066 | 2.4716 | 5.9515 | 5.7353 |
| hf_Bert | 8 | 1.5695 | 2.895 | 5.8942 | 5.8154 |
| phlippe_densenet | 128 | 0.8806 | 2.1915 | 5.7873 | 5.5692 |
| hf_Albert | 16 | 1.4134 | 2.7094 | 5.7543 | 5.524 |
| mobilenet_v3_large | 128 | 0.9113 | 2.2511 | 5.6266 | 5.5884 |
| mobilenet_v2 | 128 | 0.8741 | 2.2077 | 5.2335 | 5.2215 |
| resnext50_32x4d | 64 | 0.9003 | 2.2032 | 5.2287 | 4.8345 |
| timm_vovnet | 128 | 1.1426 | 2.1075 | 5.1691 | 4.9489 |
| Background_Matting | 1 | 0.9713 | 2.2791 | 5.1691 | 4.9209 |
| resnet50 | 64 | 0.9459 | 2.3204 | 5.0908 | 4.8862 |
| hf_DistilBert | 16 | 0.7992 | 1.4186 | 5.0349 | 4.7274 |
| mnasnet1_0 | 128 | 0.837 | 2.0486 | 4.7904 | 4.5988 |
| functorch_dp_cifar10 | 512 | 0.3088 | 0.531 | 3.3274 | 3.2167 |
| pytorch_CycleGAN_and_pix2pix | 1 | 0.4308 | 1.0066 | 3.2657 | 3.0655 |
| pytorch_unet | 4 | 0.4827 | 1.1158 | 3.1365 | 2.8746 |
| pytorch_stargan | 16 | 0.4283 | 1.1197 | 2.813 | 2.7817 |
| resnet18 | 256 | 0.4141 | 0.9234 | 2.6977 | 2.4564 |
| LearningToPaint | 256 | 0.4363 | 0.9683 | 2.6331 | 2.4596 |
| phlippe_resnet | 256 | 0.3997 | 0.9013 | 2.3902 | 2.2379 |
| squeezenet1_1 | 256 | 0.2371 | 0.4091 | 1.9137 | 1.7443 |
| vgg16 | 8 | 0.1809 | 0.302 | 1.456 | 1.4129 |
| alexnet | 1024 | 0.1525 | 0.2304 | 1.4328 | 1.2835 |
| drq | 1 | 0.2935 | 0.385 | 1.3473 | 1.2152 |
| dlrm | 1 | 0.2508 | 0.3632 | 1.2426 | 1.0564 |
| dcgan | 1024 | 0.1624 | 0.261 | 1.2185 | 1.2193 |
| nvidia_deeprecommender | 512 | 0.1823 | 0.2667 | 1.1612 | 1.0288 |
| soft_actor_critic | 256 | 0.2076 | 0.2584 | 1.0431 | 0.9338 |
| lennard_jones | 1000 | 0.1298 | 0.1847 | 1.0128 | 0.8478 |
| tts_angular | 512 | 0.1648 | 0.195 | 0.9414 | 0.8122 |
| doctr_reco_predictor | 64 | 0.7994 | 0.7884 | 0.6222 | 0.6206 |
| demucs | 32 | 0.2781 | 0.2782 | 0.1943 | 0.1864 |
| hf_BigBird | 4 | 3.4026 | 4.7473 | nan | 11.2452 |
| tacotron2 | 128 | 119.4557 | nan | nan | nan |
| moco | 64 | 22.7028 | nan | nan | nan |
| DALLE2_pytorch | 0 | nan | nan | nan | nan |
| detectron2_fcos_r_50_fpn | 0 | nan | nan | nan | nan |
+-----------------------------------+------+----------+-----------+----------+------------------------+
~~~

Peak Memory Compression Ratio

~~~
+-----------------------------------+------+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
| timm_efficientnet | 128 | 1.2148 | 1.2148 | 1.6531 | 1.792 |
| timm_vovnet | 128 | 1.2926 | 1.2926 | 1.5959 | 1.6554 |
| pytorch_unet | 4 | 1.5928 | 1.3548 | 1.5789 | 1.5928 |
| hf_T5 | 4 | 1.0782 | 1.0782 | 1.5282 | 1.5313 |
| mobilenet_v2 | 128 | 1.0723 | 1.0723 | 1.4645 | 1.5894 |
| timm_nfnet | 128 | 1.1588 | 1.6471 | 1.457 | 1.5071 |
| mobilenet_v3_large | 128 | 1.1002 | 1.1002 | 1.4117 | 1.5726 |
| hf_Bart | 8 | 1.004 | 1.0037 | 1.3553 | 1.3554 |
| mnasnet1_0 | 128 | 1.1266 | 1.1266 | 1.3292 | 1.5086 |
| resnet18 | 256 | 1.0 | 1.0 | 1.286 | 1.4285 |
| attention_is_all_you_need_pytorch | 256 | 1.0312 | 1.0292 | 1.2286 | 1.2344 |
| Background_Matting | 1 | 1.311 | 1.311 | 1.2134 | 1.2299 |
| yolov3 | 8 | 1.2265 | 1.2264 | 1.2049 | 1.2265 |
| squeezenet1_1 | 256 | 1.0 | 1.0 | 1.1629 | 1.299 |
| hf_T5_base | 1 | 1.0276 | 1.0276 | 1.1572 | 1.1587 |
| demucs | 32 | 1.1385 | 1.1385 | 1.1385 | 1.1385 |
| phlippe_resnet | 256 | 1.1717 | 1.1717 | 1.1008 | 1.1717 |
| shufflenet_v2_x1_0 | 128 | 1.0 | 1.0 | 1.0768 | 1.3133 |
| phlippe_densenet | 128 | 1.2259 | 1.2259 | 1.0508 | 1.0796 |
| pytorch_stargan | 16 | 1.0494 | 1.0494 | 1.0494 | 1.0494 |
| doctr_det_predictor | 4 | 0.4934 | 0.4934 | 1.021 | 0.4912 |
| hf_Albert | 16 | 1.0232 | 1.0216 | 1.0192 | 1.0232 |
| hf_DistilBert | 16 | 1.016 | 1.0154 | 1.0133 | 1.016 |
| resnext50_32x4d | 64 | 1.0 | 0.9484 | 1.0086 | 1.0564 |
| resnet50 | 64 | 1.056 | 0.9486 | 1.0085 | 1.0561 |
| resnet152 | 64 | 1.0428 | 0.9597 | 1.0066 | 1.0429 |
| hf_Bert | 8 | 1.0088 | 1.0082 | 1.0058 | 1.0088 |
| hf_Bert_large | 4 | 1.0033 | 1.003 | 1.0016 | 1.0033 |
| hf_GPT2 | 16 | 1.0 | 0.9995 | 0.9993 | 1.0 |
| dlrm | 1 | 1.0 | 1.0 | 0.999 | 1.0 |
| nvidia_deeprecommender | 512 | 1.001 | 1.001 | 0.999 | 1.1422 |
| doctr_reco_predictor | 64 | 0.997 | 0.997 | 0.997 | 0.997 |
| hf_T5_large | 1 | 1.0024 | 1.0024 | 0.9958 | 0.9964 |
| pytorch_CycleGAN_and_pix2pix | 1 | 1.0 | 0.9997 | 0.9947 | 1.0 |
| hf_GPT2_large | 1 | 1.0 | 0.9999 | 0.9945 | 0.9956 |
| vgg16 | 8 | 1.0 | 1.0 | 0.9937 | 1.0 |
| timm_vision_transformer_large | 8 | 1.0039 | 1.0037 | 0.9839 | 0.9846 |
| timm_regnet | 32 | 1.0 | 1.0 | 0.9829 | 0.9998 |
| dcgan | 1024 | 1.0 | 1.0 | 0.9783 | 1.0 |
| hf_Longformer | 4 | 0.5916 | 0.5953 | 0.9664 | 0.9892 |
| fastNLP_Bert | 16 | 1.0619 | 1.0608 | 0.9541 | 0.9573 |
| tts_angular | 512 | 0.9982 | 0.9982 | 0.9537 | 0.9982 |
| functorch_dp_cifar10 | 512 | 1.0 | 1.0 | 0.9463 | 1.0 |
| timm_resnest | 256 | 1.0 | 0.8998 | 0.9102 | 0.9472 |
| LearningToPaint | 256 | 1.0 | 1.0 | 0.8734 | 1.0 |
| alexnet | 1024 | 1.0 | 0.9167 | 0.8714 | 1.0 |
| Super_SloMo | 8 | 1.0841 | 0.9258 | 0.845 | 0.8639 |
| drq | 1 | 0.9627 | 0.9627 | 0.8437 | 0.9627 |
| BERT_pytorch | 32 | 1.0265 | 1.0265 | 0.8058 | 0.8087 |
| soft_actor_critic | 256 | 1.0 | 1.0 | 0.79 | 1.0 |
| hf_Reformer | 8 | 1.384 | 1.5123 | 0.7044 | 0.7573 |
| timm_vision_transformer | 128 | 1.1056 | 1.0986 | 0.6961 | 0.7486 |
| speech_transformer | 1 | 1.0655 | 1.0651 | 0.6679 | 0.6703 |
| densenet121 | 64 | 1.1503 | 1.0007 | 0.5977 | 0.6177 |
| vision_maskrcnn | 4 | 0.7923 | 0.7922 | 0.5905 | 0.795 |
| lennard_jones | 1000 | 1.0 | 1.0 | 0.5622 | 1.0 |
| hf_BigBird | 4 | 0.8781 | 0.878 | nan | 0.8781 |
| moco | 64 | 1.0357 | nan | nan | nan |
| tacotron2 | 128 | 0.7663 | nan | nan | nan |
| DALLE2_pytorch | 0 | nan | nan | nan | nan |
| detectron2_fcos_r_50_fpn | 0 | nan | nan | nan | nan |
+-----------------------------------+------+--------+-----------+----------+------------------------+
~~~

Absolute latency (ms)

~~~
+-----------------------------------+------+----------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
| vision_maskrcnn | 4 | 165.3751 | 168.3843 | 138.1865 | 133.836 |
| hf_Longformer | 4 | 177.7305 | 177.6922 | 112.3879 | 112.8816 |
| demucs | 32 | 96.4082 | 96.436 | 96.3731 | 96.4266 |
| hf_GPT2 | 16 | 140.7779 | 140.7904 | 82.2663 | 82.4248 |
| timm_vision_transformer_large | 8 | 63.8748 | 63.8446 | 60.8244 | 61.6358 |
| hf_T5 | 4 | 107.7393 | 107.7478 | 58.1867 | 58.5388 |
| Super_SloMo | 8 | 63.8387 | 63.8149 | 54.0027 | 54.3421 |
| pytorch_unet | 4 | 67.9805 | 67.9503 | 47.3894 | 47.4318 |
| hf_T5_base | 1 | 95.661 | 96.0823 | 43.8782 | 44.5879 |
| timm_resnest | 256 | 61.8914 | 61.8598 | 41.6678 | 41.623 |
| timm_nfnet | 128 | 70.4865 | 68.7478 | 40.1819 | 40.555 |
| fastNLP_Bert | 16 | 52.2602 | 52.6789 | 36.704 | 36.6037 |
| doctr_det_predictor | 4 | 48.2132 | 49.2009 | 36.6718 | 36.6766 |
| resnet152 | 64 | 50.7557 | 50.8251 | 35.82 | 36.1333 |
| timm_vision_transformer | 128 | 33.9717 | 33.9482 | 30.3213 | 30.4728 |
| timm_vovnet | 128 | 37.8464 | 37.8799 | 30.0141 | 30.0896 |
| hf_GPT2_large | 1 | 46.6565 | 46.6913 | 28.8728 | 29.3264 |
| hf_Albert | 16 | 50.5565 | 50.5794 | 26.6975 | 27.0312 |
| alexnet | 1024 | 29.1828 | 29.1832 | 25.3136 | 24.4332 |
| hf_Bart | 8 | 26.2046 | 26.0725 | 23.8817 | 23.8524 |
| hf_Reformer | 8 | 39.5501 | 39.3432 | 23.0629 | 23.0115 |
| hf_Bert_large | 4 | 28.8009 | 28.8379 | 21.8645 | 22.314 |
| resnext50_32x4d | 64 | 29.7848 | 29.8986 | 21.1063 | 21.359 |
| attention_is_all_you_need_pytorch | 256 | 23.4359 | 23.4836 | 21.0349 | 21.3839 |
| timm_efficientnet | 128 | 27.3529 | 27.4112 | 20.7859 | 20.8285 |
| resnet18 | 256 | 25.7838 | 25.7921 | 20.3283 | 20.1841 |
| timm_regnet | 32 | 27.7896 | 27.7935 | 19.7673 | 20.8208 |
| densenet121 | 64 | 29.2286 | 29.2807 | 19.5376 | 19.7158 |
| hf_T5_large | 1 | 53.5439 | 65.0104 | 19.3227 | 21.4085 |
| hf_DistilBert | 16 | 23.0383 | 23.0379 | 18.0298 | 18.1205 |
| Background_Matting | 1 | 22.485 | 22.5089 | 16.4111 | 16.5229 |
| hf_Bert | 8 | 20.5463 | 20.6005 | 15.6761 | 15.9638 |
| resnet50 | 64 | 22.5742 | 22.6059 | 15.4677 | 15.5654 |
| mnasnet1_0 | 128 | 20.0301 | 19.9967 | 15.1213 | 15.0924 |
| mobilenet_v2 | 128 | 19.2942 | 19.333 | 13.6182 | 13.6357 |
| dcgan | 1024 | 13.7557 | 13.7382 | 13.001 | 12.9693 |
| squeezenet1_1 | 256 | 19.5143 | 19.5144 | 12.482 | 12.4382 |
| mobilenet_v3_large | 128 | 15.9305 | 15.9604 | 11.7933 | 11.7814 |
| yolov3 | 8 | 14.7084 | 15.0619 | 11.4584 | 11.6107 |
| speech_transformer | 1 | 14.921 | 16.5492 | 11.0369 | 10.7523 |
| BERT_pytorch | 32 | 14.5162 | 14.5846 | 10.81 | 11.0379 |
| tts_angular | 512 | 8.8181 | 8.8531 | 8.9145 | 8.9159 |
| LearningToPaint | 256 | 9.4494 | 9.5336 | 7.2565 | 7.1171 |
| doctr_reco_predictor | 64 | 7.6247 | 7.3903 | 7.0454 | 6.9874 |
| pytorch_stargan | 16 | 7.0188 | 7.0007 | 6.2586 | 6.2707 |
| shufflenet_v2_x1_0 | 128 | 7.8553 | 7.8608 | 5.2269 | 5.1969 |
| nvidia_deeprecommender | 512 | 4.6047 | 3.9878 | 5.1839 | 4.6375 |
| vgg16 | 8 | 4.9927 | 4.9944 | 4.1875 | 4.2564 |
| phlippe_densenet | 128 | 5.3834 | 4.7674 | 3.3165 | 4.0308 |
| pytorch_CycleGAN_and_pix2pix | 1 | 3.9529 | 3.7929 | 3.0924 | 3.3416 |
| functorch_dp_cifar10 | 512 | 3.0496 | 2.6854 | 2.2523 | 2.4017 |
| phlippe_resnet | 256 | 2.527 | 2.5101 | 1.885 | 2.0002 |
| dlrm | 1 | 0.8499 | 0.6733 | 0.3609 | 0.616 |
| drq | 1 | 0.6735 | 0.6316 | 0.2202 | 0.6486 |
| soft_actor_critic | 256 | 0.2723 | 0.3037 | 0.1285 | 0.243 |
| lennard_jones | 1000 | 0.2419 | 0.2356 | 0.115 | 0.3122 |
| hf_BigBird | 4 | 187.143 | 187.632 | nan | 148.2089 |
| tacotron2 | 128 | 584.2111 | nan | nan | nan |
| moco | 64 | 47.8611 | nan | nan | nan |
| DALLE2_pytorch | 0 | nan | nan | nan | nan |
| detectron2_fcos_r_50_fpn | 0 | nan | nan | nan | nan |
+-----------------------------------+------+----------+-----------+----------+------------------------+
~~~

huggingface suite with float32 precision

Performance speedup

~~~
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
| MT5ForConditionalGeneration | 16 | 1.073 | 0.9662 | 2.4505 | 1.9327 |
| XLNetLMHeadModel | 8 | 1.0019 | 1.0013 | 2.3162 | 2.3123 |
| MobileBertForMaskedLM | 64 | 1.0858 | 0.93 | 1.8978 | 1.5189 |
| GPT2ForSequenceClassification | 4 | 1.0001 | 0.9997 | 1.877 | 1.8589 |
| T5Small | 4 | 1.0048 | 0.9943 | 1.7975 | 1.7636 |
| T5ForConditionalGeneration | 4 | 1.0063 | 0.9958 | 1.7959 | 1.7666 |
| DistillGPT2 | 16 | 0.9994 | 0.9993 | 1.7122 | 1.704 |
| ElectraForCausalLM | 32 | 1.0012 | 0.9993 | 1.6914 | 1.6705 |
| AllenaiLongformerBase | 4 | 0.9999 | 0.9902 | 1.5825 | 1.551 |
| GoogleFnet | 16 | 0.9994 | 0.9993 | 1.5738 | 1.7554 |
| ElectraForQuestionAnswering | 64 | 1.0005 | 0.9999 | 1.5086 | 1.4934 |
| XGLMForCausalLM | 8 | 1.0393 | 0.9278 | 1.4273 | 1.3205 |
| MobileBertForQuestionAnswering | 128 | 1.0042 | 1.0035 | 1.4074 | 1.3641 |
| RobertaForCausalLM | 16 | 1.0006 | 0.9994 | 1.4027 | 1.3916 |
| LayoutLMForSequenceClassification | 16 | 1.0001 | 0.9992 | 1.3995 | 1.3877 |
| YituTechConvBert | 16 | 0.9999 | 0.9997 | 1.3787 | 1.3667 |
| LayoutLMForMaskedLM | 16 | 1.0 | 0.9993 | 1.375 | 1.3666 |
| RobertaForQuestionAnswering | 16 | 1.0005 | 0.999 | 1.3735 | 1.3622 |
| BertForQuestionAnswering | 16 | 1.0003 | 0.9993 | 1.3696 | 1.3588 |
| BertForMaskedLM | 16 | 1.0005 | 1.0002 | 1.3514 | 1.3424 |
| CamemBert | 16 | 1.0003 | 0.9987 | 1.3513 | 1.3422 |
| AlbertForMaskedLM | 4 | 1.0009 | 1.001 | 1.327 | 1.3195 |
| AlbertForQuestionAnswering | 4 | 1.0003 | 1.001 | 1.3238 | 1.3195 |
| MegatronBertForCausalLM | 4 | 1.0028 | 1.002 | 1.2788 | 1.2596 |
| OPTForCausalLM | 2 | 0.9983 | 0.9976 | 1.2731 | 1.2907 |
| MegatronBertForQuestionAnswering | 8 | 1.0 | 0.9994 | 1.2677 | 1.2547 |
| BlenderbotSmallForCausalLM | 64 | 1.0014 | 0.9955 | 1.2589 | 1.2595 |
| DistilBertForMaskedLM | 128 | 1.0 | 0.9995 | 1.2344 | 1.2303 |
| Speech2Text2ForCausalLM | 256 | 0.9967 | 0.9906 | 1.2227 | 1.2268 |
| BartForCausalLM | 4 | 1.0012 | 0.999 | 1.2184 | 1.2176 |
| PLBartForCausalLM | 8 | 1.0002 | 0.9998 | 1.2064 | 1.2054 |
| DistilBertForQuestionAnswering | 256 | 0.999 | 0.9992 | 1.2052 | 1.2024 |
| MBartForCausalLM | 4 | 1.0014 | 0.9992 | 1.1976 | 1.1975 |
| TrOCRForCausalLM | 32 | 0.9993 | 0.999 | 1.1622 | 1.1636 |
| BlenderbotSmallForConditionalGeneration | 64 | 1.0014 | 0.9978 | 1.1497 | 1.1473 |
| PLBartForConditionalGeneration | 4 | 1.0008 | 0.9957 | 1.1378 | 1.14 |
| M2M100ForConditionalGeneration | 16 | 0.997 | 0.9778 | 1.1244 | 1.1031 |
| PegasusForCausalLM | 32 | 0.9972 | 0.9945 | 1.1211 | 1.1119 |
| BartForConditionalGeneration | 2 | 1.0012 | 0.9983 | 1.1067 | 1.1067 |
| MBartForConditionalGeneration | 2 | 1.002 | 0.9984 | 1.0873 | 1.0867 |
| DebertaV2ForMaskedLM | 1 | 0.7431 | 0.6574 | 1.0803 | 0.77 |
| PegasusForConditionalGeneration | 32 | 0.9971 | 0.9967 | 1.0664 | 1.0566 |
| DebertaForMaskedLM | 4 | 0.8421 | 0.753 | 1.0602 | 1.0631 |
| BlenderbotForCausalLM | 4 | 1.002 | 0.9952 | 1.0515 | 1.0498 |
| DebertaForQuestionAnswering | 8 | 0.9946 | 0.9905 | 1.0361 | 1.1819 |
| DebertaV2ForQuestionAnswering | 2 | 0.9947 | 0.9676 | 0.9162 | 1.0231 |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
~~~

Accuracy

~~~
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
| BlenderbotForCausalLM | 1 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| DebertaV2ForMaskedLM | 1 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| PLBartForConditionalGeneration | 1 | pass | pass | pass | pass |
| MBartForCausalLM | 1 | pass | pass | pass | pass |
| MBartForConditionalGeneration | 1 | pass | pass | pass | pass |
| MT5ForConditionalGeneration | 1 | pass | pass | pass | pass |
| MegatronBertForCausalLM | 1 | pass | pass | pass | pass |
| MegatronBertForQuestionAnswering | 1 | pass | pass | pass | pass |
| MobileBertForMaskedLM | 1 | pass | pass | pass | pass |
| MobileBertForQuestionAnswering | 1 | pass | pass | pass | pass |
| OPTForCausalLM | 1 | pass | pass | pass | pass |
| PLBartForCausalLM | 1 | pass | pass | pass | pass |
| PegasusForCausalLM | 1 | pass | pass | pass | pass |
| PegasusForConditionalGeneration | 1 | pass | pass | pass | pass |
| RobertaForCausalLM | 1 | pass | pass | pass | pass |
| RobertaForQuestionAnswering | 1 | pass | pass | pass | pass |
| Speech2Text2ForCausalLM | 1 | pass | pass | pass | pass |
| T5ForConditionalGeneration | 1 | pass | pass | pass | pass |
| T5Small | 1 | pass | pass | pass | pass |
| TrOCRForCausalLM | 1 | pass | pass | pass | pass |
| XGLMForCausalLM | 1 | pass | pass | pass | pass |
| XLNetLMHeadModel | 1 | pass | pass | pass | pass |
| M2M100ForConditionalGeneration | 1 | pass | pass | pass | pass |
| LayoutLMForSequenceClassification | 1 | pass | pass | pass | pass |
| LayoutLMForMaskedLM | 1 | pass | pass | pass | pass |
| GoogleFnet | 1 | pass | pass | pass | pass |
| AlbertForMaskedLM | 1 | pass | pass | pass | pass |
| AlbertForQuestionAnswering | 1 | pass | pass | pass | pass |
| AllenaiLongformerBase | 1 | pass | pass | pass | pass |
| BartForCausalLM | 1 | pass | pass | pass | pass |
| BartForConditionalGeneration | 1 | pass | pass | pass | pass |
| BertForMaskedLM | 1 | pass | pass | pass | pass |
| BertForQuestionAnswering | 1 | pass | pass | pass | pass |
| BlenderbotSmallForCausalLM | 1 | pass | pass | pass | pass |
| BlenderbotSmallForConditionalGeneration | 1 | pass | pass | pass | pass |
| CamemBert | 1 | pass | pass | pass | pass |
| DebertaForMaskedLM | 1 | pass | pass | pass | pass |
| DebertaForQuestionAnswering | 1 | pass | pass | pass | pass |
| DebertaV2ForQuestionAnswering | 1 | pass | pass | pass | pass |
| DistilBertForMaskedLM | 1 | pass | pass | pass | pass |
| DistilBertForQuestionAnswering | 1 | pass | pass | pass | pass |
| DistillGPT2 | 1 | pass | pass | pass | pass |
| ElectraForCausalLM | 1 | pass | pass | pass | pass |
| ElectraForQuestionAnswering | 1 | pass | pass | pass | pass |
| GPT2ForSequenceClassification | 1 | pass | pass | pass | pass |
| YituTechConvBert | 1 | pass | pass | pass | pass |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
~~~

Compilation latency (sec)

~~~
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
| DebertaV2ForQuestionAnswering | 2 | 6.6259 | 10.3101 | 43.0451 | 21.019 |
| DebertaV2ForMaskedLM | 1 | 6.6731 | 10.4388 | 41.5626 | 19.9304 |
| AllenaiLongformerBase | 4 | 4.6241 | 9.5261 | 39.5228 | 37.0787 |
| DebertaForMaskedLM | 4 | 4.0661 | 6.3203 | 30.8606 | 16.1217 |
| DebertaForQuestionAnswering | 8 | 4.0516 | 6.4168 | 30.7544 | 15.3368 |
| M2M100ForConditionalGeneration | 16 | 3.646 | 6.8526 | 22.2439 | 21.2446 |
| PegasusForConditionalGeneration | 32 | 3.2597 | 6.7268 | 21.6447 | 20.7627 |
| MBartForConditionalGeneration | 2 | 3.3639 | 6.8885 | 20.1097 | 20.1785 |
| XLNetLMHeadModel | 8 | 4.1313 | 8.6555 | 19.7031 | 19.8304 |
| BartForConditionalGeneration | 2 | 3.3165 | 6.8814 | 19.3623 | 19.0427 |
| XGLMForCausalLM | 8 | 2.6261 | 5.2783 | 18.9194 | 18.4895 |
| MobileBertForMaskedLM | 64 | 7.0304 | 12.4774 | 18.3681 | 18.1248 |
| MobileBertForQuestionAnswering | 128 | 7.0857 | 12.3998 | 18.2001 | 18.185 |
| BlenderbotForCausalLM | 4 | 2.6878 | 5.143 | 18.1617 | 17.9775 |
| BlenderbotSmallForConditionalGeneration | 64 | 2.26 | 4.6878 | 14.8718 | 15.1227 |
| MT5ForConditionalGeneration | 16 | 3.4989 | 5.8067 | 14.8089 | 14.0701 |
| YituTechConvBert | 16 | 2.3358 | 4.6666 | 13.0127 | 12.8534 |
| PegasusForCausalLM | 32 | 1.3912 | 2.689 | 13.0048 | 12.4951 |
| PLBartForConditionalGeneration | 4 | 1.7426 | 3.4186 | 12.9575 | 13.1115 |
| OPTForCausalLM | 2 | 1.417 | 2.5998 | 11.5374 | 11.3977 |
| MBartForCausalLM | 4 | 1.2915 | 2.6257 | 11.3537 | 11.3893 |
| MegatronBertForCausalLM | 4 | 3.3362 | 5.9982 | 11.2485 | 10.9017 |
| T5Small | 4 | 2.5088 | 4.1588 | 10.982 | 10.3458 |
| MegatronBertForQuestionAnswering | 8 | 3.2037 | 6.0425 | 10.9613 | 10.8094 |
| BartForCausalLM | 4 | 1.3601 | 2.637 | 10.9474 | 10.8065 |
| TrOCRForCausalLM | 32 | 1.3512 | 2.604 | 10.8594 | 10.8888 |
| T5ForConditionalGeneration | 4 | 2.5392 | 4.1129 | 10.7617 | 10.3277 |
| GoogleFnet | 16 | 1.003 | 1.6715 | 10.5064 | 7.0694 |
| Speech2Text2ForCausalLM | 256 | 0.8117 | 1.502 | 10.36 | 10.0164 |
| BlenderbotSmallForCausalLM | 64 | 0.9359 | 1.8596 | 10.1243 | 10.1521 |
| PLBartForCausalLM | 8 | 0.7022 | 1.3863 | 8.801 | 8.7473 |
| LayoutLMForSequenceClassification | 16 | 1.643 | 3.0454 | 8.0692 | 7.7074 |
| LayoutLMForMaskedLM | 16 | 1.6683 | 3.0666 | 8.0107 | 7.6432 |
| GPT2ForSequenceClassification | 4 | 1.5743 | 2.8658 | 7.8992 | 7.5908 |
| ElectraForCausalLM | 32 | 1.5976 | 2.9503 | 7.5459 | 7.1048 |
| RobertaForCausalLM | 16 | 1.6141 | 3.0263 | 7.0771 | 6.9572 |
| DistillGPT2 | 16 | 0.8741 | 1.5713 | 6.9232 | 6.454 |
| ElectraForQuestionAnswering | 64 | 1.5801 | 3.0094 | 6.2399 | 6.0255 |
| BertForQuestionAnswering | 16 | 1.5959 | 2.9881 | 6.2223 | 5.7377 |
| CamemBert | 16 | 1.5999 | 3.0262 | 6.181 | 5.8014 |
| BertForMaskedLM | 16 | 1.5897 | 2.9618 | 6.1413 | 5.9362 |
| RobertaForQuestionAnswering | 16 | 1.6304 | 2.9387 | 6.0278 | 5.8623 |
| AlbertForQuestionAnswering | 4 | 1.4092 | 2.7625 | 5.7335 | 5.5299 |
| AlbertForMaskedLM | 4 | 1.419 | 2.6802 | 5.6957 | 5.3283 |
| DistilBertForQuestionAnswering | 256 | 0.7839 | 1.4999 | 5.3479 | 4.9536 |
| DistilBertForMaskedLM | 128 | 0.7729 | 1.4689 | 5.2032 | 4.919 |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
~~~

Peak Memory Compression Ratio

~~~
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
| ElectraForCausalLM | 32 | 1.0027 | 1.0022 | 2.5791 | 2.5839 |
| DistillGPT2 | 16 | 1.0042 | 1.004 | 2.0062 | 2.009 |
| RobertaForCausalLM | 16 | 1.0065 | 1.006 | 1.8216 | 1.8253 |
| DistilBertForMaskedLM | 128 | 1.0111 | 1.0107 | 1.7675 | 1.7705 |
| MobileBertForQuestionAnswering | 128 | 1.922 | 1.922 | 1.7542 | 1.7762 |
| BlenderbotSmallForConditionalGeneration | 64 | 1.0042 | 1.0035 | 1.696 | 1.696 |
| OPTForCausalLM | 2 | 1.0033 | 1.0031 | 1.6856 | 1.6856 |
| BlenderbotSmallForCausalLM | 64 | 1.0041 | 1.0039 | 1.6824 | 1.6824 |
| MobileBertForMaskedLM | 64 | 1.0073 | 1.0073 | 1.6754 | 1.6806 |
| PLBartForCausalLM | 8 | 1.0058 | 1.0056 | 1.5986 | 1.5986 |
| Speech2Text2ForCausalLM | 256 | 0.8788 | 0.8788 | 1.5339 | 1.3822 |
| CamemBert | 16 | 1.0084 | 1.0078 | 1.5294 | 1.5328 |
| BertForMaskedLM | 16 | 1.0087 | 1.0081 | 1.5198 | 1.5233 |
| YituTechConvBert | 16 | 1.0087 | 1.0081 | 1.5185 | 1.5219 |
| LayoutLMForMaskedLM | 16 | 1.0086 | 1.008 | 1.512 | 1.5154 |
| MT5ForConditionalGeneration | 16 | 0.9997 | 0.9997 | 1.4498 | 1.4512 |
| TrOCRForCausalLM | 32 | 1.0063 | 1.006 | 1.4435 | 1.4435 |
| T5ForConditionalGeneration | 4 | 1.0096 | 1.0096 | 1.4288 | 1.4337 |
| T5Small | 4 | 1.0096 | 1.0096 | 1.4288 | 1.4337 |
| AllenaiLongformerBase | 4 | 0.9913 | 0.9913 | 1.364 | 1.3779 |
| PLBartForConditionalGeneration | 4 | 1.0041 | 1.0037 | 1.3604 | 1.3604 |
| BartForCausalLM | 4 | 1.0041 | 1.0039 | 1.2512 | 1.2512 |
| MBartForCausalLM | 4 | 1.0041 | 1.0039 | 1.2512 | 1.2512 |
| PegasusForCausalLM | 32 | 0.9073 | 0.9073 | 1.2273 | 1.1094 |
| XGLMForCausalLM | 8 | 0.9702 | 0.9702 | 1.1741 | 1.1398 |
| M2M100ForConditionalGeneration | 16 | 0.9363 | 0.9363 | 1.1724 | 1.1058 |
| PegasusForConditionalGeneration | 32 | 0.9932 | 0.9932 | 1.1665 | 1.1907 |
| MegatronBertForCausalLM | 4 | 1.0025 | 1.0022 | 1.1605 | 1.1622 |
| XLNetLMHeadModel | 8 | 1.0039 | 1.0035 | 1.0798 | 1.0798 |
| BartForConditionalGeneration | 2 | 1.0021 | 1.0018 | 1.0578 | 1.0578 |
| MBartForConditionalGeneration | 2 | 1.0021 | 1.0018 | 1.0532 | 1.0532 |
| BlenderbotForCausalLM | 4 | 1.0008 | 1.0009 | 1.0002 | 1.0002 |
| AlbertForQuestionAnswering | 4 | 1.0898 | 1.0896 | 0.9849 | 0.9865 |
| AlbertForMaskedLM | 4 | 1.0896 | 1.0894 | 0.9841 | 0.9857 |
| MegatronBertForQuestionAnswering | 8 | 1.0339 | 1.0334 | 0.9822 | 0.9836 |
| GPT2ForSequenceClassification | 4 | 1.0149 | 1.0145 | 0.9679 | 0.9699 |
| LayoutLMForSequenceClassification | 16 | 1.0927 | 1.0915 | 0.9642 | 0.9668 |
| BertForQuestionAnswering | 16 | 1.0946 | 1.0933 | 0.9632 | 0.966 |
| RobertaForQuestionAnswering | 16 | 1.0946 | 1.0933 | 0.9632 | 0.966 |
| ElectraForQuestionAnswering | 64 | 1.2343 | 1.224 | 0.9244 | 0.9287 |
| DistilBertForQuestionAnswering | 256 | 1.1401 | 1.1378 | 0.889 | 0.8911 |
| DebertaV2ForMaskedLM | 1 | 1.0 | 1.0 | 0.6022 | 1.0 |
| GoogleFnet | 16 | 1.0 | 1.0 | 0.596 | 1.0 |
| DebertaV2ForQuestionAnswering | 2 | 1.0016 | 1.0016 | 0.4218 | 0.9935 |
| DebertaForMaskedLM | 4 | 0.96 | 0.96 | 0.1842 | 0.9599 |
| DebertaForQuestionAnswering | 8 | 0.937 | 0.937 | 0.0892 | 0.9837 |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
~~~

Absolute latency (ms)

~~~
+-----------------------------------------+-----+----------+-----------+----------+------------------------+ | name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs | +-----------------------------------------+-----+----------+-----------+----------+------------------------+ | AlbertForMaskedLM | 4 | 126.2986 | 126.324 | 96.3088 | 96.9241 | | AlbertForQuestionAnswering | 4 | 125.4353 | 125.3501 | 96.0152 | 96.0583 | | XLNetLMHeadModel | 8 | 155.7818 | 155.9737 | 67.2505 | 67.2324 | | PegasusForConditionalGeneration | 32 | 54.6872 | 54.7408 | 51.1818 | 51.6328 | | TrOCRForCausalLM | 32 | 55.7681 | 56.6637 | 48.9287 | 48.8672 | | MBartForConditionalGeneration | 2 | 47.8816 | 47.9705 | 44.017 | 44.0139 | | BartForConditionalGeneration | 2 | 47.8024 | 47.9847 | 43.1582 | 43.1273 | | OPTForCausalLM | 2 | 53.648 | 53.3112 | 42.0745 | 41.2446 | | MegatronBertForQuestionAnswering | 8 | 53.3403 | 53.3344 | 42.0692 | 42.4696 | | YituTechConvBert | 16 | 55.6232 | 55.5669 | 40.2893 | 40.6707 | | DistilBertForQuestionAnswering | 256 | 46.5554 | 46.594 | 38.8514 | 38.907 | | PLBartForCausalLM | 8 | 45.0857 | 44.697 | 37.325 | 37.3495 | | MBartForCausalLM | 4 | 43.6099 | 43.7791 | 36.5042 | 36.4795 | | BartForCausalLM | 4 | 43.5584 | 42.0178 | 35.7077 | 35.8188 | | BlenderbotSmallForConditionalGeneration | 64 | 40.2691 | 40.4938 | 35.1526 | 35.1403 | | DistilBertForMaskedLM | 128 | 40.6536 | 40.6795 | 32.9732 | 33.0902 | | PLBartForConditionalGeneration | 4 | 37.1676 | 37.2915 | 32.7308 | 31.9921 | | CamemBert | 16 | 44.0406 | 44.1191 | 32.6231 | 32.8209 | | LayoutLMForMaskedLM | 16 | 44.2901 | 44.3064 | 32.2304 | 32.4122 | | BertForMaskedLM | 16 | 43.466 | 43.5017 | 32.2059 | 32.4179 | | RobertaForCausalLM | 16 | 45.0913 | 45.1708 | 32.1996 | 32.4264 | | DebertaV2ForQuestionAnswering | 2 | 28.1659 | 28.9917 | 30.5518 | 27.4803 | | AllenaiLongformerBase | 4 | 47.8858 | 48.2592 | 30.2533 | 30.8194 | | M2M100ForConditionalGeneration | 16 | 33.1684 | 34.1592 | 29.41 | 
30.3396 | | MobileBertForQuestionAnswering | 128 | 38.6966 | 38.8171 | 27.7223 | 28.5703 | | RobertaForQuestionAnswering | 16 | 35.3476 | 35.4089 | 25.7754 | 25.9691 | | BertForQuestionAnswering | 16 | 35.2181 | 35.2513 | 25.7682 | 25.9445 | | LayoutLMForSequenceClassification | 16 | 35.9689 | 35.9852 | 25.7673 | 25.9488 | | ElectraForQuestionAnswering | 64 | 38.4819 | 38.5101 | 25.5223 | 25.7736 | | MegatronBertForCausalLM | 4 | 31.73 | 31.7292 | 24.8236 | 25.2269 | | PegasusForCausalLM | 32 | 27.7385 | 27.642 | 24.5256 | 24.6589 | | BlenderbotForCausalLM | 4 | 24.755 | 24.8309 | 23.6011 | 23.5165 | | GoogleFnet | 16 | 37.0613 | 37.0595 | 23.593 | 21.0983 | | DistillGPT2 | 16 | 40.1614 | 40.1497 | 23.4443 | 23.5533 | | DebertaForQuestionAnswering | 8 | 23.2316 | 23.3065 | 22.3748 | 19.5651 | | MobileBertForMaskedLM | 64 | 35.5562 | 42.1256 | 19.9822 | 24.7356 | | DebertaV2ForMaskedLM | 1 | 28.1455 | 31.2516 | 19.6317 | 27.0823 | | Speech2Text2ForCausalLM | 256 | 24.1515 | 24.2044 | 19.5831 | 19.5111 | | ElectraForCausalLM | 32 | 32.8913 | 32.9534 | 19.4924 | 19.7179 | | T5ForConditionalGeneration | 4 | 34.403 | 34.8905 | 19.2715 | 19.658 | | T5Small | 4 | 34.4086 | 34.9084 | 19.2599 | 19.6315 | | BlenderbotSmallForCausalLM | 64 | 23.0937 | 23.217 | 18.3115 | 18.277 | | GPT2ForSequenceClassification | 4 | 32.3442 | 32.4044 | 17.2526 | 17.4334 | | XGLMForCausalLM | 8 | 23.0339 | 24.8851 | 16.8214 | 18.4984 | | DebertaForMaskedLM | 4 | 19.344 | 21.69 | 15.5261 | 15.3074 | | MT5ForConditionalGeneration | 16 | 27.8984 | 31.2039 | 12.3767 | 15.7708 | +-----------------------------------------+-----+----------+-----------+----------+------------------------+ ~~~

timm_models suite with float32 precision

Performance speedup
~~~
+---------------------------------+-----+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
| dm_nfnet_f0 | 128 | 1.0004 | 0.9995 | 1.7446 | 1.7122 |
| sebotnet33ts_256 | 64 | 0.9995 | 0.9994 | 1.6967 | 1.6855 |
| nfnet_l0 | 128 | 0.9994 | 0.9998 | 1.658 | 1.6423 |
| volo_d1_224 | 64 | 0.9995 | 0.9932 | 1.5704 | 1.5529 |
| eca_botnext26ts_256 | 128 | 0.9994 | 1.0003 | 1.5412 | 1.5275 |
| resnest101e | 64 | 0.9987 | 1.0018 | 1.5374 | 1.4997 |
| xcit_large_24_p8_224 | 5 | 1.0059 | 1.0041 | 1.5263 | 1.4942 |
| gmlp_s16_224 | 128 | 0.9998 | 0.9992 | 1.5239 | 1.5154 |
| botnet26t_256 | 128 | 0.9994 | 0.9998 | 1.5164 | 1.5143 |
| eca_halonext26ts | 128 | 0.9995 | 0.9999 | 1.5141 | 1.5003 |
| poolformer_m36 | 64 | 1.0001 | 1.0002 | 1.4992 | 1.4788 |
| coat_lite_mini | 128 | 0.9998 | 0.9993 | 1.4682 | 1.4546 |
| res2next50 | 128 | 0.9996 | 0.9995 | 1.4516 | 1.4274 |
| res2net50_14w_8s | 128 | 0.9996 | 0.9997 | 1.4468 | 1.4309 |
| repvgg_a2 | 128 | 0.9997 | 0.9999 | 1.4285 | 1.4251 |
| regnety_002 | 128 | 0.998 | 0.9987 | 1.4251 | 1.3912 |
| res2net101_26w_4s | 64 | 0.9994 | 0.9998 | 1.4243 | 1.406 |
| tnt_s_patch16_224 | 128 | 1.0002 | 0.9999 | 1.423 | 1.4171 |
| mobilenetv2_100 | 128 | 0.998 | 0.9988 | 1.4161 | 1.418 |
| cait_m36_384 | 4 | 1.0001 | 1.0085 | 1.4105 | 1.3899 |
| jx_nest_base | 32 | 0.9994 | 0.9923 | 1.3962 | 1.3732 |
| convnext_base | 64 | 0.9989 | 0.9969 | 1.3945 | 1.3825 |
| rexnet_100 | 128 | 0.999 | 0.9989 | 1.3907 | 1.3873 |
| ghostnet_100 | 128 | 0.9984 | 0.999 | 1.3852 | 1.3863 |
| tinynet_a | 128 | 0.999 | 0.999 | 1.3786 | 1.3702 |
| cspdarknet53 | 64 | 0.9999 | 1.0003 | 1.3753 | 1.3632 |
| hrnet_w18 | 128 | 0.9994 | 1.0 | 1.3729 | 1.3542 |
| convit_base | 64 | 0.9998 | 0.9997 | 1.3698 | 1.3608 |
| gernet_l | 128 | 0.9996 | 0.9996 | 1.3612 | 1.3597 |
| ese_vovnet19b_dw | 128 | 0.9992 | 0.9996 | 1.3607 | 1.36 |
| dla102 | 128 | 0.9996 | 0.9994 | 1.3535 | 1.3491 |
| mobilenetv3_large_100 | 128 | 0.9982 | 0.9985 | 1.341 | 1.3411 |
| gmixer_24_224 | 128 | 0.9993 | 0.9999 | 1.3364 | 1.3316 |
| spnasnet_100 | 128 | 0.9985 | 0.9989 | 1.3357 | 1.3355 |
| tf_efficientnet_b0 | 128 | 0.9996 | 0.9993 | 1.3354 | 1.3342 |
| mnasnet_100 | 128 | 0.9986 | 0.9989 | 1.3177 | 1.3178 |
| dpn107 | 32 | 0.9998 | 0.9995 | 1.3153 | 1.2975 |
| tf_mixnet_l | 128 | 0.9999 | 0.9999 | 1.3119 | 1.3004 |
| resmlp_12_224 | 128 | 0.9999 | 0.9991 | 1.3075 | 1.3092 |
| fbnetc_100 | 128 | 0.9991 | 0.9994 | 1.3058 | 1.3038 |
| mobilevit_s | 64 | 0.9997 | 0.9997 | 1.302 | 1.2906 |
| adv_inception_v3 | 128 | 0.9996 | 0.9999 | 1.2934 | 1.2884 |
| gluon_inception_v3 | 128 | 0.9998 | 0.9998 | 1.2927 | 1.2886 |
| inception_v3 | 128 | 0.9997 | 0.9998 | 1.2904 | 1.288 |
| fbnetv3_b | 128 | 0.999 | 0.9992 | 1.2884 | 1.2847 |
| pnasnet5large | 16 | 1.0008 | 1.001 | 1.2848 | 1.2771 |
| lcnet_050 | 128 | 0.9938 | 0.9945 | 1.2759 | 1.293 |
| selecsls42b | 128 | 0.9988 | 0.999 | 1.2748 | 1.2726 |
| mixnet_l | 128 | 0.9997 | 0.9994 | 1.2746 | 1.2637 |
| crossvit_9_240 | 128 | 1.0002 | 1.0 | 1.2723 | 1.2578 |
| swsl_resnext101_32x16d | 32 | 0.9992 | 1.001 | 1.2598 | 1.2344 |
| convmixer_768_32 | 32 | 0.9988 | 0.9989 | 1.239 | 1.2383 |
| gluon_xception65 | 32 | 0.9998 | 0.9999 | 1.188 | 1.1858 |
| pit_b_224 | 64 | 0.9995 | 0.9995 | 1.1538 | 1.1471 |
| mixer_b16_224 | 128 | 1.0021 | 1.0001 | 1.1531 | 1.1529 |
| twins_pcpvt_base | 64 | 0.9998 | 1.0006 | 1.1448 | 1.1246 |
| beit_base_patch16_224 | 64 | 0.9997 | 0.9999 | 1.1172 | 1.1113 |
| swin_base_patch4_window7_224 | 64 | 0.9999 | 0.9997 | 1.1123 | 1.1083 |
| deit_base_distilled_patch16_224 | 64 | 0.9995 | 0.999 | 1.0932 | 1.0911 |
| vit_base_patch16_224 | 64 | 1.0002 | 0.9993 | 1.0883 | 1.0835 |
| visformer_small | 128 | 0.9989 | 0.9998 | 1.0722 | 1.0635 |
+---------------------------------+-----+--------+-----------+----------+------------------------+
~~~
Accuracy
~~~
+---------------------------------+----+--------+-----------+---------------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+----+--------+-----------+---------------+------------------------+
| adv_inception_v3 | 8 | pass | pass | pass | pass |
| beit_base_patch16_224 | 8 | pass | pass | pass | pass |
| mobilevit_s | 8 | pass | pass | pass | pass |
| nfnet_l0 | 8 | pass | pass | pass | pass |
| pit_b_224 | 8 | pass | pass | pass | pass |
| pnasnet5large | 8 | pass | pass | pass | pass |
| poolformer_m36 | 8 | pass | pass | pass | pass |
| regnety_002 | 8 | pass | pass | pass | pass |
| repvgg_a2 | 8 | pass | pass | pass | pass |
| res2net101_26w_4s | 8 | pass | pass | pass | pass |
| res2net50_14w_8s | 8 | pass | pass | pass | pass |
| res2next50 | 8 | pass | pass | pass | pass |
| resmlp_12_224 | 8 | pass | pass | pass | pass |
| resnest101e | 8 | pass | pass | pass | pass |
| rexnet_100 | 8 | pass | pass | pass | pass |
| sebotnet33ts_256 | 8 | pass | pass | pass | pass |
| selecsls42b | 8 | pass | pass | pass | pass |
| spnasnet_100 | 8 | pass | pass | pass | pass |
| swin_base_patch4_window7_224 | 8 | pass | pass | pass | pass |
| swsl_resnext101_32x16d | 8 | pass | pass | pass | pass |
| tf_efficientnet_b0 | 8 | pass | pass | pass | pass |
| tf_mixnet_l | 8 | pass | pass | pass | pass |
| tinynet_a | 8 | pass | pass | pass | pass |
| tnt_s_patch16_224 | 8 | pass | pass | pass | pass |
| twins_pcpvt_base | 8 | pass | pass | pass | pass |
| visformer_small | 8 | pass | pass | pass | pass |
| vit_base_patch16_224 | 8 | pass | pass | pass | pass |
| volo_d1_224 | 8 | pass | pass | pass | pass |
| xcit_large_24_p8_224 | 8 | pass | pass | pass | pass |
| mobilenetv3_large_100 | 8 | pass | pass | pass | pass |
| mobilenetv2_100 | 8 | pass | pass | pass | pass |
| mnasnet_100 | 8 | pass | pass | pass | pass |
| mixnet_l | 8 | pass | pass | pass | pass |
| botnet26t_256 | 8 | pass | pass | pass | pass |
| coat_lite_mini | 8 | pass | pass | pass | pass |
| convit_base | 8 | pass | pass | pass | pass |
| convmixer_768_32 | 8 | pass | pass | pass | pass |
| convnext_base | 8 | pass | pass | pass | pass |
| crossvit_9_240 | 8 | pass | pass | pass | pass |
| cspdarknet53 | 8 | pass | pass | pass | pass |
| deit_base_distilled_patch16_224 | 8 | pass | pass | pass | pass |
| dla102 | 8 | pass | pass | pass | pass |
| dm_nfnet_f0 | 8 | pass | pass | pass | pass |
| dpn107 | 8 | pass | pass | pass | pass |
| eca_botnext26ts_256 | 8 | pass | pass | pass | pass |
| eca_halonext26ts | 8 | pass | pass | pass | pass |
| ese_vovnet19b_dw | 8 | pass | pass | pass | pass |
| fbnetc_100 | 8 | pass | pass | pass | pass |
| fbnetv3_b | 8 | pass | pass | pass | pass |
| gernet_l | 8 | pass | pass | pass | pass |
| gluon_inception_v3 | 8 | pass | pass | pass | pass |
| gluon_xception65 | 8 | pass | pass | pass | pass |
| gmixer_24_224 | 8 | pass | pass | pass | pass |
| gmlp_s16_224 | 8 | pass | pass | pass | pass |
| hrnet_w18 | 8 | pass | pass | pass | pass |
| inception_v3 | 8 | pass | pass | pass | pass |
| jx_nest_base | 8 | pass | pass | pass | pass |
| lcnet_050 | 8 | pass | pass | pass | pass |
| mixer_b16_224 | 8 | pass | pass | pass | pass |
| ghostnet_100 | 8 | pass | pass | fail_accuracy | fail_accuracy |
| cait_m36_384 | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
+---------------------------------+----+--------+-----------+---------------+------------------------+
~~~
Compilation latency (sec)
~~~
+---------------------------------+-----+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
| mobilevit_s | 64 | 1.6139 | 3.2333 | 49.7143 | 49.8655 |
| twins_pcpvt_base | 64 | 2.5291 | 5.7023 | 38.8259 | 38.7766 |
| coat_lite_mini | 128 | 1.0834 | 2.2498 | 29.6093 | 29.7204 |
| hrnet_w18 | 128 | 5.9686 | 14.2881 | 27.7506 | 27.2722 |
| swin_base_patch4_window7_224 | 64 | 2.6748 | 5.5886 | 27.6489 | 27.8899 |
| pnasnet5large | 16 | 5.0359 | 10.3861 | 24.0488 | 24.0914 |
| resnest101e | 64 | 3.2415 | 7.5263 | 21.1781 | 20.7087 |
| cait_m36_384 | 4 | 3.1865 | 7.6199 | 21.153 | 20.9181 |
| jx_nest_base | 32 | 1.6404 | 3.5174 | 19.5164 | 19.6318 |
| xcit_large_24_p8_224 | 5 | 2.9865 | 7.3893 | 18.9831 | 18.9317 |
| convnext_base | 64 | 1.4387 | 2.6806 | 17.6813 | 17.6188 |
| eca_halonext26ts | 128 | 1.2738 | 2.4199 | 16.4065 | 16.2804 |
| poolformer_m36 | 64 | 1.6856 | 3.1149 | 15.8347 | 15.8709 |
| res2net101_26w_4s | 64 | 3.1804 | 7.9261 | 15.4316 | 15.5139 |
| res2net50_14w_8s | 128 | 2.8033 | 7.261 | 14.023 | 14.0996 |
| sebotnet33ts_256 | 64 | 1.4937 | 2.9401 | 13.7598 | 13.5922 |
| volo_d1_224 | 64 | 1.4303 | 3.1886 | 13.6178 | 13.4904 |
| botnet26t_256 | 128 | 1.1163 | 2.1258 | 13.4993 | 13.3582 |
| tnt_s_patch16_224 | 128 | 1.8458 | 4.3454 | 13.488 | 13.1775 |
| dpn107 | 32 | 3.2437 | 6.4918 | 13.0539 | 12.9616 |
| crossvit_9_240 | 128 | 1.5821 | 3.7067 | 11.4174 | 10.8666 |
| gmlp_s16_224 | 128 | 1.2904 | 2.7068 | 11.1773 | 10.9321 |
| fbnetv3_b | 128 | 2.6799 | 5.6824 | 11.0869 | 10.8418 |
| eca_botnext26ts_256 | 128 | 1.212 | 2.3684 | 10.8006 | 10.5527 |
| gmixer_24_224 | 128 | 1.3715 | 3.04 | 10.7386 | 10.5671 |
| tf_mixnet_l | 128 | 2.9268 | 5.209 | 10.5769 | 10.5804 |
| gluon_xception65 | 32 | 2.0418 | 5.2545 | 10.2944 | 10.0279 |
| mixnet_l | 128 | 2.6305 | 4.882 | 9.7863 | 9.7688 |
| convit_base | 64 | 1.1702 | 2.5556 | 9.422 | 9.3346 |
| dla102 | 128 | 1.8288 | 4.5522 | 9.2009 | 9.1213 |
| inception_v3 | 128 | 1.5867 | 4.0164 | 9.0338 | 8.7804 |
| gluon_inception_v3 | 128 | 1.6059 | 3.9134 | 8.9316 | 8.8053 |
| adv_inception_v3 | 128 | 1.553 | 3.8752 | 8.8676 | 8.8267 |
| dm_nfnet_f0 | 128 | 2.1399 | 3.878 | 8.8535 | 8.6243 |
| swsl_resnext101_32x16d | 32 | 1.8248 | 4.4364 | 8.7958 | 8.6511 |
| res2next50 | 128 | 1.5706 | 4.0113 | 8.6387 | 8.5934 |
| ghostnet_100 | 128 | 1.5369 | 3.7467 | 8.6026 | 8.1494 |
| beit_base_patch16_224 | 64 | 1.1641 | 2.4449 | 8.3774 | 8.1952 |
| cspdarknet53 | 64 | 1.9024 | 3.7138 | 7.9474 | 7.6717 |
| rexnet_100 | 128 | 1.7184 | 3.4299 | 7.9134 | 7.7538 |
| nfnet_l0 | 128 | 1.8644 | 3.4937 | 7.899 | 7.8174 |
| tinynet_a | 128 | 1.7622 | 3.5576 | 7.6692 | 7.5266 |
| resmlp_12_224 | 128 | 0.6248 | 1.1993 | 7.2887 | 7.2549 |
| mixer_b16_224 | 128 | 0.664 | 1.3817 | 7.247 | 7.1023 |
| pit_b_224 | 64 | 0.9875 | 2.2269 | 7.2143 | 7.0427 |
| tf_efficientnet_b0 | 128 | 1.5763 | 3.0272 | 7.0067 | 6.8312 |
| deit_base_distilled_patch16_224 | 64 | 0.8979 | 1.9326 | 6.4769 | 6.2163 |
| fbnetc_100 | 128 | 1.6791 | 3.2861 | 6.3888 | 6.152 |
| vit_base_patch16_224 | 64 | 0.8601 | 1.8736 | 6.3494 | 6.1961 |
| spnasnet_100 | 128 | 1.6534 | 3.1682 | 6.1396 | 6.1671 |
| mobilenetv3_large_100 | 128 | 1.3788 | 2.6467 | 6.0345 | 5.874 |
| repvgg_a2 | 128 | 1.6135 | 3.0337 | 5.7661 | 5.6333 |
| gernet_l | 128 | 1.7388 | 3.0408 | 5.7659 | 5.5789 |
| convmixer_768_32 | 32 | 1.1829 | 2.8901 | 5.7082 | 5.4054 |
| mobilenetv2_100 | 128 | 1.4172 | 2.6936 | 5.6773 | 5.5803 |
| regnety_002 | 128 | 1.3786 | 2.6961 | 5.4571 | 5.3051 |
| mnasnet_100 | 128 | 1.3785 | 2.5897 | 5.2115 | 5.139 |
| visformer_small | 128 | 0.9126 | 1.9966 | 5.1558 | 5.0566 |
| selecsls42b | 128 | 0.7467 | 1.737 | 4.2837 | 4.1223 |
| ese_vovnet19b_dw | 128 | 0.8169 | 1.4457 | 4.0973 | 4.061 |
| lcnet_050 | 128 | 0.8309 | 1.6434 | 3.9526 | 3.903 |
+---------------------------------+-----+--------+-----------+----------+------------------------+
~~~
Peak Memory Compression Ratio
~~~
+---------------------------------+-----+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
| mobilenetv2_100 | 128 | 1.2169 | 1.2169 | 1.6626 | 1.8034 |
| rexnet_100 | 128 | 1.2152 | 1.2152 | 1.6556 | 1.7942 |
| tinynet_a | 128 | 1.211 | 1.211 | 1.6361 | 1.7717 |
| mnasnet_100 | 128 | 1.3795 | 1.3795 | 1.6285 | 1.8465 |
| spnasnet_100 | 128 | 1.3794 | 1.3794 | 1.6281 | 1.846 |
| ese_vovnet19b_dw | 128 | 1.4974 | 1.4974 | 1.6192 | 1.7101 |
| fbnetc_100 | 128 | 1.1429 | 1.1429 | 1.5534 | 1.6826 |
| mobilenetv3_large_100 | 128 | 1.2001 | 1.2001 | 1.5403 | 1.7144 |
| fbnetv3_b | 128 | 1.199 | 1.199 | 1.5385 | 1.709 |
| dm_nfnet_f0 | 128 | 1.1746 | 1.7712 | 1.5353 | 1.595 |
| selecsls42b | 128 | 1.5922 | 1.5922 | 1.4768 | 1.592 |
| gluon_xception65 | 32 | 1.4509 | 1.4509 | 1.4067 | 1.4509 |
| sebotnet33ts_256 | 64 | 1.1867 | 1.1867 | 1.3764 | 1.401 |
| cspdarknet53 | 64 | 1.6419 | 1.6419 | 1.3762 | 1.4148 |
| pnasnet5large | 16 | 1.4331 | 0.9719 | 1.3588 | 1.3817 |
| convmixer_768_32 | 32 | 1.3892 | 1.3892 | 1.329 | 1.3892 |
| nfnet_l0 | 128 | 1.3949 | 1.3949 | 1.3271 | 1.3947 |
| tf_efficientnet_b0 | 128 | 1.3195 | 1.3195 | 1.2494 | 1.3195 |
| hrnet_w18 | 128 | 1.0656 | 1.0656 | 1.2462 | 1.3268 |
| poolformer_m36 | 64 | 1.1898 | 1.1898 | 1.2225 | 1.2679 |
| lcnet_050 | 128 | 1.2755 | 1.2755 | 1.1792 | 1.4693 |
| res2net50_14w_8s | 128 | 1.2892 | 1.1422 | 1.1426 | 1.1966 |
| res2next50 | 128 | 1.3228 | 1.1715 | 1.1364 | 1.1885 |
| mixnet_l | 128 | 1.153 | 1.153 | 1.1192 | 1.153 |
| tf_mixnet_l | 128 | 1.1531 | 1.1531 | 1.1192 | 1.1531 |
| res2net101_26w_4s | 64 | 1.204 | 1.0983 | 1.0862 | 1.1267 |
| eca_botnext26ts_256 | 128 | 1.1408 | 0.9998 | 1.0812 | 1.1408 |
| eca_halonext26ts | 128 | 1.1407 | 0.9998 | 1.0812 | 1.1407 |
| botnet26t_256 | 128 | 1.1397 | 0.9998 | 1.0806 | 1.1397 |
| coat_lite_mini | 128 | 1.1029 | 1.0932 | 1.0794 | 1.1245 |
| repvgg_a2 | 128 | 1.0636 | 1.0636 | 1.0778 | 1.1354 |
| convnext_base | 64 | 1.1198 | 1.1156 | 1.0621 | 1.0878 |
| regnety_002 | 128 | 1.0 | 1.0 | 1.0514 | 1.1972 |
| mobilevit_s | 64 | 1.1646 | 1.1646 | 1.0267 | 1.0687 |
| ghostnet_100 | 128 | 1.1112 | 1.1112 | 1.0214 | 1.1112 |
| swsl_resnext101_32x16d | 32 | 1.0 | 0.9816 | 0.9915 | 1.0 |
| dla102 | 128 | 1.0 | 1.0 | 0.9641 | 1.0 |
| twins_pcpvt_base | 64 | 1.0801 | 1.0709 | 0.9525 | 0.9806 |
| convit_base | 64 | 1.1582 | 1.1567 | 0.9486 | 0.9714 |
| gluon_inception_v3 | 128 | 1.0003 | 1.0003 | 0.9485 | 1.0001 |
| inception_v3 | 128 | 1.0003 | 1.0003 | 0.9485 | 1.0001 |
| adv_inception_v3 | 128 | 1.0003 | 1.0003 | 0.9485 | 1.0001 |
| gernet_l | 128 | 1.0 | 1.0 | 0.9359 | 1.0 |
| cait_m36_384 | 4 | 1.0086 | 1.008 | 0.9354 | 0.9394 |
| resnest101e | 64 | 1.0 | 0.8541 | 0.9281 | 0.959 |
| volo_d1_224 | 64 | 1.0 | 1.0 | 0.9139 | 0.9537 |
| dpn107 | 32 | 1.2072 | 1.1164 | 0.9081 | 0.9187 |
| xcit_large_24_p8_224 | 5 | 1.0129 | 1.0129 | 0.876 | 0.8794 |
| jx_nest_base | 32 | 1.1102 | 1.1084 | 0.8737 | 0.8862 |
| pit_b_224 | 64 | 1.0669 | 1.0659 | 0.8617 | 0.8724 |
| mixer_b16_224 | 128 | 1.1738 | 1.1696 | 0.8587 | 0.899 |
| visformer_small | 128 | 1.1201 | 1.1201 | 0.8585 | 0.9032 |
| swin_base_patch4_window7_224 | 64 | 1.3578 | 1.3515 | 0.835 | 0.8479 |
| beit_base_patch16_224 | 64 | 1.0658 | 1.0637 | 0.8089 | 0.8318 |
| deit_base_distilled_patch16_224 | 64 | 1.0676 | 1.0663 | 0.7983 | 0.8219 |
| vit_base_patch16_224 | 64 | 1.0663 | 1.0642 | 0.7981 | 0.8206 |
| resmlp_12_224 | 128 | 1.1837 | 1.1837 | 0.7744 | 0.8456 |
| crossvit_9_240 | 128 | 1.0494 | 1.0445 | 0.6775 | 0.7192 |
| gmixer_24_224 | 128 | 1.1635 | 1.1479 | 0.6682 | 0.7147 |
| gmlp_s16_224 | 128 | 1.0787 | 1.0592 | 0.6609 | 0.7151 |
| tnt_s_patch16_224 | 128 | 1.2117 | 1.0496 | 0.5094 | 0.5345 |
+---------------------------------+-----+--------+-----------+----------+------------------------+
~~~
Absolute latency (ms)
~~~
+---------------------------------+-----+----------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
| tnt_s_patch16_224 | 128 | 148.9654 | 149.0719 | 104.6927 | 105.1293 |
| convmixer_768_32 | 32 | 109.0607 | 109.1017 | 87.9242 | 87.9553 |
| dm_nfnet_f0 | 128 | 119.1667 | 119.2934 | 68.3774 | 69.63 |
| swin_base_patch4_window7_224 | 64 | 74.2176 | 74.1953 | 66.6548 | 66.9677 |
| pnasnet5large | 16 | 80.328 | 80.3555 | 62.6038 | 62.9718 |
| dla102 | 128 | 84.2243 | 84.2376 | 62.1823 | 62.4255 |
| hrnet_w18 | 128 | 82.9734 | 83.1324 | 60.4921 | 61.3852 |
| cait_m36_384 | 4 | 83.1308 | 82.5154 | 58.9528 | 59.8852 |
| nfnet_l0 | 128 | 94.9852 | 94.9558 | 57.3149 | 57.841 |
| swsl_resnext101_32x16d | 32 | 69.6202 | 69.5645 | 55.1758 | 56.4839 |
| mixer_b16_224 | 128 | 55.0737 | 55.147 | 48.2593 | 48.2582 |
| convit_base | 64 | 65.7007 | 65.6777 | 47.9436 | 48.2889 |
| tf_mixnet_l | 128 | 62.5082 | 62.5203 | 47.6899 | 48.061 |
| mixnet_l | 128 | 60.2366 | 60.2873 | 47.2427 | 47.7046 |
| poolformer_m36 | 64 | 70.4273 | 70.4285 | 47.001 | 47.663 |
| inception_v3 | 128 | 60.3174 | 60.3455 | 46.7317 | 46.8366 |
| gluon_inception_v3 | 128 | 60.3486 | 60.3433 | 46.7135 | 46.8547 |
| adv_inception_v3 | 128 | 60.3702 | 60.3922 | 46.711 | 46.8479 |
| pit_b_224 | 64 | 53.8501 | 53.8641 | 46.6385 | 46.9295 |
| resnest101e | 64 | 70.3262 | 70.2225 | 45.651 | 46.8674 |
| res2next50 | 128 | 65.1489 | 65.0755 | 44.7873 | 45.5519 |
| dpn107 | 32 | 55.9111 | 55.9176 | 42.5219 | 43.0376 |
| res2net50_14w_8s | 128 | 60.6821 | 60.6256 | 41.9294 | 42.3783 |
| gluon_xception65 | 32 | 48.8572 | 49.0012 | 41.1466 | 41.3234 |
| convnext_base | 64 | 56.9532 | 57.075 | 40.8314 | 41.1649 |
| visformer_small | 128 | 43.5331 | 43.4914 | 40.6089 | 40.8692 |
| beit_base_patch16_224 | 64 | 43.3181 | 43.3553 | 38.7694 | 38.9668 |
| deit_base_distilled_patch16_224 | 64 | 41.2645 | 41.3351 | 37.8458 | 37.9115 |
| vit_base_patch16_224 | 64 | 41.0123 | 40.9416 | 37.8111 | 37.9837 |
| twins_pcpvt_base | 64 | 39.5165 | 39.5124 | 34.5647 | 35.2026 |
| gmixer_24_224 | 128 | 45.9274 | 45.9138 | 34.3347 | 34.4751 |
| gmlp_s16_224 | 128 | 49.3377 | 49.3872 | 32.3656 | 32.5592 |
| volo_d1_224 | 64 | 50.4443 | 50.7984 | 32.1044 | 32.4684 |
| res2net101_26w_4s | 64 | 45.3913 | 45.4029 | 31.876 | 32.2707 |
| fbnetv3_b | 128 | 40.8737 | 40.8793 | 31.7002 | 31.8064 |
| jx_nest_base | 32 | 41.393 | 41.6816 | 29.5727 | 30.1181 |
| eca_halonext26ts | 128 | 43.7129 | 43.7007 | 28.8765 | 29.1417 |
| botnet26t_256 | 128 | 42.6847 | 42.6623 | 28.1344 | 28.161 |
| coat_lite_mini | 128 | 41.185 | 41.2007 | 28.0194 | 28.3056 |
| gernet_l | 128 | 37.9461 | 37.9274 | 27.8385 | 27.8736 |
| eca_botnext26ts_256 | 128 | 42.3338 | 42.3053 | 27.4495 | 27.6741 |
| cspdarknet53 | 64 | 34.1121 | 34.1163 | 24.8253 | 25.037 |
| repvgg_a2 | 128 | 35.2978 | 35.3267 | 24.7215 | 24.7609 |
| crossvit_9_240 | 128 | 30.1934 | 30.2016 | 23.7378 | 23.9801 |
| xcit_large_24_p8_224 | 5 | 34.6663 | 34.9365 | 22.7257 | 23.2079 |
| tf_efficientnet_b0 | 128 | 30.0324 | 30.0328 | 22.4684 | 22.4928 |
| mobilevit_s | 64 | 28.573 | 28.5851 | 21.9561 | 22.1538 |
| sebotnet33ts_256 | 64 | 35.1965 | 35.1909 | 20.7381 | 20.8677 |
| fbnetc_100 | 128 | 25.4068 | 25.4258 | 19.4487 | 19.4747 |
| rexnet_100 | 128 | 26.2233 | 26.262 | 18.8352 | 18.9135 |
| selecsls42b | 128 | 23.4263 | 23.4275 | 18.3751 | 18.4075 |
| ese_vovnet19b_dw | 128 | 24.9002 | 24.9009 | 18.2897 | 18.2909 |
| tinynet_a | 128 | 24.2155 | 24.2104 | 17.5386 | 17.6373 |
| resmlp_12_224 | 128 | 21.8646 | 21.8764 | 16.7226 | 16.71 |
| spnasnet_100 | 128 | 21.5151 | 21.5202 | 16.0869 | 16.1043 |
| mnasnet_100 | 128 | 20.3695 | 20.3718 | 15.4352 | 15.4274 |
| mobilenetv2_100 | 128 | 19.2904 | 19.3011 | 13.5938 | 13.5819 |
| mobilenetv3_large_100 | 128 | 16.4404 | 16.4508 | 12.2472 | 12.255 |
| ghostnet_100 | 128 | 16.5424 | 16.5664 | 11.9457 | 11.9185 |
| regnety_002 | 128 | 9.673 | 9.701 | 6.7712 | 6.9554 |
| lcnet_050 | 128 | 5.1395 | 5.1785 | 3.6353 | 3.7228 |
+---------------------------------+-----+----------+-----------+----------+------------------------+
~~~

Build Summary

### Run name ###
day_354_20_12_22_performance_float32_370

### Commit hashes ###
pytorch commit: 88c581be87ac59ea1251f35a57b610ae81b9362d
pytorch commit date: 2022-12-21 04:51:51+00:00
functorch Absent
torchbench commit: 43ca0857e9c7b9d90f647d1befbaee1dfe446d7e
torchbench commit date: 2022-12-16 10:47:24-08:00

### TorchDynamo config flags ###
torch._dynamo.config.DO_NOT_USE_legacy_non_fake_example_inputs = False
torch._dynamo.config.HAS_REFS_PRIMS = True
torch._dynamo.config.capture_scalar_outputs = False
torch._dynamo.config.dead_code_elimination = True
torch._dynamo.config.disable = False
torch._dynamo.config.dynamic_shapes = False
torch._dynamo.config.enforce_cond_guards_match = True
torch._dynamo.config.error_on_nested_fx_trace = True
torch._dynamo.config.guard_nn_modules = False
torch._dynamo.config.normalize_ir = False
torch._dynamo.config.optimize_ddp = True
torch._dynamo.config.output_code = False
torch._dynamo.config.output_graph_code = False
torch._dynamo.config.print_graph_breaks = False
torch._dynamo.config.raise_on_ctx_manager_usage = True
torch._dynamo.config.raise_on_unsafe_aot_autograd = False
torch._dynamo.config.replay_record_enabled = False
torch._dynamo.config.rewrite_assert_with_torch_assert = True
torch._dynamo.config.specialize_int_float = True
torch._dynamo.config.suppress_errors = False
torch._dynamo.config.verbose = False
torch._dynamo.config.verify_correctness = False

### Torch version ###
torch: 2.0.0a0+git88c581b

### Environment variables ###
TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

### GPU details ###
CUDNN VERSION: 8302
Number CUDA Devices: 8
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.314694656
anijain2305 commented 1 year ago

Inference Performance Dashboard for float16 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs, and this is an inference run. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats:
1) Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
2) Experiments do not cover dynamic shapes.
3) The experimental setup does not include an optimizer.
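A minimal sketch of the speedup normalization described above, assuming speedup is the native PyTorch latency divided by the backend's latency for the same model and batch size, so values above 1.0 mean the backend is faster. The function name `speedup` and the sample latencies are hypothetical and are not taken from the dashboard's own scripts.

```python
def speedup(native_latency_ms: float, backend_latency_ms: float) -> float:
    """Return how many times faster a backend is than native PyTorch.

    Assumption: speedup = native latency / backend latency for the same
    model and batch size, so values above 1.0 mean the backend is faster.
    """
    if native_latency_ms <= 0 or backend_latency_ms <= 0:
        raise ValueError("latencies must be positive")
    return native_latency_ms / backend_latency_ms


# Hypothetical latencies (ms) for one model at a fixed batch size.
native_ms = 30.0
compiled_ms = 20.0
print(f"{speedup(native_ms, compiled_ms):.2f}x")  # 1.50x
```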

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
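A minimal sketch of excluding models that fail the accuracy check before aggregating a metric such as compilation latency. The helper `filter_passing`, the model names and the latencies are hypothetical; only the `pass` and `fail_accuracy` status strings come from the accuracy tables in this report, and using an arithmetic mean for compilation time is an assumption based on the "mean compilation time" label.

```python
from __future__ import annotations

from statistics import mean


def filter_passing(metrics: dict[str, float], accuracy: dict[str, str]) -> dict[str, float]:
    """Keep only the models whose accuracy status is exactly "pass"."""
    return {name: value for name, value in metrics.items() if accuracy.get(name) == "pass"}


# Hypothetical per-model compilation latencies (seconds) and accuracy statuses.
compile_latency_s = {"model_a": 10.0, "model_b": 6.0, "model_c": 40.0}
accuracy_status = {"model_a": "pass", "model_b": "pass", "model_c": "fail_accuracy"}

kept = filter_passing(compile_latency_s, accuracy_status)
print(kept)                # model_c is dropped before averaging
print(mean(kept.values()))  # 8.0
```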

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 95%, 57/60 | 100%, 45/45 | 100%, 59/59 |
|       aot_eager        | 92%, 55/60 | 100%, 45/45 | 100%, 59/59 |
|        inductor        | 90%, 54/60 | 100%, 45/45 | 95%, 56/59  |
| inductor_no_cudagraphs | 92%, 55/60 | 100%, 45/45 | 95%, 56/59  |
+------------------------+------------+-------------+-------------+
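Each pass-rate cell combines a percentage with the raw count of passing models. A minimal sketch of that formatting, assuming the percentage is rounded to the nearest integer; this is consistent with the cells above (for example, 55/60 = 91.7% is shown as 92%). The function name `format_passrate` is hypothetical.

```python
def format_passrate(passed: int, total: int) -> str:
    """Format a pass-rate cell as "<percent>%, <passed>/<total>".

    Assumption: the percentage is rounded to the nearest integer.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    percent = round(100 * passed / total)
    return f"{percent}%, {passed}/{total}"


print(format_passrate(55, 60))  # 92%, 55/60
```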

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.01x    |    1.01x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.50x    |    1.48x    |    1.41x    |
| inductor_no_cudagraphs |   1.37x    |    1.37x    |    1.38x    |
+------------------------+------------+-------------+-------------+
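A minimal sketch of a geometric mean over per-model speedups, computed as the exponential of the mean log speedup. The sample speedups are hypothetical, and this is an illustration of the statistic rather than the dashboard's own code. A geometric mean suits ratios: a 2x speedup on one model and a 0.5x slowdown on another average to 1.0x, whereas an arithmetic mean would report 1.25x. Python's `statistics.geometric_mean` computes the same quantity.

```python
import math


def geometric_mean(values) -> float:
    """Geometric mean: exp of the arithmetic mean of the logs.

    All values must be positive (a speedup of 0 is undefined here).
    """
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-model speedups for one backend on one suite.
speedups = [2.0, 0.5, 4.0]
print(f"{geometric_mean(speedups):.2f}x")  # 1.59x
```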

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.07    |    2.98     |    1.84     |
|       aot_eager        |    3.11    |    5.66     |    4.07     |
|        inductor        |    8.70    |    16.24    |    12.75    |
| inductor_no_cudagraphs |    8.06    |    14.24    |    12.53    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.03x    |    1.02x    |    1.16x    |
|       aot_eager        |   1.02x    |    1.02x    |    1.12x    |
|        inductor        |   0.98x    |    1.14x    |    1.06x    |
| inductor_no_cudagraphs |   1.05x    |    1.22x    |    1.13x    |
+------------------------+------------+-------------+-------------+
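This sketch reads the ratio as native PyTorch's peak memory divided by the backend's peak memory. That reading is an assumption, but it is consistent with "higher is better": a value above 1.00x means the backend needs less memory. On GPU, peak usage would typically be collected by calling `torch.cuda.reset_peak_memory_stats()` before a run and `torch.cuda.max_memory_allocated()` after it. The sketch itself takes the byte counts as plain integers.

```python
def compression_ratio(baseline_peak_bytes, backend_peak_bytes):
    """Peak-memory compression ratio: baseline peak / backend peak.

    Values above 1.0 mean the backend used less peak memory than the baseline.
    """
    if backend_peak_bytes <= 0:
        raise ValueError("backend peak must be positive")
    return baseline_peak_bytes / backend_peak_bytes


GiB = 2**30
print(f"{compression_ratio(6 * GiB, 5 * GiB):.2f}x")  # backend uses less memory
print(f"{compression_ratio(5 * GiB, 6 * GiB):.2f}x")  # backend uses more memory
```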

torchbench suite with float16 precision

Performance speedup

~~~
+-----------------------------------+------+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
| drq | 1 | 1.0231 | 0.9921 | 3.3913 | 1.2582 |
| soft_actor_critic | 256 | 1.0442 | 0.9511 | 2.8713 | 1.1857 |
| hf_T5_base | 1 | 0.9635 | 0.9343 | 2.5945 | 2.47 |
| phlippe_densenet | 128 | 1.0165 | 0.9733 | 2.4943 | 1.6525 |
| hf_T5 | 4 | 0.9892 | 0.9733 | 2.3937 | 2.6655 |
| hf_T5_large | 1 | 0.8395 | 0.737 | 2.3308 | 1.3055 |
| hf_Reformer | 8 | 0.9976 | 1.0031 | 2.1214 | 2.0966 |
| hf_Albert | 16 | 1.0015 | 1.002 | 1.987 | 1.9339 |
| lennard_jones | 1000 | 0.8397 | 0.8577 | 1.9786 | 0.7803 |
| timm_nfnet | 128 | 0.9985 | 0.9988 | 1.8871 | 1.7725 |
| dlrm | 1 | 0.9811 | 1.0549 | 1.818 | 1.0991 |
| hf_GPT2_large | 1 | 1.005 | 0.9992 | 1.8035 | 1.7802 |
| densenet121 | 64 | 0.9986 | 1.0006 | 1.6773 | 1.6246 |
| squeezenet1_1 | 256 | 0.9987 | 0.9982 | 1.6314 | 1.62 |
| pytorch_CycleGAN_and_pix2pix | 1 | 0.9924 | 0.9623 | 1.6101 | 1.4443 |
| BERT_pytorch | 32 | 1.0359 | 0.8963 | 1.6076 | 1.5589 |
| hf_GPT2 | 16 | 1.0001 | 0.9998 | 1.5983 | 1.5928 |
| vision_maskrcnn | 4 | 0.9674 | 0.9444 | 1.5583 | 1.6272 |
| timm_resnest | 256 | 0.9994 | 0.9996 | 1.5561 | 1.5555 |
| shufflenet_v2_x1_0 | 128 | 0.9993 | 0.9995 | 1.5426 | 1.4606 |
| hf_Longformer | 4 | 0.9999 | 1.0009 | 1.5424 | 1.5347 |
| phlippe_resnet | 256 | 1.022 | 0.9479 | 1.5302 | 1.4261 |
| pytorch_unet | 4 | 0.9987 | 0.9996 | 1.5269 | 1.5227 |
| resnet50 | 64 | 0.9979 | 0.9983 | 1.5116 | 1.4846 |
| Background_Matting | 1 | 0.9969 | 0.998 | 1.5069 | 1.4845 |
| resnext50_32x4d | 64 | 0.9985 | 0.9991 | 1.4894 | 1.4685 |
| resnet152 | 64 | 0.9993 | 0.9992 | 1.4667 | 1.4365 |
| mobilenet_v2 | 128 | 0.9981 | 0.9978 | 1.4615 | 1.4519 |
| timm_efficientnet | 128 | 0.9986 | 0.9993 | 1.4371 | 1.4219 |
| hf_Bert | 8 | 1.0081 | 1.0013 | 1.4283 | 1.3829 |
| mnasnet1_0 | 128 | 0.998 | 0.998 | 1.4229 | 1.4096 |
| hf_Bert_large | 4 | 1.0089 | 0.9981 | 1.4199 | 1.3841 |
| timm_regnet | 32 | 0.9994 | 0.9988 | 1.4188 | 1.3686 |
| mobilenet_v3_large | 128 | 0.9984 | 0.9989 | 1.3772 | 1.3674 |
| hf_DistilBert | 16 | 1.0004 | 0.9983 | 1.3671 | 1.3511 |
| timm_vovnet | 128 | 0.9989 | 0.9992 | 1.3529 | 1.3434 |
| functorch_dp_cifar10 | 512 | 0.9917 | 0.9945 | 1.3476 | 1.2649 |
| speech_transformer | 1 | 0.9848 | 0.874 | 1.3413 | 1.344 |
| yolov3 | 8 | 1.0164 | 0.9992 | 1.3396 | 1.3006 |
| attention_is_all_you_need_pytorch | 256 | 0.9995 | 0.9154 | 1.3255 | 1.2849 |
| vgg16 | 8 | 0.993 | 0.9947 | 1.3175 | 1.2545 |
| resnet18 | 256 | 0.9983 | 0.9989 | 1.2885 | 1.2899 |
| LearningToPaint | 256 | 0.9966 | 0.9974 | 1.2844 | 1.2739 |
| Super_SloMo | 8 | 0.9994 | 0.9993 | 1.2544 | 1.2371 |
| alexnet | 1024 | 0.9991 | 0.9991 | 1.2458 | 1.2742 |
| fastNLP_Bert | 16 | 0.9967 | 0.9948 | 1.22 | 1.219 |
| dcgan | 1024 | 0.9947 | 0.9947 | 1.1862 | 1.1835 |
| hf_Bart | 8 | 0.9804 | 0.9313 | 1.1664 | 1.1133 |
| pytorch_stargan | 16 | 1.0385 | 0.9059 | 1.1603 | 1.1612 |
| doctr_det_predictor | 4 | 1.0001 | 1.0059 | 1.1239 | 1.1252 |
| timm_vision_transformer | 128 | 0.9993 | 0.9997 | 1.1072 | 1.1008 |
| timm_vision_transformer_large | 8 | 1.0003 | 1.0008 | 1.0519 | 1.0423 |
| demucs | 32 | 0.9995 | 0.9997 | 0.9997 | 0.9994 |
| doctr_reco_predictor | 64 | 1.0554 | 0.9928 | 0.99 | 0.997 |
| tts_angular | 512 | 1.0072 | 1.1018 | 0.9884 | 0.9962 |
| nvidia_deeprecommender | 512 | 0.9976 | 0.9967 | 0.9445 | 0.9959 |
| hf_BigBird | 4 | 0.9953 | 0.9829 | 0.0 | 1.2566 |
| tacotron2 | 128 | 1.0925 | 0.0 | 0.0 | 0.0 |
| moco | 64 | 0.9928 | 0.0 | 0.0 | 0.0 |
| detectron2_fcos_r_50_fpn | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
+-----------------------------------+------+--------+-----------+----------+------------------------+
~~~

Accuracy

~~~
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
| hf_GPT2_large | 4 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| timm_vision_transformer_large | 4 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| hf_T5_large | 4 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| yolov3 | 4 | pass | pass | pass | pass |
| soft_actor_critic | 256 | pass | pass | pass | pass |
| phlippe_densenet | 4 | pass | pass | pass | pass |
| phlippe_resnet | 4 | pass | pass | pass | pass |
| pytorch_CycleGAN_and_pix2pix | 1 | pass | pass | pass | pass |
| pytorch_stargan | 16 | pass | pass | pass | pass |
| pytorch_unet | 2 | pass | pass | pass | pass |
| resnet152 | 4 | pass | pass | pass | pass |
| resnet18 | 4 | pass | pass | pass | pass |
| resnet50 | 4 | pass | pass | pass | pass |
| resnext50_32x4d | 4 | pass | pass | pass | pass |
| shufflenet_v2_x1_0 | 4 | pass | pass | pass | pass |
| speech_transformer | 4 | pass | pass | pass | pass |
| vision_maskrcnn | 4 | pass | pass | pass | pass |
| mobilenet_v3_large | 4 | pass | pass | pass | pass |
| timm_efficientdet | 4 | pass | pass | pass | pass |
| timm_efficientnet | 4 | pass | pass | pass | pass |
| timm_nfnet | 4 | pass | pass | pass | pass |
| timm_regnet | 4 | pass | pass | pass | pass |
| timm_resnest | 4 | pass | pass | pass | pass |
| timm_vision_transformer | 4 | pass | pass | pass | pass |
| timm_vovnet | 4 | pass | pass | pass | pass |
| tts_angular | 4 | pass | pass | pass | pass |
| vgg16 | 4 | pass | pass | pass | pass |
| squeezenet1_1 | 4 | pass | pass | pass | pass |
| nvidia_deeprecommender | 4 | pass | pass | pass | pass |
| mobilenet_v2 | 4 | pass | pass | pass | pass |
| drq | 1 | pass | pass | pass | pass |
| BERT_pytorch | 4 | pass | pass | pass | pass |
| Background_Matting | 1 | pass | pass | pass | pass |
| LearningToPaint | 4 | pass | pass | pass | pass |
| Super_SloMo | 4 | pass | pass | pass | pass |
| alexnet | 4 | pass | pass | pass | pass |
| attention_is_all_you_need_pytorch | 4 | pass | pass | pass | pass |
| dcgan | 4 | pass | pass | pass | pass |
| demucs | 32 | pass | pass | pass | pass |
| densenet121 | 4 | pass | pass | pass | pass |
| dlrm | 4 | pass | pass | pass | pass |
| mnasnet1_0 | 4 | pass | pass | pass | pass |
| doctr_reco_predictor | 4 | pass | pass | pass | pass |
| fastNLP_Bert | 4 | pass | pass | pass | pass |
| hf_GPT2 | 2 | pass | pass | pass | pass |
| lennard_jones | 4 | pass | pass | pass | pass |
| hf_T5 | 4 | pass | pass | pass | pass |
| hf_Reformer | 4 | pass | pass | pass | pass |
| functorch_dp_cifar10 | 4 | pass | pass | pass | pass |
| hf_Longformer | 4 | pass | pass | pass | pass |
| hf_DistilBert | 4 | pass | pass | pass | pass |
| hf_Bert_large | 4 | pass | pass | pass | pass |
| hf_Bert | 4 | pass | pass | pass | pass |
| hf_Bart | 4 | pass | pass | pass | pass |
| hf_Albert | 4 | pass | pass | pass | pass |
| hf_BigBird | 4 | pass | pass | fail_to_run | pass |
| moco | 4 | pass | fail_to_run | fail_to_run | fail_to_run |
| tacotron2 | 4 | pass | fail_to_run | 0.0000 | 0.0000 |
| detectron2_fcos_r_50_fpn | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| doctr_det_predictor | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
~~~

Compilation latency (sec)

~~~
+-----------------------------------+------+----------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
| hf_T5_large | 1 | 12.2447 | 19.3454 | 50.4579 | 31.4887 |
| hf_Longformer | 4 | 4.7509 | 9.6289 | 33.8898 | 33.9265 |
| hf_T5_base | 1 | 6.672 | 10.2255 | 27.2742 | 18.041 |
| vision_maskrcnn | 4 | 4.1237 | 7.0312 | 27.1427 | 22.9349 |
| hf_GPT2_large | 1 | 5.1424 | 9.6722 | 19.7855 | 19.3883 |
| attention_is_all_you_need_pytorch | 256 | 1.3425 | 3.0705 | 19.06 | 17.538 |
| timm_vision_transformer_large | 8 | 3.2973 | 7.4977 | 18.5304 | 18.5436 |
| yolov3 | 8 | 1.756 | 3.6947 | 18.1289 | 18.2339 |
| speech_transformer | 1 | 1.884 | 4.3889 | 17.3188 | 16.2409 |
| hf_Bart | 8 | 2.8131 | 4.979 | 16.8544 | 15.953 |
| hf_T5 | 4 | 3.8294 | 5.7653 | 16.3204 | 11.7282 |
| densenet121 | 64 | 2.0815 | 5.2542 | 15.5421 | 15.2114 |
| hf_Reformer | 8 | 1.584 | 2.7683 | 14.9387 | 14.1688 |
| resnet152 | 64 | 2.3688 | 6.3168 | 13.4284 | 12.7992 |
| hf_Bert_large | 4 | 3.4467 | 6.9766 | 12.3296 | 12.2077 |
| Super_SloMo | 8 | 1.209 | 3.1097 | 10.4505 | 10.213 |
| timm_nfnet | 128 | 2.2102 | 4.1048 | 9.7131 | 9.5693 |
| fastNLP_Bert | 16 | 1.7063 | 3.753 | 9.4791 | 8.6099 |
| hf_GPT2 | 16 | 1.7526 | 3.3334 | 8.3712 | 8.4044 |
| BERT_pytorch | 32 | 1.6422 | 3.612 | 8.1593 | 7.9738 |
| timm_regnet | 32 | 1.9398 | 3.7542 | 7.3351 | 7.1522 |
| doctr_det_predictor | 4 | 1.2721 | 3.171 | 7.1826 | 7.0412 |
| timm_vision_transformer | 128 | 1.021 | 2.3209 | 7.1031 | 7.0092 |
| timm_efficientnet | 128 | 1.5071 | 3.0849 | 6.9534 | 6.791 |
| hf_Bert | 8 | 1.7702 | 3.6606 | 6.6344 | 6.5585 |
| timm_resnest | 256 | 0.5911 | 1.31 | 6.6269 | 6.5849 |
| hf_Albert | 16 | 1.5999 | 3.3117 | 6.4241 | 6.393 |
| shufflenet_v2_x1_0 | 128 | 1.0024 | 2.5031 | 6.0376 | 5.7576 |
| phlippe_densenet | 128 | 0.8583 | 2.2377 | 5.8409 | 5.7502 |
| mobilenet_v3_large | 128 | 0.9156 | 2.3675 | 5.8266 | 5.6523 |
| hf_DistilBert | 16 | 0.841 | 1.7546 | 5.5053 | 5.2614 |
| mobilenet_v2 | 128 | 0.8734 | 2.2504 | 5.4822 | 5.2708 |
| Background_Matting | 1 | 0.9136 | 2.3267 | 5.3075 | 5.0218 |
| resnext50_32x4d | 64 | 0.8811 | 2.2156 | 5.127 | 4.9836 |
| resnet50 | 64 | 0.8824 | 2.2278 | 5.1196 | 4.9724 |
| timm_vovnet | 128 | 1.1179 | 2.1782 | 5.1106 | 4.9868 |
| mnasnet1_0 | 128 | 0.8087 | 2.0815 | 4.8432 | 4.829 |
| functorch_dp_cifar10 | 512 | 0.295 | 0.5761 | 3.4323 | 3.3616 |
| pytorch_CycleGAN_and_pix2pix | 1 | 0.4629 | 1.0965 | 3.4315 | 3.3117 |
| pytorch_unet | 4 | 0.4831 | 1.1262 | 3.0902 | 2.8645 |
| pytorch_stargan | 16 | 0.4267 | 1.1355 | 2.8676 | 2.8132 |
| LearningToPaint | 256 | 0.4272 | 0.9849 | 2.7735 | 2.529 |
| resnet18 | 256 | 0.4089 | 0.9327 | 2.6659 | 2.5204 |
| phlippe_resnet | 256 | 0.3992 | 0.9224 | 2.4604 | 2.3152 |
| squeezenet1_1 | 256 | 0.2381 | 0.3943 | 1.9004 | 1.7338 |
| vgg16 | 8 | 0.1875 | 0.315 | 1.5229 | 1.4794 |
| alexnet | 1024 | 0.163 | 0.2448 | 1.4844 | 1.3551 |
| drq | 1 | 0.3087 | 0.4127 | 1.4449 | 1.2545 |
| dcgan | 1024 | 0.1562 | 0.2679 | 1.2373 | 1.099 |
| nvidia_deeprecommender | 512 | 0.1923 | 0.2913 | 1.2284 | 1.0979 |
| dlrm | 1 | 0.2556 | 0.4147 | 1.2182 | 1.105 |
| soft_actor_critic | 256 | 0.2148 | 0.2796 | 1.0579 | 0.9408 |
| lennard_jones | 1000 | 0.1422 | 0.2229 | 0.9876 | 0.8925 |
| tts_angular | 512 | 0.1734 | 0.1932 | 0.9415 | 0.8246 |
| doctr_reco_predictor | 64 | 0.75 | 0.7487 | 0.621 | 0.6318 |
| demucs | 32 | 0.2829 | 0.2844 | 0.1939 | 0.1918 |
| hf_BigBird | 4 | 4.0509 | 6.1417 | nan | 13.1385 |
| tacotron2 | 128 | 122.5979 | nan | nan | nan |
| moco | 64 | 22.626 | nan | nan | nan |
| detectron2_fcos_r_50_fpn | 0 | nan | nan | nan | nan |
+-----------------------------------+------+----------+-----------+----------+------------------------+
~~~

Peak Memory Compression Ratio

~~~
+-----------------------------------+------+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
| timm_efficientnet | 128 | 1.214 | 1.214 | 1.6445 | 1.7882 |
| pytorch_unet | 4 | 1.4991 | 1.1565 | 1.488 | 1.4991 |
| timm_vovnet | 128 | 1.2922 | 1.2922 | 1.4609 | 1.5129 |
| mobilenet_v2 | 128 | 1.072 | 1.072 | 1.4569 | 1.5861 |
| timm_nfnet | 128 | 1.1392 | 1.6403 | 1.4486 | 1.503 |
| squeezenet1_1 | 256 | 1.0 | 0.9658 | 1.4303 | 1.5976 |
| mnasnet1_0 | 128 | 1.1252 | 1.1252 | 1.322 | 1.5064 |
| Background_Matting | 1 | 1.2867 | 1.2864 | 1.1987 | 1.2138 |
| demucs | 32 | 1.1134 | 1.1134 | 1.1134 | 1.1134 |
| yolov3 | 8 | 1.1163 | 1.1163 | 1.0986 | 1.1181 |
| doctr_det_predictor | 4 | 0.512 | 0.512 | 1.0577 | 0.5098 |
| shufflenet_v2_x1_0 | 128 | 1.0 | 0.9706 | 1.0543 | 1.2701 |
| pytorch_stargan | 16 | 1.0488 | 1.0453 | 1.0488 | 1.0488 |
| phlippe_resnet | 256 | 1.1662 | 1.1661 | 1.0466 | 1.1662 |
| phlippe_densenet | 128 | 1.2207 | 1.2205 | 1.0303 | 1.0779 |
| hf_Albert | 16 | 1.0231 | 1.0199 | 1.0156 | 1.0231 |
| hf_DistilBert | 16 | 1.0157 | 1.0146 | 1.0103 | 1.0157 |
| pytorch_CycleGAN_and_pix2pix | 1 | 1.0 | 0.9994 | 1.007 | 0.9999 |
| hf_Bert | 8 | 1.0088 | 1.0076 | 1.0029 | 1.0088 |
| hf_Bert_large | 4 | 1.0033 | 1.0026 | 1.0 | 1.0033 |
| hf_GPT2 | 16 | 1.0 | 0.999 | 0.9986 | 1.0 |
| hf_GPT2_large | 1 | 0.9997 | 0.9995 | 0.9986 | 0.9996 |
| dlrm | 1 | 1.0 | 1.0 | 0.998 | 1.0 |
| doctr_reco_predictor | 64 | 0.9976 | 0.9976 | 0.9976 | 0.9976 |
| nvidia_deeprecommender | 512 | 1.001 | 1.001 | 0.9974 | 1.142 |
| vgg16 | 8 | 1.0 | 1.0 | 0.9876 | 1.0 |
| timm_regnet | 32 | 1.0 | 1.0 | 0.9803 | 0.9993 |
| resnet152 | 64 | 1.0 | 0.8642 | 0.9654 | 1.0 |
| hf_Longformer | 4 | 0.5586 | 0.5582 | 0.9649 | 0.9892 |
| attention_is_all_you_need_pytorch | 256 | 1.0316 | 1.0276 | 0.9589 | 0.9658 |
| resnet50 | 64 | 1.0 | 0.8326 | 0.9561 | 1.0 |
| resnext50_32x4d | 64 | 1.0 | 0.8308 | 0.9556 | 1.0 |
| timm_vision_transformer_large | 8 | 1.004 | 1.0037 | 0.9548 | 0.9554 |
| tts_angular | 512 | 0.9982 | 0.9982 | 0.953 | 0.9982 |
| dcgan | 1024 | 1.0 | 1.0 | 0.9486 | 1.0 |
| hf_Bart | 8 | 1.0 | 1.0 | 0.9393 | 1.0 |
| mobilenet_v3_large | 128 | 1.0 | 1.0 | 0.9354 | 0.9999 |
| resnet18 | 256 | 1.0 | 1.0 | 0.9327 | 1.0 |
| Super_SloMo | 8 | 1.1498 | 0.8973 | 0.9319 | 0.9493 |
| functorch_dp_cifar10 | 512 | 1.0 | 0.9914 | 0.9299 | 1.0 |
| timm_resnest | 256 | 1.0 | 0.8179 | 0.9091 | 0.9473 |
| alexnet | 1024 | 1.0 | 0.864 | 0.8945 | 1.0664 |
| LearningToPaint | 256 | 1.0 | 1.0 | 0.8842 | 1.0 |
| fastNLP_Bert | 16 | 1.0612 | 1.0592 | 0.8671 | 0.8746 |
| vision_maskrcnn | 4 | 0.8477 | 0.8474 | 0.8293 | 0.8474 |
| drq | 1 | 0.9626 | 0.9626 | 0.7848 | 0.9626 |
| hf_T5_large | 1 | 0.9528 | 0.9558 | 0.7593 | 0.9611 |
| soft_actor_critic | 256 | 1.0 | 1.0 | 0.7148 | 1.0 |
| timm_vision_transformer | 128 | 1.1044 | 1.0931 | 0.6964 | 0.7507 |
| BERT_pytorch | 32 | 1.0257 | 1.0257 | 0.6706 | 0.674 |
| hf_Reformer | 8 | 1.3472 | 1.4449 | 0.6307 | 0.6736 |
| densenet121 | 64 | 1.1487 | 0.8823 | 0.5842 | 0.6043 |
| lennard_jones | 1000 | 1.0 | 1.0 | 0.5591 | 1.0 |
| speech_transformer | 1 | 1.0602 | 1.0594 | 0.4985 | 0.4998 |
| hf_T5 | 4 | 0.6854 | 0.7344 | 0.3659 | 0.8779 |
| hf_T5_base | 1 | 0.7851 | 0.8082 | 0.3305 | 0.902 |
| hf_BigBird | 4 | 0.8569 | 0.8569 | nan | 0.8569 |
| moco | 64 | 1.0 | nan | nan | nan |
| tacotron2 | 128 | 0.7969 | nan | nan | nan |
| detectron2_fcos_r_50_fpn | 0 | nan | nan | nan | nan |
+-----------------------------------+------+--------+-----------+----------+------------------------+
~~~

Absolute latency (ms)

~~~
+-----------------------------------+------+----------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
| hf_Longformer | 4 | 167.5839 | 167.5337 | 108.6833 | 109.2011 |
| doctr_det_predictor | 4 | 66.2587 | 66.8761 | 61.4194 | 61.4289 |
| demucs | 32 | 61.1252 | 61.0633 | 60.9794 | 61.0849 |
| hf_GPT2 | 16 | 88.6055 | 88.586 | 55.4147 | 55.6209 |
| timm_vision_transformer_large | 8 | 48.8916 | 48.9394 | 46.4665 | 46.9163 |
| fastNLP_Bert | 16 | 49.6382 | 50.3561 | 41.1403 | 40.3824 |
| hf_T5 | 4 | 98.1321 | 99.8324 | 40.8111 | 36.4306 |
| hf_T5_base | 1 | 97.6342 | 100.8787 | 36.3609 | 38.1456 |
| Super_SloMo | 8 | 41.6154 | 41.6842 | 33.1723 | 33.6203 |
| vision_maskrcnn | 4 | 48.7532 | 52.2999 | 30.392 | 29.1956 |
| timm_vision_transformer | 128 | 32.665 | 32.6367 | 29.4762 | 29.6575 |
| pytorch_unet | 4 | 44.6134 | 44.6071 | 29.198 | 29.2839 |
| hf_T5_large | 1 | 74.36 | 84.8579 | 27.084 | 48.8089 |
| timm_resnest | 256 | 39.3627 | 39.3615 | 25.2915 | 25.3065 |
| timm_nfnet | 128 | 43.2852 | 43.4349 | 23.102 | 24.4854 |
| resnet152 | 64 | 31.923 | 32.2496 | 21.7956 | 22.151 |
| hf_GPT2_large | 1 | 27.5822 | 27.7416 | 18.0711 | 18.4133 |
| hf_Bart | 8 | 21.0853 | 22.2353 | 17.794 | 18.3976 |
| alexnet | 1024 | 23.8231 | 22.0259 | 17.658 | 17.2719 |
| timm_vovnet | 128 | 23.3373 | 23.3876 | 17.222 | 17.3488 |
| hf_Albert | 16 | 28.6986 | 28.6794 | 14.4724 | 14.8775 |
| resnet18 | 256 | 16.3551 | 16.3722 | 12.6782 | 12.6559 |
| hf_Reformer | 8 | 27.1484 | 26.7609 | 12.6749 | 12.8141 |
| timm_regnet | 32 | 18.43 | 17.7236 | 12.5566 | 13.6002 |
| hf_Bert_large | 4 | 17.3728 | 17.6152 | 12.3874 | 12.6737 |
| densenet121 | 64 | 19.6994 | 19.6848 | 11.6748 | 12.0642 |
| resnext50_32x4d | 64 | 17.2657 | 17.2804 | 11.601 | 11.7697 |
| timm_efficientnet | 128 | 16.5735 | 16.5574 | 11.4987 | 11.633 |
| speech_transformer | 1 | 14.0249 | 16.5825 | 10.6054 | 10.5122 |
| Background_Matting | 1 | 15.4071 | 15.3701 | 10.1761 | 10.3708 |
| hf_DistilBert | 16 | 13.5553 | 13.5681 | 9.912 | 10.0293 |
| resnet50 | 64 | 13.8593 | 13.8984 | 9.1983 | 9.3518 |
| attention_is_all_you_need_pytorch | 256 | 12.1419 | 13.5304 | 9.1797 | 9.4978 |
| squeezenet1_1 | 256 | 14.6455 | 14.693 | 8.9845 | 9.0459 |
| hf_Bert | 8 | 12.4528 | 12.5026 | 8.7468 | 9.027 |
| mnasnet1_0 | 128 | 12.1252 | 12.1025 | 8.5072 | 8.5835 |
| mobilenet_v2 | 128 | 12.2137 | 12.2566 | 8.343 | 8.3764 |
| mobilenet_v3_large | 128 | 10.926 | 10.9051 | 7.9383 | 8.0026 |
| yolov3 | 8 | 10.314 | 10.3036 | 7.6725 | 7.8641 |
| BERT_pytorch | 32 | 9.5261 | 11.3595 | 6.1571 | 6.3861 |
| tts_angular | 512 | 7.1233 | 5.8138 | 5.9249 | 5.821 |
| shufflenet_v2_x1_0 | 128 | 7.3649 | 7.4131 | 5.854 | 4.9993 |
| doctr_reco_predictor | 64 | 5.5366 | 5.1483 | 5.1269 | 6.0113 |
| LearningToPaint | 256 | 6.4989 | 6.5063 | 5.0653 | 5.9589 |
| nvidia_deeprecommender | 512 | 4.7455 | 4.7493 | 5.0089 | 4.7556 |
| pytorch_stargan | 16 | 4.3417 | 5.3585 | 4.6248 | 4.6311 |
| dcgan | 1024 | 3.8799 | 3.8774 | 3.242 | 3.2609 |
| vgg16 | 8 | 3.4816 | 3.9838 | 3.0407 | 2.7381 |
| phlippe_densenet | 128 | 5.6171 | 6.038 | 2.4102 | 3.579 |
| functorch_dp_cifar10 | 512 | 2.8429 | 2.8502 | 2.0912 | 2.2357 |
| pytorch_CycleGAN_and_pix2pix | 1 | 3.2591 | 3.4697 | 2.0854 | 2.3121 |
| phlippe_resnet | 256 | 2.1593 | 2.3208 | 1.5064 | 1.6054 |
| dlrm | 1 | 0.6855 | 0.6658 | 0.388 | 0.6445 |
| drq | 1 | 0.6266 | 0.6671 | 0.2122 | 0.5915 |
| lennard_jones | 1000 | 0.2345 | 0.2477 | 0.1129 | 0.2724 |
| soft_actor_critic | 256 | 0.3069 | 0.327 | 0.1116 | 0.3263 |
| hf_BigBird | 4 | 121.7511 | 125.0117 | nan | 96.4091 |
| tacotron2 | 128 | 542.0161 | nan | nan | nan |
| moco | 64 | 30.1084 | nan | nan | nan |
| detectron2_fcos_r_50_fpn | 0 | nan | nan | nan | nan |
+-----------------------------------+------+----------+-----------+----------+------------------------+
~~~
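The speedup and absolute-latency tables are related: a speedup is a baseline latency divided by a backend latency. The minimal sketch below uses the hf_Longformer row of the absolute-latency table for torchbench float16. The reported speedups are normalized against native PyTorch, which has no column of its own here. So dividing the `eager` column by the `inductor` column (about 1.54) only approximates the reported inductor speedup of 1.5424.

```python
def speedup(baseline_ms, backend_ms):
    """Speedup of a backend over a baseline: baseline latency / backend latency."""
    return baseline_ms / backend_ms


# hf_Longformer (bs=4) absolute latencies in ms, from the torchbench float16 table.
# The "eager" column is used here only as a stand-in for the native PyTorch baseline.
eager_ms = 167.5839
inductor_ms = 108.6833
print(f"inductor vs eager: {speedup(eager_ms, inductor_ms):.2f}x")
```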

huggingface suite with float16 precision

Performance speedup

~~~
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
| MobileBertForMaskedLM | 64 | 1.0948 | 0.9093 | 2.8298 | 1.5823 |
| MobileBertForQuestionAnswering | 128 | 1.1262 | 0.9015 | 2.6079 | 1.6467 |
| MT5ForConditionalGeneration | 16 | 0.9328 | 0.8313 | 2.4934 | 1.4029 |
| XLNetLMHeadModel | 8 | 1.0001 | 0.9975 | 2.3547 | 2.3542 |
| T5ForConditionalGeneration | 4 | 0.9591 | 0.901 | 2.0885 | 1.8403 |
| T5Small | 4 | 0.9659 | 0.9058 | 2.081 | 1.8248 |
| GPT2ForSequenceClassification | 4 | 1.0017 | 0.9986 | 1.9098 | 1.8736 |
| XGLMForCausalLM | 8 | 1.0555 | 0.9258 | 1.7803 | 1.3653 |
| ElectraForCausalLM | 32 | 1.006 | 1.0031 | 1.7753 | 1.7259 |
| ElectraForQuestionAnswering | 64 | 1.004 | 0.9995 | 1.7058 | 1.6696 |
| AllenaiLongformerBase | 4 | 1.0004 | 0.9864 | 1.5803 | 1.5308 |
| DistillGPT2 | 16 | 0.9995 | 0.9991 | 1.5552 | 1.5479 |
| LayoutLMForSequenceClassification | 16 | 1.0028 | 0.9998 | 1.5278 | 1.5068 |
| YituTechConvBert | 16 | 1.0025 | 0.9986 | 1.5086 | 1.4756 |
| RobertaForQuestionAnswering | 16 | 1.0029 | 1.0006 | 1.5004 | 1.4693 |
| M2M100ForConditionalGeneration | 16 | 0.9991 | 0.8859 | 1.5003 | 1.2569 |
| DebertaForMaskedLM | 4 | 0.792 | 0.6888 | 1.4929 | 1.0498 |
| BertForQuestionAnswering | 16 | 1.0024 | 1.0001 | 1.4925 | 1.4668 |
| RobertaForCausalLM | 16 | 1.0024 | 0.9999 | 1.4806 | 1.4575 |
| OPTForCausalLM | 2 | 0.9992 | 0.993 | 1.4629 | 1.4843 |
| LayoutLMForMaskedLM | 16 | 1.0026 | 1.002 | 1.4487 | 1.4302 |
| Speech2Text2ForCausalLM | 256 | 0.9922 | 0.9822 | 1.4212 | 1.4065 |
| DebertaV2ForMaskedLM | 1 | 0.7365 | 0.6416 | 1.4185 | 0.7621 |
| BertForMaskedLM | 16 | 1.0044 | 1.0016 | 1.4169 | 1.3981 |
| AlbertForQuestionAnswering | 4 | 1.0008 | 1.0009 | 1.3807 | 1.3732 |
| AlbertForMaskedLM | 4 | 1.0006 | 1.0006 | 1.3794 | 1.3725 |
| MegatronBertForCausalLM | 4 | 1.0086 | 0.9058 | 1.3758 | 1.3594 |
| MegatronBertForQuestionAnswering | 8 | 1.0008 | 1.0081 | 1.3747 | 1.3662 |
| CamemBert | 16 | 1.0024 | 0.9998 | 1.3547 | 1.3383 |
| DistilBertForQuestionAnswering | 256 | 1.0003 | 0.9998 | 1.3375 | 1.3335 |
| BartForCausalLM | 4 | 1.0025 | 0.9987 | 1.2778 | 1.2721 |
| DistilBertForMaskedLM | 128 | 0.9997 | 0.999 | 1.257 | 1.2502 |
| MBartForCausalLM | 4 | 1.0043 | 0.9942 | 1.252 | 1.2539 |
| BlenderbotSmallForCausalLM | 64 | 1.0046 | 0.977 | 1.2429 | 1.2457 |
| PLBartForCausalLM | 8 | 1.0019 | 0.9951 | 1.2335 | 1.2315 |
| BlenderbotForCausalLM | 4 | 1.0361 | 0.9169 | 1.2177 | 1.2261 |
| MBartForConditionalGeneration | 2 | 0.9779 | 0.9442 | 1.2007 | 1.1414 |
| DebertaForQuestionAnswering | 8 | 0.9836 | 0.8492 | 1.1969 | 1.3328 |
| BartForConditionalGeneration | 2 | 0.9743 | 0.9347 | 1.1968 | 1.141 |
| TrOCRForCausalLM | 32 | 1.0011 | 0.9994 | 1.1755 | 1.1728 |
| PLBartForConditionalGeneration | 4 | 0.9801 | 0.9573 | 1.1717 | 1.1281 |
| BlenderbotSmallForConditionalGeneration | 64 | 0.9919 | 0.9437 | 1.1677 | 1.1157 |
| PegasusForCausalLM | 32 | 0.9972 | 0.9876 | 1.1411 | 1.1331 |
| PegasusForConditionalGeneration | 32 | 0.982 | 0.8866 | 1.1239 | 1.0803 |
| DebertaV2ForQuestionAnswering | 2 | 0.7328 | 0.6343 | 0.958 | 0.7435 |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
~~~

Accuracy

~~~
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
| BlenderbotForCausalLM | 1 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| DebertaV2ForMaskedLM | 1 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| XLNetLMHeadModel | 1 | pass | pass | pass | pass |
| PLBartForCausalLM | 1 | pass | pass | pass | pass |
| MBartForCausalLM | 1 | pass | pass | pass | pass |
| MBartForConditionalGeneration | 1 | pass | pass | pass | pass |
| MT5ForConditionalGeneration | 1 | pass | pass | pass | pass |
| MegatronBertForCausalLM | 1 | pass | pass | pass | pass |
| MegatronBertForQuestionAnswering | 1 | pass | pass | pass | pass |
| MobileBertForMaskedLM | 1 | pass | pass | pass | pass |
| MobileBertForQuestionAnswering | 1 | pass | pass | pass | pass |
| OPTForCausalLM | 1 | pass | pass | pass | pass |
| PLBartForConditionalGeneration | 1 | pass | pass | pass | pass |
| XGLMForCausalLM | 1 | pass | pass | pass | pass |
| M2M100ForConditionalGeneration | 1 | pass | pass | pass | pass |
| PegasusForConditionalGeneration | 1 | pass | pass | pass | pass |
| RobertaForCausalLM | 1 | pass | pass | pass | pass |
| RobertaForQuestionAnswering | 1 | pass | pass | pass | pass |
| Speech2Text2ForCausalLM | 1 | pass | pass | pass | pass |
| T5ForConditionalGeneration | 1 | pass | pass | pass | pass |
| T5Small | 1 | pass | pass | pass | pass |
| TrOCRForCausalLM | 1 | pass | pass | pass | pass |
| PegasusForCausalLM | 1 | pass | pass | pass | pass |
| LayoutLMForSequenceClassification | 1 | pass | pass | pass | pass |
| LayoutLMForMaskedLM | 1 | pass | pass | pass | pass |
| BlenderbotSmallForConditionalGeneration | 1 | pass | pass | pass | pass |
| AlbertForMaskedLM | 1 | pass | pass | pass | pass |
| AlbertForQuestionAnswering | 1 | pass | pass | pass | pass |
| AllenaiLongformerBase | 1 | pass | pass | pass | pass |
| BartForCausalLM | 1 | pass | pass | pass | pass |
| BartForConditionalGeneration | 1 | pass | pass | pass | pass |
| BertForMaskedLM | 1 | pass | pass | pass | pass |
| BertForQuestionAnswering | 1 | pass | pass | pass | pass |
| BlenderbotSmallForCausalLM | 1 | pass | pass | pass | pass |
| CamemBert | 1 | pass | pass | pass | pass |
| GPT2ForSequenceClassification | 1 | pass | pass | pass | pass |
| DebertaForMaskedLM | 1 | pass | pass | pass | pass |
| DebertaForQuestionAnswering | 1 | pass | pass | pass | pass |
| DebertaV2ForQuestionAnswering | 1 | pass | pass | pass | pass |
| DistilBertForMaskedLM | 1 | pass | pass | pass | pass |
| DistilBertForQuestionAnswering | 1 | pass | pass | pass | pass |
| DistillGPT2 | 1 | pass | pass | pass | pass |
| ElectraForCausalLM | 1 | pass | pass | pass | pass |
| ElectraForQuestionAnswering | 1 | pass | pass | pass | pass |
| YituTechConvBert | 1 | pass | pass | pass | pass |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
~~~

Compilation latency (sec)

~~~
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
| DebertaV2ForQuestionAnswering | 2 | 7.355 | 11.8604 | 45.0586 | 23.7609 |
| DebertaV2ForMaskedLM | 1 | 7.3746 | 12.1264 | 43.6145 | 22.3755 |
| AllenaiLongformerBase | 4 | 4.7357 | 9.5649 | 38.3529 | 38.0592 |
| DebertaForMaskedLM | 4 | 4.1852 | 6.9707 | 31.0316 | 17.2934 |
| DebertaForQuestionAnswering | 8 | 4.2085 | 6.8956 | 30.0144 | 16.6051 |
| M2M100ForConditionalGeneration | 16 | 4.2349 | 8.6677 | 26.8993 | 24.8252 |
| PegasusForConditionalGeneration | 32 | 3.9796 | 8.8897 | 26.2081 | 23.8287 |
| MBartForConditionalGeneration | 2 | 4.5279 | 8.964 | 25.7123 | 23.6768 |
| BartForConditionalGeneration | 2 | 4.2739 | 8.741 | 24.8725 | 23.1577 |
| MobileBertForQuestionAnswering | 128 | 8.1733 | 15.4154 | 22.2011 | 21.1531 |
| MobileBertForMaskedLM | 64 | 8.0651 | 15.3842 | 21.5022 | 20.9862 |
| XLNetLMHeadModel | 8 | 4.2748 | 8.9965 | 20.9974 | 21.0643 |
| XGLMForCausalLM | 8 | 3.0455 | 6.4681 | 20.9292 | 20.1259 |
| MT5ForConditionalGeneration | 16 | 4.8859 | 7.7797 | 20.912 | 15.2587 |
| BlenderbotForCausalLM | 4 | 2.9578 | 6.2878 | 19.8217 | 23.9034 |
| BlenderbotSmallForConditionalGeneration | 64 | 3.0504 | 6.0726 | 19.1035 | 21.8864 |
| PLBartForConditionalGeneration | 4 | 2.6957 | 5.0868 | 17.1676 | 16.5181 |
| T5ForConditionalGeneration | 4 | 3.733 | 5.6724 | 16.3474 | 12.0891 |
| T5Small | 4 | 3.7383 | 5.6538 | 16.3331 | 12.1191 |
| YituTechConvBert | 16 | 2.5729 | 5.3726 | 14.6142 | 14.2015 |
| PegasusForCausalLM | 32 | 1.6281 | 3.2652 | 14.1206 | 13.8806 |
| MegatronBertForCausalLM | 4 | 3.6963 | 7.3304 | 13.1434 | 12.8686 |
| MegatronBertForQuestionAnswering | 8 | 3.7195 | 7.0187 | 12.9535 | 12.7135 |
| MBartForCausalLM | 4 | 1.5336 | 3.3842 | 12.4774 | 12.3669 |
| BartForCausalLM | 4 | 1.4961 | 3.221 | 12.4042 | 12.0877 |
| OPTForCausalLM | 2 | 1.5464 | 3.2089 | 12.368 | 12.4574 |
| TrOCRForCausalLM | 32 | 1.5652 | 3.1668 | 11.9011 | 11.7614 |
| Speech2Text2ForCausalLM | 256 | 0.9235 | 1.7998 | 10.8299 | 10.5092 |
| BlenderbotSmallForCausalLM | 64 | 1.0679 | 2.1858 | 10.6104 | 13.1825 |
| PLBartForCausalLM | 8 | 0.8847 | 1.7676 | 9.3117 | 9.3755 |
| LayoutLMForMaskedLM | 16 | 1.8137 | 3.6135 | 8.947 | 8.8567 |
| LayoutLMForSequenceClassification | 16 | 1.8341 | 3.5781 | 8.821 | 8.8302 |
| GPT2ForSequenceClassification | 4 | 1.6912 | 3.2496 | 8.698 | 8.5016 |
| ElectraForCausalLM | 32 | 1.7943 | 3.5652 | 8.2358 | 7.8918 |
| RobertaForCausalLM | 16 | 1.8181 | 3.5764 | 8.087 | 7.8481 |
| DistillGPT2 | 16 | 0.9294 | 1.7986 | 7.2733 | 6.8724 |
| ElectraForQuestionAnswering | 64 | 1.7844 | 3.554 | 7.0683 | 6.7905 |
| CamemBert | 16 | 1.8154 | 3.6533 | 6.993 | 6.8104 |
| RobertaForQuestionAnswering | 16 | 1.8198 | 3.5831 | 6.9726 | 6.889 |
| BertForQuestionAnswering | 16 | 1.7842 | 3.4884 | 6.9617 | 6.9993 |
| BertForMaskedLM | 16 | 1.8297 | 3.4718 | 6.8697 | 6.9908 |
| AlbertForQuestionAnswering | 4 | 1.6415 | 3.2972 | 6.4707 | 6.2667 |
| AlbertForMaskedLM | 4 | 1.616 | 3.2746 | 6.4673 | 6.2456 |
| DistilBertForMaskedLM | 128 | 0.8569 | 1.7727 | 5.6062 | 5.5547 |
| DistilBertForQuestionAnswering | 256 | 0.8599 | 1.8054 | 5.5172 | 5.5353 |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
~~~

Peak Memory Compression Ratio

~~~
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
| ElectraForCausalLM | 32 | 1.0027 | 1.0016 | 2.3211 | 2.329 |
| DistillGPT2 | 16 | 1.0041 | 1.0038 | 1.8394 | 1.8434 |
| DistilBertForMaskedLM | 128 | 1.011 | 1.0102 | 1.758 | 1.764 |
| MobileBertForQuestionAnswering | 128 | 1.9018 | 1.9086 | 1.7243 | 1.7553 |
| BlenderbotSmallForCausalLM | 64 | 1.0041 | 1.0036 | 1.6807 | 1.6807 |
| MobileBertForMaskedLM | 64 | 1.0073 | 1.0072 | 1.6677 | 1.6781 |
| RobertaForCausalLM | 16 | 1.0064 | 1.0055 | 1.6078 | 1.6135 |
| PLBartForCausalLM | 8 | 1.0057 | 1.0053 | 1.5924 | 1.5919 |
| OPTForCausalLM | 2 | 1.0031 | 1.0027 | 1.5661 | 1.5661 |
| Speech2Text2ForCausalLM | 256 | 0.8791 | 0.8791 | 1.5265 | 1.3818 |
| YituTechConvBert | 16 | 1.0087 | 1.0075 | 1.5154 | 1.5222 |
| CamemBert | 16 | 1.0082 | 1.0071 | 1.5089 | 1.5153 |
| BertForMaskedLM | 16 | 1.0085 | 1.0074 | 1.4987 | 1.5054 |
| LayoutLMForMaskedLM | 16 | 1.0084 | 1.0073 | 1.4908 | 1.4982 |
| TrOCRForCausalLM | 32 | 1.0063 | 1.0057 | 1.4427 | 1.4427 |
| AllenaiLongformerBase | 4 | 0.9606 | 0.9606 | 1.3423 | 1.356 |
| BartForCausalLM | 4 | 1.0041 | 1.0037 | 1.2508 | 1.2508 |
| MBartForCausalLM | 4 | 1.0041 | 1.0037 | 1.2507 | 1.2507 |
| PegasusForCausalLM | 32 | 0.9074 | 0.9074 | 1.2238 | 1.1091 |
| XGLMForCausalLM | 8 | 0.9703 | 0.9703 | 1.1722 | 1.1397 |
| MegatronBertForCausalLM | 4 | 1.0025 | 1.002 | 1.1593 | 1.1626 |
| M2M100ForConditionalGeneration | 16 | 0.9363 | 0.9363 | 1.1101 | 1.1057 |
| PegasusForConditionalGeneration | 32 | 0.9933 | 0.9933 | 1.0378 | 1.1903 |
| BlenderbotForCausalLM | 4 | 1.0008 | 1.0008 | 0.9892 | 0.9892 |
| AlbertForQuestionAnswering | 4 | 1.0896 | 1.0892 | 0.9828 | 0.9862 |
| AlbertForMaskedLM | 4 | 1.0894 | 1.0891 | 0.982 | 0.9854 |
| MegatronBertForQuestionAnswering | 8 | 1.0339 | 1.0328 | 0.9806 | 0.9833 |
| GPT2ForSequenceClassification | 4 | 1.0145 | 1.0145 | 0.9675 | 0.9707 |
| LayoutLMForSequenceClassification | 16 | 1.0889 | 1.0866 | 0.9626 | 0.9677 |
| BertForQuestionAnswering | 16 | 1.0904 | 1.089 | 0.9625 | 0.9668 |
| RobertaForQuestionAnswering | 16 | 1.0914 | 1.089 | 0.9625 | 0.9668 |
| PLBartForConditionalGeneration | 4 | 0.9995 | 1.0 | 0.9382 | 0.9995 |
| BartForConditionalGeneration | 2 | 1.0 | 1.0 | 0.9305 | 1.0 |
| MBartForConditionalGeneration | 2 | 1.0 | 1.0 | 0.9287 | 1.0 |
| BlenderbotSmallForConditionalGeneration | 64 | 1.0 | 1.0 | 0.9229 | 1.0 |
| ElectraForQuestionAnswering | 64 | 1.2329 | 1.2126 | 0.9196 | 0.9282 |
| XLNetLMHeadModel | 8 | 1.0039 | 1.0031 | 0.8995 | 0.8995 |
| MT5ForConditionalGeneration | 16 | 0.9684 | 0.9724 | 0.8984 | 0.9713 |
| DistilBertForQuestionAnswering | 256 | 1.1392 | 1.1345 | 0.888 | 0.8922 |
| DebertaV2ForMaskedLM | 1 | 0.9999 | 0.9999 | 0.5904 | 0.9999 |
| DebertaV2ForQuestionAnswering | 2 | 1.0016 | 1.0016 | 0.4181 | 0.9604 |
| T5ForConditionalGeneration | 4 | 0.7115 | 0.7067 | 0.3528 | 0.7236 |
| T5Small | 4 | 0.7091 | 0.7067 | 0.3528 | 0.7236 |
| DebertaForMaskedLM | 4 | 0.9598 | 0.9595 | 0.1833 | 0.9597 |
| DebertaForQuestionAnswering | 8 | 0.9364 | 0.9364 | 0.0903 | 0.8793 |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
~~~

Absolute latency (ms)

~~~
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
| AlbertForMaskedLM | 4 | 66.6866 | 66.6853 | 48.9019 | 49.1294 |
| AlbertForQuestionAnswering | 4 | 66.1753 | 66.1346 | 48.5057 | 48.7772 |
| XLNetLMHeadModel | 8 | 100.6799 | 101.3405 | 42.849 | 42.7894 |
| PegasusForConditionalGeneration | 32 | 38.8852 | 45.9199 | 34.1454 | 35.4769 |
| TrOCRForCausalLM | 32 | 38.3302 | 38.9068 | 33.0452 | 33.0718 |
| AllenaiLongformerBase | 4 | 45.429 | 45.8325 | 28.9223 | 29.6078 |
| MBartForConditionalGeneration | 2 | 35.3323 | 36.6296 | 28.8116 | 30.1772 |
| BartForConditionalGeneration | 2 | 34.6497 | 36.0841 | 28.3006 | 29.8082 |
| BlenderbotSmallForConditionalGeneration | 64 | 29.5603 | 31.4023 | 25.229 | 26.3977 |
| YituTechConvBert | 16 | 36.7578 | 36.8352 | 24.3742 | 24.9204 |
| PLBartForCausalLM | 8 | 29.2232 | 29.6329 | 23.7903 | 23.6263 |
| MBartForCausalLM | 4 | 28.8614 | 29.4201 | 23.2017 | 23.0512 |
| PLBartForConditionalGeneration | 4 | 27.0912 | 28.1038 | 22.9043 | 23.7805 |
| MegatronBertForQuestionAnswering | 8 | 31.2524 | 31.1211 | 22.8066 | 22.9694 |
| BartForCausalLM | 4 | 28.8338 | 28.9693 | 22.0296 | 22.8171 |
| CamemBert | 16 | 28.5455 | 28.6161 | 21.112 | 21.3656 |
| OPTForCausalLM | 2 | 30.239 | 30.5512 | 20.8431 | 20.424 |
| DistilBertForQuestionAnswering | 256 | 26.8451 | 26.7697 | 20.2794 | 20.3715 |
| DebertaV2ForQuestionAnswering | 2 | 24.5212 | 28.2817 | 19.05 | 24.5913 |
| M2M100ForConditionalGeneration | 16 | 27.8727 | 31.5438 | 18.9641 | 21.6981 |
| DistilBertForMaskedLM | 128 | 22.0993 | 22.1228 | 17.7157 | 17.7825 |
| DistillGPT2 | 16 | 27.4605 | 27.5017 | 17.6576 | 17.7394 |
| LayoutLMForMaskedLM | 16 | 25.357 | 25.3838 | 17.5678 | 17.7942 |
| RobertaForCausalLM | 16 | 25.9143 | 25.9875 | 17.5618 | 17.8027 |
| BertForMaskedLM | 16 | 24.7618 | 24.8049 | 17.5285 | 17.7667 |
| PegasusForCausalLM | 32 | 19.3553 | 19.6126 | 16.9605 | 17.0453 |
| T5ForConditionalGeneration | 4 | 34.6505 |
36.2254 | 15.6464 | 17.7424 | | T5Small | 4 | 33.5697 | 36.1143 | 15.6103 | 17.9002 | | MobileBertForQuestionAnswering | 128 | 26.6153 | 32.4148 | 14.043 | 17.7554 | | ElectraForQuestionAnswering | 64 | 23.7991 | 23.9076 | 13.9942 | 14.2675 | | LayoutLMForSequenceClassification | 16 | 21.259 | 21.3363 | 13.9464 | 14.1574 | | RobertaForQuestionAnswering | 16 | 20.8533 | 20.9167 | 13.9423 | 14.2251 | | BertForQuestionAnswering | 16 | 20.7119 | 20.7791 | 13.9331 | 14.174 | | MegatronBertForCausalLM | 4 | 18.9575 | 23.261 | 13.8565 | 14.0195 | | DebertaForQuestionAnswering | 8 | 16.7141 | 19.3215 | 13.754 | 12.3008 | | DebertaV2ForMaskedLM | 1 | 25.5792 | 29.4933 | 13.7348 | 25.7068 | | BlenderbotSmallForCausalLM | 64 | 16.6079 | 17.0667 | 13.4246 | 13.3189 | | BlenderbotForCausalLM | 4 | 15.607 | 17.7333 | 13.151 | 13.265 | | MT5ForConditionalGeneration | 16 | 34.196 | 38.694 | 12.6952 | 23.118 | | MobileBertForMaskedLM | 64 | 32.1774 | 37.8092 | 12.1581 | 21.6459 | | XGLMForCausalLM | 8 | 19.3337 | 21.9978 | 11.0211 | 14.8551 | | ElectraForCausalLM | 32 | 19.065 | 19.1811 | 10.8167 | 11.1041 | | DebertaForMaskedLM | 4 | 19.6657 | 22.1604 | 10.6533 | 14.823 | | Speech2Text2ForCausalLM | 256 | 14.9367 | 15.2599 | 10.525 | 10.637 | | GPT2ForSequenceClassification | 4 | 19.0671 | 19.1576 | 10.0361 | 10.204 | +-----------------------------------------+-----+----------+-----------+----------+------------------------+ ~~~
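The speedup figures in this report are ratios of latencies, and the executive summary's suite-level numbers are geometric means. A minimal sketch of that arithmetic, assuming speedup = baseline latency / compiled latency and an unweighted geometric mean over models. The sample values are the `eager` and `inductor` columns of the huggingface Absolute latency table above. Because the dashboard normalizes against native PyTorch rather than the Dynamo `eager` backend, ratios computed this way only approximate the published speedups.

```python
import math
from typing import Dict, Sequence, Tuple


def speedup(baseline_ms: float, compiled_ms: float) -> float:
    """Baseline latency divided by compiled latency; > 1.0 means the compiled run is faster."""
    if baseline_ms <= 0 or compiled_ms <= 0:
        raise ValueError("latencies must be positive")
    return baseline_ms / compiled_ms


def geometric_mean(values: Sequence[float]) -> float:
    """Unweighted geometric mean, computed in log space for numerical stability."""
    if not values:
        raise ValueError("need at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (eager ms, inductor ms) pairs copied from the huggingface Absolute latency table.
LATENCIES: Dict[str, Tuple[float, float]] = {
    "AlbertForMaskedLM": (66.6866, 48.9019),
    "XLNetLMHeadModel": (100.6799, 42.849),
    "GPT2ForSequenceClassification": (19.0671, 10.0361),
}

ratios = {name: speedup(eager, inductor) for name, (eager, inductor) in LATENCIES.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"geometric mean: {geometric_mean(list(ratios.values())):.2f}x")
```

A geometric mean is used instead of an arithmetic mean so that a 2x speedup on one model and a 2x slowdown on another cancel out to 1.0x.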

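The Peak Memory Compression Ratio tables can be read with the sketch below. It assumes the ratio is baseline peak memory divided by compiled peak memory, which is consistent with the executive summary calling it a footprint reduction ratio. Under that reading, values above 1.0 mean the compiled run peaked lower, and the 0.0903 for DebertaForQuestionAnswering under `inductor` means its peak was roughly 11 times the baseline's. `measure_peak` is a hypothetical helper built on PyTorch's `torch.cuda.reset_peak_memory_stats` and `torch.cuda.max_memory_allocated`; it is not the dashboard's harness.

```python
from typing import Callable


def compression_ratio(baseline_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Baseline peak divided by compiled peak.

    > 1.0: the compiled run needed less peak memory than the baseline.
    < 1.0: the compiled run needed more peak memory than the baseline.
    """
    if baseline_peak_bytes <= 0 or compiled_peak_bytes <= 0:
        raise ValueError("peak memory values must be positive")
    return baseline_peak_bytes / compiled_peak_bytes


def measure_peak(run: Callable[[], object]) -> int:
    """Hypothetical helper: peak CUDA memory (bytes) allocated while `run` executes.

    Requires PyTorch and a CUDA device; torch is imported lazily so that
    compression_ratio() stays usable without it.
    """
    import torch

    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    run()
    torch.cuda.synchronize()  # make sure all queued kernels have finished
    return torch.cuda.max_memory_allocated()


# Reading a table value back as "compiled peak relative to baseline peak".
ratio = 0.0903  # DebertaForQuestionAnswering, inductor column
print(f"compiled peak is about {1 / ratio:.1f}x the baseline peak")
```
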
timm_models suite with float16 precision

Performance speedup
~~~
+---------------------------------+-----+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
| resmlp_12_224 | 128 | 0.9997 | 1.0108 | 1.9766 | 1.969 |
| tnt_s_patch16_224 | 128 | 0.9999 | 1.0 | 1.89 | 1.8676 |
| dm_nfnet_f0 | 128 | 0.9997 | 0.9995 | 1.8494 | 1.7848 |
| regnety_002 | 128 | 0.9895 | 0.9944 | 1.756 | 1.3655 |
| coat_lite_mini | 128 | 0.9996 | 0.9995 | 1.7503 | 1.7197 |
| nfnet_l0 | 128 | 0.9995 | 0.9996 | 1.7382 | 1.7212 |
| convnext_base | 64 | 0.9991 | 1.0115 | 1.6174 | 1.5931 |
| sebotnet33ts_256 | 64 | 0.999 | 0.9995 | 1.6119 | 1.5883 |
| cait_m36_384 | 4 | 0.9971 | 1.0071 | 1.5977 | 1.552 |
| volo_d1_224 | 64 | 0.9996 | 1.0027 | 1.5974 | 1.5801 |
| resnest101e | 64 | 1.0004 | 1.0 | 1.5963 | 1.5001 |
| poolformer_m36 | 64 | 1.0014 | 1.002 | 1.5791 | 1.5413 |
| res2net50_14w_8s | 128 | 0.9989 | 1.0003 | 1.5064 | 1.4746 |
| tinynet_a | 128 | 0.9982 | 0.9989 | 1.4939 | 1.4701 |
| botnet26t_256 | 128 | 0.9992 | 0.9998 | 1.4884 | 1.4794 |
| gmixer_24_224 | 128 | 0.9999 | 1.0058 | 1.482 | 1.4758 |
| res2next50 | 128 | 0.9988 | 0.9973 | 1.481 | 1.3969 |
| res2net101_26w_4s | 64 | 0.9991 | 0.9998 | 1.4793 | 1.4394 |
| tf_efficientnet_b0 | 128 | 0.999 | 0.9986 | 1.4783 | 1.4648 |
| gmlp_s16_224 | 128 | 0.9994 | 1.0057 | 1.4733 | 1.4668 |
| gernet_l | 128 | 0.9994 | 1.0002 | 1.4691 | 1.4533 |
| mobilenetv2_100 | 128 | 0.9977 | 0.9991 | 1.4641 | 1.4507 |
| ese_vovnet19b_dw | 128 | 0.9988 | 0.9994 | 1.4524 | 1.4468 |
| eca_botnext26ts_256 | 128 | 0.9993 | 0.9995 | 1.4504 | 1.4266 |
| dla102 | 128 | 0.9999 | 1.0004 | 1.4504 | 1.4379 |
| dpn107 | 32 | 0.9991 | 1.0 | 1.4455 | 1.411 |
| ghostnet_100 | 128 | 0.9977 | 0.9993 | 1.4408 | 1.4229 |
| eca_halonext26ts | 128 | 0.9993 | 0.9996 | 1.4369 | 1.4133 |
| swsl_resnext101_32x16d | 32 | 0.9993 | 1.0001 | 1.4332 | 1.3239 |
| cspdarknet53 | 64 | 1.0 | 1.0 | 1.4275 | 1.4066 |
| spnasnet_100 | 128 | 0.9981 | 0.9993 | 1.4122 | 1.3982 |
| mnasnet_100 | 128 | 0.9978 | 0.9996 | 1.4112 | 1.3988 |
| fbnetv3_b | 128 | 0.9981 | 0.9996 | 1.4112 | 1.3954 |
| rexnet_100 | 128 | 0.9982 | 0.9988 | 1.395 | 1.3692 |
| fbnetc_100 | 128 | 0.9981 | 0.9994 | 1.3883 | 1.3755 |
| inception_v3 | 128 | 0.9991 | 0.9995 | 1.3859 | 1.3712 |
| gluon_inception_v3 | 128 | 0.9987 | 0.9996 | 1.3848 | 1.3713 |
| adv_inception_v3 | 128 | 0.999 | 0.9996 | 1.3844 | 1.3724 |
| mobilenetv3_large_100 | 128 | 0.9973 | 0.9982 | 1.372 | 1.3547 |
| swin_base_patch4_window7_224 | 64 | 0.9998 | 1.0001 | 1.3473 | 1.3291 |
| tf_mixnet_l | 128 | 0.9994 | 0.9995 | 1.3462 | 1.3219 |
| pnasnet5large | 16 | 1.0003 | 1.0016 | 1.3322 | 1.3096 |
| jx_nest_base | 32 | 0.9999 | 1.0048 | 1.3313 | 1.3082 |
| repvgg_a2 | 128 | 0.9997 | 1.0006 | 1.3294 | 1.3199 |
| selecsls42b | 128 | 0.9982 | 0.9989 | 1.3226 | 1.3092 |
| mobilevit_s | 64 | 0.9994 | 1.0005 | 1.3142 | 1.2952 |
| hrnet_w18 | 128 | 1.0009 | 1.0011 | 1.3124 | 1.2677 |
| mixnet_l | 128 | 0.9988 | 0.9996 | 1.2962 | 1.2714 |
| gluon_xception65 | 32 | 0.9989 | 0.9994 | 1.2945 | 1.2779 |
| lcnet_050 | 128 | 0.9927 | 0.9945 | 1.281 | 1.2667 |
| twins_pcpvt_base | 64 | 1.0054 | 1.0057 | 1.2503 | 1.2232 |
| crossvit_9_240 | 128 | 1.0016 | 1.0018 | 1.2388 | 1.2059 |
| convmixer_768_32 | 32 | 0.9996 | 0.9999 | 1.1892 | 1.1879 |
| mixer_b16_224 | 128 | 0.9999 | 1.0055 | 1.1817 | 1.1776 |
| pit_b_224 | 64 | 0.9997 | 0.9999 | 1.1274 | 1.1198 |
| deit_base_distilled_patch16_224 | 64 | 0.9997 | 1.0001 | 1.1241 | 1.1143 |
| visformer_small | 128 | 0.999 | 0.9992 | 1.1077 | 1.0905 |
| beit_base_patch16_224 | 64 | 0.9995 | 0.9999 | 1.097 | 1.1039 |
| vit_base_patch16_224 | 64 | 0.9998 | 0.9999 | 1.0813 | 1.0747 |
+---------------------------------+-----+--------+-----------+----------+------------------------+
~~~
Accuracy
~~~
+---------------------------------+----+-------+-----------+---------------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+-----------+---------------+------------------------+
| adv_inception_v3 | 8 | pass | pass | pass | pass |
| beit_base_patch16_224 | 8 | pass | pass | pass | pass |
| nfnet_l0 | 8 | pass | pass | pass | pass |
| pit_b_224 | 8 | pass | pass | pass | pass |
| pnasnet5large | 8 | pass | pass | pass | pass |
| poolformer_m36 | 8 | pass | pass | pass | pass |
| regnety_002 | 8 | pass | pass | pass | pass |
| repvgg_a2 | 8 | pass | pass | pass | pass |
| res2net101_26w_4s | 8 | pass | pass | pass | pass |
| res2net50_14w_8s | 8 | pass | pass | pass | pass |
| res2next50 | 8 | pass | pass | pass | pass |
| resmlp_12_224 | 8 | pass | pass | pass | pass |
| resnest101e | 8 | pass | pass | pass | pass |
| rexnet_100 | 8 | pass | pass | pass | pass |
| sebotnet33ts_256 | 8 | pass | pass | pass | pass |
| selecsls42b | 8 | pass | pass | pass | pass |
| spnasnet_100 | 8 | pass | pass | pass | pass |
| swin_base_patch4_window7_224 | 8 | pass | pass | pass | pass |
| swsl_resnext101_32x16d | 8 | pass | pass | pass | pass |
| tf_efficientnet_b0 | 8 | pass | pass | pass | pass |
| tf_mixnet_l | 8 | pass | pass | pass | pass |
| tinynet_a | 8 | pass | pass | pass | pass |
| tnt_s_patch16_224 | 8 | pass | pass | pass | pass |
| twins_pcpvt_base | 8 | pass | pass | pass | pass |
| visformer_small | 8 | pass | pass | pass | pass |
| vit_base_patch16_224 | 8 | pass | pass | pass | pass |
| volo_d1_224 | 8 | pass | pass | pass | pass |
| mobilevit_s | 8 | pass | pass | pass | pass |
| mobilenetv3_large_100 | 8 | pass | pass | pass | pass |
| mobilenetv2_100 | 8 | pass | pass | pass | pass |
| fbnetc_100 | 8 | pass | pass | pass | pass |
| botnet26t_256 | 8 | pass | pass | pass | pass |
| coat_lite_mini | 8 | pass | pass | pass | pass |
| convmixer_768_32 | 8 | pass | pass | pass | pass |
| crossvit_9_240 | 8 | pass | pass | pass | pass |
| cspdarknet53 | 8 | pass | pass | pass | pass |
| deit_base_distilled_patch16_224 | 8 | pass | pass | pass | pass |
| dla102 | 8 | pass | pass | pass | pass |
| dm_nfnet_f0 | 8 | pass | pass | pass | pass |
| dpn107 | 8 | pass | pass | pass | pass |
| eca_botnext26ts_256 | 8 | pass | pass | pass | pass |
| eca_halonext26ts | 8 | pass | pass | pass | pass |
| mnasnet_100 | 8 | pass | pass | pass | pass |
| ese_vovnet19b_dw | 8 | pass | pass | pass | pass |
| fbnetv3_b | 8 | pass | pass | pass | pass |
| gernet_l | 8 | pass | pass | pass | pass |
| gluon_inception_v3 | 8 | pass | pass | pass | pass |
| gluon_xception65 | 8 | pass | pass | pass | pass |
| gmixer_24_224 | 8 | pass | pass | pass | pass |
| gmlp_s16_224 | 8 | pass | pass | pass | pass |
| hrnet_w18 | 8 | pass | pass | pass | pass |
| inception_v3 | 8 | pass | pass | pass | pass |
| jx_nest_base | 8 | pass | pass | pass | pass |
| lcnet_050 | 8 | pass | pass | pass | pass |
| mixer_b16_224 | 8 | pass | pass | pass | pass |
| mixnet_l | 8 | pass | pass | pass | pass |
| cait_m36_384 | 4 | pass | pass | fail_accuracy | fail_accuracy |
| convnext_base | 8 | pass | pass | fail_accuracy | fail_accuracy |
| ghostnet_100 | 8 | pass | pass | fail_accuracy | fail_accuracy |
+---------------------------------+----+-------+-----------+---------------+------------------------+
~~~
Compilation latency (sec)
~~~
+---------------------------------+-----+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
| mobilevit_s | 64 | 1.6799 | 3.6168 | 52.8307 | 52.3107 |
| twins_pcpvt_base | 64 | 2.9139 | 6.7336 | 44.1319 | 44.2555 |
| coat_lite_mini | 128 | 1.1726 | 2.5216 | 32.2498 | 32.6697 |
| swin_base_patch4_window7_224 | 64 | 2.8827 | 6.2327 | 30.9471 | 30.7917 |
| hrnet_w18 | 128 | 5.814 | 14.3401 | 28.3875 | 27.8485 |
| pnasnet5large | 16 | 4.662 | 10.6008 | 24.9895 | 24.2476 |
| resnest101e | 64 | 3.3394 | 7.7367 | 24.6733 | 22.766 |
| cait_m36_384 | 4 | 3.6466 | 8.9265 | 23.2248 | 22.8576 |
| convnext_base | 64 | 1.4013 | 2.7002 | 22.6749 | 22.414 |
| jx_nest_base | 32 | 1.7164 | 3.7374 | 21.3238 | 21.4028 |
| eca_halonext26ts | 128 | 1.2546 | 2.4489 | 17.6639 | 17.3687 |
| poolformer_m36 | 64 | 1.6736 | 3.0995 | 16.9444 | 16.4802 |
| res2net101_26w_4s | 64 | 3.0594 | 8.0003 | 15.9682 | 15.837 |
| tnt_s_patch16_224 | 128 | 2.0956 | 5.1344 | 15.1374 | 15.1584 |
| volo_d1_224 | 64 | 1.4551 | 3.5476 | 14.7528 | 14.5393 |
| res2net50_14w_8s | 128 | 2.7393 | 7.2575 | 14.63 | 14.4442 |
| botnet26t_256 | 128 | 1.1127 | 2.148 | 14.5352 | 14.2127 |
| sebotnet33ts_256 | 64 | 1.5066 | 2.9793 | 14.2524 | 14.0722 |
| dpn107 | 32 | 3.2041 | 6.4577 | 13.1818 | 13.1523 |
| gmlp_s16_224 | 128 | 1.3661 | 3.2155 | 12.6288 | 12.6542 |
| crossvit_9_240 | 128 | 1.947 | 4.3391 | 12.6238 | 12.6771 |
| gmixer_24_224 | 128 | 1.5998 | 3.7675 | 12.068 | 12.2187 |
| fbnetv3_b | 128 | 2.6724 | 5.7195 | 11.727 | 11.5713 |
| eca_botnext26ts_256 | 128 | 1.1818 | 2.3511 | 11.2148 | 10.821 |
| tf_mixnet_l | 128 | 2.9022 | 5.3671 | 11.0378 | 10.8641 |
| gluon_xception65 | 32 | 2.0014 | 5.3019 | 10.3698 | 10.0787 |
| mixnet_l | 128 | 2.6798 | 4.9943 | 10.3078 | 10.0614 |
| dm_nfnet_f0 | 128 | 2.1918 | 4.0296 | 9.5938 | 9.4696 |
| beit_base_patch16_224 | 64 | 1.2945 | 2.791 | 9.4545 | 9.1465 |
| dla102 | 128 | 1.7585 | 4.5966 | 9.4477 | 9.3291 |
| inception_v3 | 128 | 1.5588 | 4.0012 | 9.2181 | 8.9852 |
| swsl_resnext101_32x16d | 32 | 1.7086 | 4.4762 | 9.15 | 8.9181 |
| gluon_inception_v3 | 128 | 1.5651 | 3.9212 | 9.0423 | 8.9607 |
| adv_inception_v3 | 128 | 1.5442 | 4.0018 | 9.0358 | 9.0158 |
| res2next50 | 128 | 1.5438 | 3.9815 | 8.9507 | 8.6775 |
| nfnet_l0 | 128 | 1.9949 | 3.8685 | 8.823 | 8.6087 |
| ghostnet_100 | 128 | 1.5237 | 3.8603 | 8.5616 | 8.4536 |
| resmlp_12_224 | 128 | 0.6902 | 1.3944 | 8.3372 | 8.1746 |
| rexnet_100 | 128 | 1.6772 | 3.4909 | 8.2909 | 7.9989 |
| tinynet_a | 128 | 1.8515 | 3.6163 | 8.1476 | 7.8964 |
| pit_b_224 | 64 | 1.1889 | 2.574 | 8.043 | 7.8682 |
| cspdarknet53 | 64 | 1.8665 | 3.6881 | 8.0148 | 7.8448 |
| mixer_b16_224 | 128 | 0.7728 | 1.6757 | 7.8537 | 7.581 |
| tf_efficientnet_b0 | 128 | 1.579 | 3.1523 | 7.4242 | 7.2379 |
| vit_base_patch16_224 | 64 | 1.0025 | 2.3278 | 7.1656 | 6.9113 |
| deit_base_distilled_patch16_224 | 64 | 1.0353 | 2.2952 | 7.1409 | 6.9869 |
| spnasnet_100 | 128 | 1.6291 | 3.1654 | 6.4189 | 6.2008 |
| fbnetc_100 | 128 | 1.6485 | 3.2314 | 6.4166 | 6.2539 |
| mobilenetv3_large_100 | 128 | 1.4201 | 2.777 | 6.3721 | 6.1577 |
| mobilenetv2_100 | 128 | 1.3782 | 2.7123 | 6.0471 | 5.8028 |
| repvgg_a2 | 128 | 1.6064 | 3.0998 | 6.0361 | 5.769 |
| gernet_l | 128 | 1.5886 | 3.0408 | 5.829 | 5.7272 |
| convmixer_768_32 | 32 | 1.1418 | 2.8932 | 5.6791 | 5.5341 |
| regnety_002 | 128 | 1.3504 | 2.6303 | 5.6094 | 5.3225 |
| mnasnet_100 | 128 | 1.3254 | 2.6231 | 5.4702 | 5.2641 |
| visformer_small | 128 | 0.9127 | 2.0465 | 5.3921 | 5.1751 |
| selecsls42b | 128 | 0.7206 | 1.7567 | 4.3714 | 4.2355 |
| lcnet_050 | 128 | 0.8226 | 1.7444 | 4.3004 | 4.0886 |
| ese_vovnet19b_dw | 128 | 0.7879 | 1.4679 | 4.1267 | 4.0908 |
+---------------------------------+-----+--------+-----------+----------+------------------------+
~~~
Peak Memory Compression Ratio
~~~
+---------------------------------+-----+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
| mobilenetv2_100 | 128 | 1.2161 | 1.2161 | 1.6527 | 1.7992 |
| mnasnet_100 | 128 | 1.3757 | 1.3757 | 1.6163 | 1.8416 |
| spnasnet_100 | 128 | 1.3755 | 1.3755 | 1.6159 | 1.8411 |
| tinynet_a | 128 | 1.2089 | 1.2089 | 1.6151 | 1.7608 |
| fbnetc_100 | 128 | 1.1454 | 1.1454 | 1.5494 | 1.6844 |
| mobilenetv3_large_100 | 128 | 1.2038 | 1.2038 | 1.5323 | 1.7135 |
| fbnetv3_b | 128 | 1.1981 | 1.198 | 1.5288 | 1.7037 |
| rexnet_100 | 128 | 1.2146 | 1.2146 | 1.5136 | 1.6349 |
| convmixer_768_32 | 32 | 1.1906 | 1.1889 | 1.3816 | 1.471 |
| selecsls42b | 128 | 1.5939 | 1.5938 | 1.3475 | 1.4479 |
| pnasnet5large | 16 | 1.4292 | 0.6091 | 1.3105 | 1.3352 |
| sebotnet33ts_256 | 64 | 1.1862 | 1.1862 | 1.2693 | 1.2914 |
| lcnet_050 | 128 | 1.5357 | 1.5357 | 1.2635 | 1.5436 |
| gluon_xception65 | 32 | 1.2784 | 1.2784 | 1.2515 | 1.2784 |
| tf_efficientnet_b0 | 128 | 1.3183 | 1.3183 | 1.2451 | 1.3183 |
| cspdarknet53 | 64 | 1.4123 | 1.4124 | 1.2097 | 1.2417 |
| dm_nfnet_f0 | 128 | 1.1526 | 1.345 | 1.2035 | 1.2411 |
| ese_vovnet19b_dw | 128 | 1.2901 | 1.2901 | 1.1991 | 1.2388 |
| hrnet_w18 | 128 | 1.0653 | 1.0652 | 1.1499 | 1.2233 |
| res2next50 | 128 | 1.2855 | 1.0464 | 1.1203 | 1.1687 |
| mixnet_l | 128 | 1.1529 | 1.1529 | 1.1174 | 1.1529 |
| tf_mixnet_l | 128 | 1.1529 | 1.1529 | 1.1174 | 1.1529 |
| res2net50_14w_8s | 128 | 1.181 | 0.9718 | 1.1167 | 1.166 |
| coat_lite_mini | 128 | 1.1047 | 1.0864 | 1.0739 | 1.1263 |
| res2net101_26w_4s | 64 | 1.178 | 1.0025 | 1.0688 | 1.1088 |
| nfnet_l0 | 128 | 1.3463 | 1.3463 | 1.0537 | 1.0933 |
| regnety_002 | 128 | 1.0 | 1.0 | 1.0448 | 1.1967 |
| ghostnet_100 | 128 | 1.1127 | 1.1127 | 1.0191 | 1.1127 |
| poolformer_m36 | 64 | 1.164 | 1.1639 | 1.0187 | 1.0493 |
| eca_halonext26ts | 128 | 1.0616 | 0.8517 | 1.011 | 1.0616 |
| eca_botnext26ts_256 | 128 | 1.0617 | 0.8515 | 1.011 | 1.0617 |
| botnet26t_256 | 128 | 1.0611 | 0.8526 | 1.0109 | 1.0611 |
| swsl_resnext101_32x16d | 32 | 1.0 | 0.9642 | 0.9895 | 0.9994 |
| repvgg_a2 | 128 | 1.0306 | 1.0306 | 0.9803 | 1.0301 |
| resnest101e | 64 | 1.0 | 0.9998 | 0.973 | 1.0 |
| dla102 | 128 | 1.0 | 0.8898 | 0.9642 | 1.0 |
| adv_inception_v3 | 128 | 1.0003 | 0.9994 | 0.9469 | 1.0003 |
| gluon_inception_v3 | 128 | 1.0003 | 0.9994 | 0.9469 | 1.0003 |
| inception_v3 | 128 | 1.0003 | 0.9994 | 0.9469 | 1.0003 |
| gernet_l | 128 | 1.0 | 1.0 | 0.943 | 1.0 |
| convnext_base | 64 | 1.1206 | 1.1123 | 0.9288 | 0.9503 |
| dpn107 | 32 | 1.1468 | 0.9941 | 0.8963 | 0.9046 |
| jx_nest_base | 32 | 1.1072 | 1.1034 | 0.8724 | 0.8863 |
| mobilevit_s | 64 | 1.1638 | 1.1638 | 0.8648 | 0.8966 |
| pit_b_224 | 64 | 1.0666 | 1.0648 | 0.8611 | 0.8727 |
| visformer_small | 128 | 1.1189 | 1.1188 | 0.8583 | 0.904 |
| volo_d1_224 | 64 | 1.0 | 1.0 | 0.8566 | 0.8885 |
| cait_m36_384 | 4 | 1.0083 | 1.0072 | 0.8561 | 0.8605 |
| twins_pcpvt_base | 64 | 1.0783 | 1.06 | 0.8096 | 0.832 |
| deit_base_distilled_patch16_224 | 64 | 1.064 | 1.0616 | 0.801 | 0.8265 |
| vit_base_patch16_224 | 64 | 1.0644 | 1.0603 | 0.7999 | 0.8244 |
| mixer_b16_224 | 128 | 1.1719 | 1.1635 | 0.7545 | 0.7866 |
| swin_base_patch4_window7_224 | 64 | 1.3608 | 1.3483 | 0.7307 | 0.741 |
| beit_base_patch16_224 | 64 | 1.0635 | 1.0595 | 0.7001 | 0.7177 |
| crossvit_9_240 | 128 | 1.0505 | 1.0392 | 0.6791 | 0.7225 |
| resmlp_12_224 | 128 | 1.1803 | 1.1803 | 0.6126 | 0.6569 |
| gmixer_24_224 | 128 | 1.1623 | 1.1317 | 0.553 | 0.5864 |
| gmlp_s16_224 | 128 | 1.0786 | 1.0409 | 0.5313 | 0.5679 |
| tnt_s_patch16_224 | 128 | 1.2112 | 0.9202 | 0.5077 | 0.5345 |
+---------------------------------+-----+--------+-----------+----------+------------------------+
~~~
Absolute latency (ms)
~~~
+---------------------------------+-----+---------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
| convmixer_768_32 | 32 | 88.3165 | 88.3376 | 74.2573 | 74.3042 |
| hrnet_w18 | 128 | 65.9985 | 66.0677 | 50.4852 | 52.1408 |
| tnt_s_patch16_224 | 128 | 85.455 | 85.4381 | 45.2385 | 45.7853 |
| pit_b_224 | 64 | 49.4353 | 49.4178 | 43.8321 | 44.1227 |
| pnasnet5large | 16 | 56.6991 | 56.6864 | 42.5542 | 43.2171 |
| dm_nfnet_f0 | 128 | 73.9658 | 74.0721 | 40.1696 | 41.4165 |
| nfnet_l0 | 128 | 60.9418 | 60.9299 | 35.0593 | 35.3891 |
| beit_base_patch16_224 | 64 | 38.2315 | 38.1946 | 34.816 | 34.6042 |
| cait_m36_384 | 4 | 54.365 | 53.8392 | 34.0968 | 35.0604 |
| res2next50 | 128 | 50.2074 | 50.2362 | 33.8482 | 36.0347 |
| mixnet_l | 128 | 43.3417 | 43.3693 | 33.4691 | 34.1204 |
| tf_mixnet_l | 128 | 44.7468 | 44.7004 | 33.2361 | 33.778 |
| vit_base_patch16_224 | 64 | 35.8868 | 35.8997 | 33.1811 | 33.3936 |
| dla102 | 128 | 46.9305 | 47.0127 | 32.3875 | 32.6512 |
| mixer_b16_224 | 128 | 38.2411 | 38.0482 | 32.3461 | 32.48 |
| resnest101e | 64 | 50.8634 | 50.8448 | 31.9849 | 33.846 |
| poolformer_m36 | 64 | 48.5117 | 48.4287 | 30.7446 | 31.4101 |
| adv_inception_v3 | 128 | 40.5404 | 40.5552 | 29.2389 | 29.5114 |
| inception_v3 | 128 | 40.5563 | 40.5018 | 29.2258 | 29.5587 |
| gluon_inception_v3 | 128 | 40.5303 | 40.5701 | 29.2199 | 29.5303 |
| swin_base_patch4_window7_224 | 64 | 38.633 | 38.6937 | 28.69 | 29.0696 |
| swsl_resnext101_32x16d | 32 | 39.7776 | 39.4724 | 27.8144 | 30.0158 |
| volo_d1_224 | 64 | 43.8514 | 43.7892 | 27.4528 | 27.786 |
| jx_nest_base | 32 | 36.4801 | 36.3044 | 27.4244 | 27.8742 |
| res2net50_14w_8s | 128 | 40.4617 | 40.4363 | 26.787 | 27.4147 |
| visformer_small | 128 | 29.5133 | 29.5069 | 26.6377 | 27.0361 |
| gmlp_s16_224 | 128 | 39.1046 | 38.8193 | 26.5034 | 26.6144 |
| deit_base_distilled_patch16_224 | 64 | 29.2799 | 29.2929 | 26.0479 | 26.2872 |
| crossvit_9_240 | 128 | 30.0006 | 29.9943 | 24.265 | 24.9135 |
| gmixer_24_224 | 128 | 35.1701 | 34.9826 | 23.703 | 23.8389 |
| dpn107 | 32 | 33.7306 | 33.7207 | 23.3551 | 23.8635 |
| gluon_xception65 | 32 | 28.3922 | 28.4941 | 21.9773 | 22.2356 |
| eca_halonext26ts | 128 | 31.4992 | 31.5405 | 21.9412 | 22.3 |
| res2net101_26w_4s | 64 | 32.3105 | 32.2097 | 21.7444 | 22.4656 |
| twins_pcpvt_base | 64 | 26.6893 | 26.7326 | 21.4259 | 21.9231 |
| eca_botnext26ts_256 | 128 | 30.4846 | 30.4586 | 21.0217 | 21.3381 |
| convnext_base | 64 | 33.3193 | 32.9082 | 20.5694 | 20.9001 |
| botnet26t_256 | 128 | 28.8232 | 28.8032 | 19.3533 | 19.4744 |
| repvgg_a2 | 128 | 22.7659 | 22.7685 | 17.1192 | 17.224 |
| gernet_l | 128 | 24.2677 | 24.2874 | 16.5126 | 16.6866 |
| coat_lite_mini | 128 | 28.2054 | 28.2207 | 16.1199 | 16.428 |
| fbnetv3_b | 128 | 22.5293 | 22.4983 | 15.9502 | 16.0742 |
| cspdarknet53 | 64 | 21.8632 | 21.8857 | 15.2871 | 15.5543 |
| mobilevit_s | 64 | 19.7694 | 19.8003 | 15.0637 | 15.2624 |
| sebotnet33ts_256 | 64 | 23.0972 | 23.1062 | 14.3191 | 14.5299 |
| rexnet_100 | 128 | 18.9534 | 18.9047 | 13.5426 | 13.8161 |
| tf_efficientnet_b0 | 128 | 18.2809 | 18.3635 | 12.3823 | 12.4806 |
| selecsls42b | 128 | 14.8226 | 14.8405 | 11.205 | 11.3003 |
| ese_vovnet19b_dw | 128 | 16.2803 | 16.2785 | 11.194 | 11.2553 |
| fbnetc_100 | 128 | 15.1928 | 15.2091 | 10.9585 | 11.0298 |
| tinynet_a | 128 | 15.8431 | 15.8779 | 10.6169 | 10.7588 |
| resmlp_12_224 | 128 | 19.8277 | 19.6207 | 10.0354 | 10.0698 |
| spnasnet_100 | 128 | 13.37 | 13.3906 | 9.4487 | 9.5333 |
| ghostnet_100 | 128 | 12.8305 | 12.8864 | 8.9337 | 9.0027 |
| mnasnet_100 | 128 | 12.3796 | 12.3788 | 8.76 | 8.8137 |
| mobilenetv2_100 | 128 | 12.224 | 12.2117 | 8.3321 | 8.4003 |
| mobilenetv3_large_100 | 128 | 11.1546 | 11.1764 | 8.1579 | 8.2212 |
| regnety_002 | 128 | 8.2647 | 8.2023 | 5.4309 | 6.0336 |
| lcnet_050 | 128 | 3.3165 | 4.0985 | 3.174 | 2.9703 |
+---------------------------------+-----+---------+-----------+----------+------------------------+
~~~
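The Accuracy tables record a per-model `pass` or `fail_accuracy`, and the executive summary condenses such results into strings like `97%, 59/61`. The sketch below shows one way to produce both. It assumes an element-wise tolerance check with `math.isclose` on flattened outputs. The tolerances are placeholders, not the values the benchmark harness uses, and a real check would compare tensors (for example with `torch.allclose`).

```python
import math
from typing import Iterable, Sequence


def check_accuracy(reference: Sequence[float], actual: Sequence[float],
                   rtol: float = 1e-3, atol: float = 1e-3) -> str:
    """Return "pass" if every element matches within tolerance, else "fail_accuracy".

    rtol/atol are illustrative placeholders, not the harness's actual settings.
    """
    if len(reference) != len(actual):
        return "fail_accuracy"
    ok = all(math.isclose(a, r, rel_tol=rtol, abs_tol=atol)
             for r, a in zip(reference, actual))
    return "pass" if ok else "fail_accuracy"


def passrate(statuses: Iterable[str]) -> str:
    """Format a pass rate like the Passrate table, e.g. "97%, 59/61"."""
    statuses = list(statuses)
    passed = sum(1 for s in statuses if s == "pass")
    total = len(statuses)
    pct = round(100 * passed / total) if total else 0  # round-half-to-even
    return f"{pct}%, {passed}/{total}"


print(check_accuracy([1.0, 2.0, 3.0], [1.0, 2.0005, 3.0]))  # pass
print(passrate(["pass"] * 56 + ["fail_accuracy"] * 3))      # 95%, 56/59
```

Applied to the `inductor` column of the timm_models float16 Accuracy table above (59 rows, three of them `fail_accuracy`), this formatting gives `95%, 56/59`. Python's `round` rounds exact halves to the nearest even integer, so percentages ending in exactly .5 may differ from the dashboard's own rounding.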

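Compilation latency (sec) and Absolute latency (ms) are both wall-clock measurements. A generic sketch, assuming compilation happens lazily on the first call (as it does for TorchDynamo-optimized functions): time the first call on its own, then report the median of several warm calls. The `sync` hook is where `torch.cuda.synchronize` would go for GPU timing, since CUDA kernels run asynchronously. `make_lazy` is a hypothetical stand-in for a compiled model, and the dashboard's own harness may measure these quantities differently.

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_first_and_steady(
    fn: Callable[[], object],
    warmup: int = 3,
    repeats: int = 10,
    sync: Callable[[], None] = lambda: None,
) -> Tuple[float, float]:
    """Return (first-call seconds, median steady-state milliseconds) for `fn`."""
    sync()
    start = time.perf_counter()
    fn()  # first call: includes one-off compilation for lazily compiled callables
    sync()
    first_call_s = time.perf_counter() - start

    for _ in range(warmup):  # warm-up calls are excluded from the reported median
        fn()
    sync()

    samples_ms: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        sync()  # wait for any queued device work before stopping the clock
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return first_call_s, statistics.median(samples_ms)


def make_lazy(work: Callable[[], object], compile_s: float = 0.05) -> Callable[[], object]:
    """Hypothetical stand-in for a lazily compiled model: only the first call pays `compile_s`."""
    compiled = False

    def call() -> object:
        nonlocal compiled
        if not compiled:
            time.sleep(compile_s)  # simulated compilation cost
            compiled = True
        return work()

    return call


first_s, steady_ms = time_first_and_steady(make_lazy(lambda: sum(range(10_000))))
print(f"first call: {first_s:.3f} s, steady state: {steady_ms:.3f} ms")
```
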
Build Summary

~~~
### Run name ###
day_355_21_12_22_performance_float16_459

### Commit hashes ###
pytorch commit: 88c581be87ac59ea1251f35a57b610ae81b9362d
pytorch commit date: 2022-12-21 04:51:51+00:00
functorch Absent
torchbench commit: 43ca0857e9c7b9d90f647d1befbaee1dfe446d7e
torchbench commit date: 2022-12-16 10:47:24-08:00

### TorchDynamo config flags ###
torch._dynamo.config.DO_NOT_USE_legacy_non_fake_example_inputs = False
torch._dynamo.config.HAS_REFS_PRIMS = True
torch._dynamo.config.capture_scalar_outputs = False
torch._dynamo.config.dead_code_elimination = True
torch._dynamo.config.disable = False
torch._dynamo.config.dynamic_shapes = False
torch._dynamo.config.enforce_cond_guards_match = True
torch._dynamo.config.error_on_nested_fx_trace = True
torch._dynamo.config.guard_nn_modules = False
torch._dynamo.config.normalize_ir = False
torch._dynamo.config.optimize_ddp = True
torch._dynamo.config.output_code = False
torch._dynamo.config.output_graph_code = False
torch._dynamo.config.print_graph_breaks = False
torch._dynamo.config.raise_on_ctx_manager_usage = True
torch._dynamo.config.raise_on_unsafe_aot_autograd = False
torch._dynamo.config.replay_record_enabled = False
torch._dynamo.config.rewrite_assert_with_torch_assert = True
torch._dynamo.config.specialize_int_float = True
torch._dynamo.config.suppress_errors = False
torch._dynamo.config.verbose = False
torch._dynamo.config.verify_correctness = False

### Torch version ###
torch: 2.0.0a0+git88c581b

### Environment variables ###
TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

### GPU details ###
CUDNN VERSION: 8302
Number CUDA Devices: 8
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.314694656
~~~
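To reproduce a run, the recorded TorchDynamo flags can be parsed back out of this summary and applied to a config object. A minimal sketch, assuming the `torch._dynamo.config.<name> = <value>` line format shown above. Flag names change between PyTorch versions, so `apply_flags` skips any name the target config does not define. In practice the target would be `torch._dynamo.config`; the demo uses a `SimpleNamespace` stand-in so it runs without PyTorch.

```python
from types import SimpleNamespace
from typing import Dict, Union

FlagValue = Union[bool, str]
_BOOLS = {"True": True, "False": False}


def parse_flags(text: str, prefix: str = "torch._dynamo.config.") -> Dict[str, FlagValue]:
    """Collect `<prefix><name> = <value>` lines, converting True/False to bool."""
    flags: Dict[str, FlagValue] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(prefix) or "=" not in line:
            continue  # skip headings, commit hashes, environment variables, ...
        name, _, raw = line[len(prefix):].partition("=")
        raw = raw.strip()
        flags[name.strip()] = _BOOLS.get(raw, raw)  # non-boolean values stay as strings
    return flags


def apply_flags(config: object, flags: Dict[str, FlagValue]) -> None:
    """Set each flag on `config`, skipping names this config does not define."""
    for name, value in flags.items():
        if hasattr(config, name):
            setattr(config, name, value)


summary = """
torch._dynamo.config.dynamic_shapes = False
torch._dynamo.config.suppress_errors = False
torch._dynamo.config.verbose = False
TORCH_CUDA_ARCH_LIST = 8.0
"""
flags = parse_flags(summary)
config = SimpleNamespace(suppress_errors=True, verbose=True)  # stand-in for torch._dynamo.config
apply_flags(config, flags)
print(flags)
print(config)
```

Skipping unknown names, rather than raising, keeps an old summary usable on a newer build, at the cost of silently ignoring flags that no longer exist.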