anijain2305 opened 1 year ago
Performance speedup
+-----------------------------------+------+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
| functorch_dp_cifar10 | 64 | 0.9745 | 0.925 | 3.6666 | 3.6217 |
| BERT_pytorch | 16 | 0.9975 | 0.7999 | 3.1791 | 3.248 |
| densenet121 | 4 | 0.9888 | 0.6947 | 2.7868 | 2.7862 |
| hf_T5_large | 2 | 0.9806 | 0.806 | 2.3425 | 2.262 |
| hf_Albert | 8 | 0.9963 | 0.9603 | 2.3376 | 2.3399 |
| hf_Bart | 4 | 0.9801 | 0.7934 | 2.1449 | 2.4193 |
| pytorch_CycleGAN_and_pix2pix | 1 | 0.9748 | 0.8967 | 2.0857 | 1.8595 |
| phlippe_densenet | 128 | 0.9853 | 0.7714 | 2.0062 | 2.0183 |
| squeezenet1_1 | 32 | 0.9843 | 0.9261 | 2.0043 | 1.8585 |
| mobilenet_v3_large | 32 | 0.9958 | 0.7796 | 1.997 | 2.0592 |
| hf_GPT2 | 4 | 0.9953 | 0.9565 | 1.9265 | 1.9259 |
| hf_T5 | 8 | 0.9868 | 0.8503 | 1.9215 | 1.9342 |
| hf_Bert | 4 | 0.9975 | 0.8397 | 1.8429 | 1.8416 |
| hf_Longformer | 2 | 0.9252 | 0.5851 | 1.8011 | 1.8042 |
| phlippe_resnet | 128 | 0.9781 | 0.7561 | 1.8006 | 1.8106 |
| pytorch_struct | 200 | 0.9518 | 0.7782 | 1.7996 | 1.7666 |
| speech_transformer | 32 | 0.9826 | 0.7931 | 1.7197 | 1.7331 |
| resnext50_32x4d | 8 | 0.9882 | 0.7072 | 1.7009 | 1.6915 |
| timm_vision_transformer | 32 | 0.9836 | 0.8443 | 1.7005 | 1.9707 |
| mnasnet1_0 | 32 | 0.9905 | 0.7353 | 1.6732 | 1.6549 |
| attention_is_all_you_need_pytorch | 256 | 0.9887 | 0.8359 | 1.6465 | 1.6313 |
| fastNLP_Bert | 6 | 0.9847 | 0.8539 | 1.635 | 1.6491 |
| hf_Bert_large | 4 | 1.0021 | 0.8623 | 1.6239 | 1.6323 |
| resnet18 | 16 | 0.9895 | 0.7542 | 1.5738 | 1.5531 |
| shufflenet_v2_x1_0 | 128 | 0.9938 | 0.7535 | 1.5633 | 1.5206 |
| dcgan | 32 | 0.8862 | 0.7092 | 1.4916 | 1.5106 |
| mobilenet_v2 | 96 | 0.997 | 0.7779 | 1.4766 | 1.4746 |
| hf_DistilBert | 8 | 0.9836 | 0.9375 | 1.4722 | 1.4475 |
| timm_nfnet | 128 | 0.9864 | 0.9842 | 1.4585 | 1.4648 |
| timm_resnest | 32 | 0.9928 | 0.8523 | 1.4553 | 1.4551 |
| drq | 1 | 0.9672 | 0.7538 | 1.4447 | 1.4735 |
| lennard_jones | 1000 | 0.8676 | 0.7663 | 1.4389 | 1.4672 |
| timm_efficientnet | 32 | 0.9317 | 0.6227 | 1.3717 | 1.3928 |
| LearningToPaint | 96 | 0.9873 | 0.7763 | 1.2759 | 1.2733 |
| vgg16 | 64 | 0.9994 | 0.998 | 1.2434 | 1.2439 |
| pytorch_stargan | 16 | 0.9948 | 0.8039 | 1.2292 | 1.2232 |
| Super_SloMo | 6 | 0.9977 | 0.1781 | 1.2182 | 1.2192 |
| soft_actor_critic | 256 | 0.7797 | 0.6707 | 1.2117 | 1.0491 |
| pytorch_unet | 1 | 0.9969 | 0.2047 | 1.1718 | 1.1721 |
| Background_Matting | 4 | 0.9985 | 0.1371 | 1.1714 | 1.1721 |
| resnet152 | 32 | 0.9948 | 0.7447 | 1.1616 | 1.2227 |
| resnet50 | 32 | 0.9955 | 0.7607 | 1.139 | 1.1372 |
| yolov3 | 16 | 0.9967 | 0.8074 | 1.1153 | 1.1159 |
| demucs | 4 | 0.9995 | 1.0006 | 1.0262 | 1.0292 |
| tts_angular | 64 | 0.9531 | 0.9167 | 0.9745 | 0.9877 |
| timm_regnet | 32 | 0.9145 | 0.7756 | 0.9357 | 0.9334 |
| nvidia_deeprecommender | 256 | 0.9991 | 0.9984 | 0.9351 | 0.9353 |
| timm_vovnet | 32 | 0.86 | 0.7083 | 0.9249 | 0.9187 |
| timm_vision_transformer_large | 32 | 0.9982 | 0.0 | 0.0 | 0.0 |
| tacotron2 | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| doctr_reco_predictor | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| doctr_det_predictor | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| moco | 32 | 0.9764 | 0.0 | 0.0 | 0.0 |
| hf_GPT2_large | 4 | 0.9843 | 0.9721 | 0.0 | 1.7378 |
| hf_BigBird | 2 | 0.9753 | 0.7838 | 0.0 | 0.0 |
| dlrm | 1024 | 0.9529 | 0.8453 | 0.0 | 0.0 |
| hf_Reformer | 4 | 0.9928 | 0.9501 | 0.0 | 0.0 |
| alexnet | 128 | 0.9991 | 0.9974 | 0.0 | 0.0 |
| torchrec_dlrm | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
+-----------------------------------+------+--------+-----------+----------+------------------------+
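Each speedup cell above appears to be a latency ratio: the baseline run's latency divided by the given backend's latency for the same model and batch size, so values above 1.0 mean the backend is faster. The absolute-latency table further down is consistent with this reading (e.g. Background_Matting: 126.0957 ms under eager vs 107.5371 ms under inductor, a ratio of roughly 1.17, matching the 1.1714 shown here). A minimal sketch of that computation — the function name and the timing values are illustrative, not the dashboard's actual code:

```python
# Sketch: speedup as the ratio of a baseline latency to a backend latency.
# Both arguments are hypothetical timings in milliseconds.
def speedup(baseline_ms: float, backend_ms: float) -> float:
    """Values > 1.0 mean the backend is faster than the baseline."""
    return baseline_ms / backend_ms

# Illustrative numbers close to the Background_Matting row:
print(round(speedup(126.0, 107.5), 4))  # → 1.1721
```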
Accuracy
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
| hf_GPT2_large | 4 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| timm_vision_transformer_large | 4 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| hf_T5_large | 4 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| squeezenet1_1 | 4 | pass | pass | pass | pass |
| pytorch_stargan | 16 | pass | pass | pass | pass |
| pytorch_struct | 200 | pass | pass | pass | pass |
| resnet152 | 4 | pass | pass | pass | pass |
| resnet18 | 4 | pass | pass | pass | pass |
| resnet50 | 4 | pass | pass | pass | pass |
| resnext50_32x4d | 4 | pass | pass | pass | pass |
| shufflenet_v2_x1_0 | 4 | pass | pass | pass | pass |
| soft_actor_critic | 256 | pass | pass | pass | pass |
| speech_transformer | 4 | pass | pass | pass | pass |
| timm_efficientnet | 4 | pass | pass | pass | pass |
| phlippe_densenet | 4 | pass | pass | pass | pass |
| timm_nfnet | 4 | pass | pass | pass | pass |
| timm_regnet | 4 | pass | pass | pass | pass |
| timm_resnest | 4 | pass | pass | pass | pass |
| timm_vision_transformer | 4 | pass | pass | pass | pass |
| timm_vovnet | 4 | pass | pass | pass | pass |
| tts_angular | 4 | pass | pass | pass | pass |
| vgg16 | 4 | pass | pass | pass | pass |
| yolov3 | 4 | pass | pass | pass | pass |
| BERT_pytorch | 4 | fail_accuracy | pass | pass | pass |
| mobilenet_v3_large | 4 | pass | pass | pass | fail_accuracy |
| pytorch_CycleGAN_and_pix2pix | 1 | pass | pass | pass | pass |
| pytorch_unet | 2 | pass | pass | pass | pass |
| nvidia_deeprecommender | 4 | pass | pass | pass | pass |
| hf_Albert | 4 | pass | pass | pass | pass |
| LearningToPaint | 4 | pass | pass | pass | pass |
| Super_SloMo | 4 | pass | pass | pass | pass |
| alexnet | 4 | pass | pass | pass | pass |
| attention_is_all_you_need_pytorch | 4 | pass | pass | pass | pass |
| dcgan | 4 | pass | pass | pass | pass |
| demucs | 4 | pass | pass | pass | pass |
| mobilenet_v2 | 4 | pass | pass | pass | pass |
| drq | 1 | pass | pass | pass | pass |
| fastNLP_Bert | 4 | pass | pass | pass | pass |
| functorch_dp_cifar10 | 4 | pass | pass | pass | pass |
| densenet121 | 4 | pass | pass | pass | pass |
| hf_Bart | 4 | pass | pass | pass | pass |
| hf_Bert_large | 4 | pass | pass | pass | pass |
| hf_DistilBert | 4 | pass | pass | pass | pass |
| hf_GPT2 | 2 | pass | pass | pass | pass |
| hf_Longformer | 4 | pass | pass | pass | pass |
| hf_Reformer | 4 | pass | pass | pass | pass |
| hf_T5 | 4 | pass | pass | pass | pass |
| hf_T5_base | 4 | pass | pass | pass | pass |
| lennard_jones | 4 | pass | pass | pass | pass |
| mnasnet1_0 | 4 | pass | pass | pass | pass |
| hf_Bert | 4 | pass | pass | pass | pass |
| moco | 4 | pass | fail_to_run | fail_to_run | fail_to_run |
| hf_BigBird | 4 | pass | pass | fail_to_run | fail_to_run |
| dlrm | 4 | pass | pass | fail_to_run | fail_to_run |
| phlippe_resnet | 4 | pass | pass | fail_accuracy | fail_accuracy |
| Background_Matting | 4 | eager_variation | eager_variation | eager_variation | eager_variation |
| vision_maskrcnn | 4 | eager_variation | eager_variation | eager_variation | eager_variation |
| tacotron2 | 4 | fail_to_run | fail_to_run | 0.0000 | 0.0000 |
| doctr_det_predictor | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| doctr_reco_predictor | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| llama | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| torchrec_dlrm | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
Compilation latency (sec)
+-----------------------------------+------+---------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
| speech_transformer | 32 | 5.9379 | 13.5706 | 804.1871 | 42.8845 |
| attention_is_all_you_need_pytorch | 256 | 4.324 | 10.7867 | 689.7678 | 38.4134 |
| hf_T5_large | 2 | 26.2442 | 54.7406 | 506.3967 | 148.4014 |
| timm_vision_transformer | 32 | 3.3655 | 7.1652 | 436.4946 | 26.1915 |
| hf_Albert | 8 | 2.4513 | 8.5711 | 422.7417 | 28.404 |
| phlippe_densenet | 128 | 3.2458 | 6.9257 | 416.1867 | 25.5266 |
| fastNLP_Bert | 6 | 4.9636 | 11.1295 | 398.4251 | 34.5511 |
| pytorch_struct | 200 | 0.7813 | 1.3378 | 354.4235 | 6.9333 |
| BERT_pytorch | 16 | 4.7902 | 11.449 | 350.6787 | 34.5791 |
| mobilenet_v2 | 96 | 3.094 | 6.9056 | 321.413 | 24.9613 |
| hf_Bert_large | 4 | 10.1418 | 20.7939 | 314.3734 | 60.6028 |
| mnasnet1_0 | 32 | 3.0959 | 6.6976 | 314.3265 | 23.5121 |
| densenet121 | 4 | 7.4234 | 17.9994 | 309.2972 | 61.1015 |
| hf_T5 | 8 | 5.6153 | 13.4608 | 291.5986 | 39.5658 |
| mobilenet_v3_large | 32 | 3.3994 | 7.5884 | 256.7481 | 27.1516 |
| drq | 1 | 0.6686 | 1.0099 | 253.8761 | 6.3746 |
| nvidia_deeprecommender | 256 | 0.4823 | 0.766 | 249.458 | 5.9622 |
| hf_Longformer | 2 | 11.2554 | 31.1595 | 243.8476 | 123.5476 |
| yolov3 | 16 | 4.812 | 10.4255 | 227.5451 | 36.3119 |
| hf_GPT2 | 4 | 4.6244 | 9.6041 | 218.6052 | 29.5181 |
| shufflenet_v2_x1_0 | 128 | 3.4342 | 7.6127 | 215.3652 | 26.8468 |
| timm_efficientnet | 32 | 4.9429 | 10.0025 | 207.2363 | 30.7292 |
| timm_nfnet | 128 | 5.7487 | 10.9879 | 206.3631 | 32.057 |
| timm_vovnet | 32 | 3.5961 | 6.2972 | 199.301 | 22.1326 |
| soft_actor_critic | 256 | 0.4404 | 0.6177 | 179.8936 | 5.4369 |
| hf_Bart | 4 | 10.8484 | 18.0336 | 179.5238 | 49.6952 |
| timm_regnet | 32 | 6.6018 | 12.1995 | 179.196 | 33.0846 |
| LearningToPaint | 96 | 1.4753 | 2.8955 | 167.1134 | 12.25 |
| resnet152 | 32 | 8.8693 | 20.1297 | 163.4345 | 58.4397 |
| vgg16 | 64 | 0.6332 | 1.1205 | 160.4233 | 7.4845 |
| resnext50_32x4d | 8 | 3.1743 | 7.4339 | 158.7375 | 22.7869 |
| lennard_jones | 1000 | 0.3987 | 0.6209 | 143.0381 | 4.5367 |
| Background_Matting | 4 | 3.2032 | 11.4127 | 131.4817 | 26.8162 |
| resnet18 | 16 | 1.338 | 2.7724 | 128.2701 | 12.183 |
| pytorch_unet | 1 | 1.5283 | 4.4352 | 121.5264 | 13.9278 |
| functorch_dp_cifar10 | 64 | 1.1992 | 2.5475 | 117.7511 | 12.8811 |
| phlippe_resnet | 128 | 1.349 | 2.7318 | 113.7247 | 10.8462 |
| hf_Bert | 4 | 4.9301 | 10.3482 | 110.8388 | 32.425 |
| pytorch_CycleGAN_and_pix2pix | 1 | 1.2026 | 2.8968 | 89.0985 | 12.3657 |
| timm_resnest | 32 | 1.822 | 3.8811 | 78.4203 | 16.7527 |
| Super_SloMo | 6 | 2.7734 | 9.7645 | 73.1308 | 25.4957 |
| demucs | 4 | 1.4955 | 2.2725 | 71.8464 | 9.6114 |
| hf_DistilBert | 8 | 2.3655 | 5.6075 | 61.993 | 19.3363 |
| pytorch_stargan | 16 | 1.1848 | 3.2111 | 46.9079 | 10.7557 |
| squeezenet1_1 | 32 | 1.0332 | 1.7378 | 44.4823 | 8.5951 |
| resnet50 | 32 | 3.1836 | 7.4252 | 23.9834 | 23.0489 |
| dcgan | 32 | 0.4331 | 0.7077 | 16.4177 | 5.1875 |
| tts_angular | 64 | 0.4423 | 0.5108 | 4.7125 | 3.838 |
| hf_GPT2_large | 4 | 14.8619 | 29.6938 | nan | 84.9245 |
| hf_BigBird | 2 | 12.8484 | 39.0664 | nan | nan |
| hf_Reformer | 4 | 4.1752 | 6.3515 | nan | nan |
| dlrm | 1024 | 0.374 | 0.7853 | nan | nan |
| alexnet | 128 | 0.5032 | 0.7703 | nan | nan |
| moco | 32 | 27.3074 | nan | nan | nan |
| timm_vision_transformer_large | 32 | 9.3266 | nan | nan | nan |
| doctr_det_predictor | 0 | nan | nan | nan | nan |
| doctr_reco_predictor | 0 | nan | nan | nan | nan |
| tacotron2 | 0 | nan | nan | nan | nan |
| torchrec_dlrm | 0 | nan | nan | nan | nan |
+-----------------------------------+------+---------+-----------+----------+------------------------+
Peak Memory Compression Ratio
+-----------------------------------+------+--------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
| Super_SloMo | 6 | 1.0014 | 0.822 | 1.1588 | 1.208 |
| hf_Albert | 8 | 0.9599 | 0.9008 | 1.0399 | 1.0863 |
| mobilenet_v2 | 96 | 0.9864 | 0.7651 | 1.0107 | 1.0572 |
| hf_T5 | 8 | 0.9507 | 0.8891 | 0.9988 | 1.0163 |
| fastNLP_Bert | 6 | 1.0003 | 0.8878 | 0.9953 | 1.052 |
| tts_angular | 64 | 0.9957 | 0.9957 | 0.9852 | 0.9852 |
| attention_is_all_you_need_pytorch | 256 | 0.9648 | 0.9066 | 0.9693 | 1.0269 |
| timm_nfnet | 128 | 0.907 | 0.8752 | 0.9619 | 0.9678 |
| BERT_pytorch | 16 | 1.0003 | 0.8671 | 0.9428 | 0.9428 |
| hf_Bert | 4 | 0.9645 | 0.8353 | 0.9421 | 0.9421 |
| hf_GPT2 | 4 | 0.9357 | 0.8198 | 0.9317 | 0.9319 |
| hf_Bert_large | 4 | 0.9845 | 0.8521 | 0.9138 | 0.9401 |
| timm_efficientnet | 32 | 0.9865 | 0.819 | 0.874 | 1.072 |
| yolov3 | 16 | 0.9923 | 0.8257 | 0.8711 | 0.8705 |
| shufflenet_v2_x1_0 | 128 | 0.9549 | 0.8395 | 0.8621 | 0.8979 |
| speech_transformer | 32 | 0.9915 | 0.9 | 0.8583 | 1.0773 |
| timm_regnet | 32 | 0.995 | 0.8499 | 0.8501 | 0.8484 |
| hf_DistilBert | 8 | 0.9262 | 0.8146 | 0.8456 | 0.8517 |
| resnet50 | 32 | 0.9922 | 0.8613 | 0.8365 | 0.8344 |
| timm_vision_transformer | 32 | 0.9907 | 0.9299 | 0.8357 | 0.9369 |
| Background_Matting | 4 | 1.0125 | 0.6487 | 0.834 | 0.8484 |
| resnet152 | 32 | 0.9959 | 0.8916 | 0.8319 | 0.8684 |
| timm_resnest | 32 | 0.9888 | 0.8973 | 0.8297 | 0.9564 |
| hf_T5_large | 2 | 0.9831 | 0.8302 | 0.8201 | 0.8201 |
| phlippe_densenet | 128 | 0.9983 | 0.9982 | 0.7988 | 1.0061 |
| pytorch_unet | 1 | 0.9953 | 0.7154 | 0.7734 | 0.8554 |
| squeezenet1_1 | 32 | 0.9674 | 0.9309 | 0.773 | 1.0247 |
| pytorch_stargan | 16 | 0.9914 | 0.969 | 0.7715 | 0.9248 |
| demucs | 4 | 0.9663 | 0.9659 | 0.7661 | 0.7734 |
| hf_Bart | 4 | 0.9084 | 0.843 | 0.7545 | 0.7546 |
| timm_vovnet | 32 | 0.9892 | 0.8166 | 0.7428 | 0.8185 |
| pytorch_struct | 200 | 0.9992 | 0.5168 | 0.7338 | 0.9955 |
| vgg16 | 64 | 0.9922 | 0.7246 | 0.723 | 0.7231 |
| mnasnet1_0 | 32 | 0.9819 | 0.8641 | 0.7201 | 0.8596 |
| densenet121 | 4 | 0.9956 | 0.9802 | 0.7085 | 0.9766 |
| mobilenet_v3_large | 32 | 0.9801 | 0.8396 | 0.6992 | 0.9037 |
| nvidia_deeprecommender | 256 | 0.9176 | 0.8055 | 0.6585 | 0.6585 |
| resnext50_32x4d | 8 | 0.9947 | 0.8438 | 0.6561 | 0.7855 |
| LearningToPaint | 96 | 0.9192 | 0.7116 | 0.597 | 0.7089 |
| pytorch_CycleGAN_and_pix2pix | 1 | 0.9965 | 0.8796 | 0.5458 | 0.8393 |
| resnet18 | 16 | 0.983 | 0.8055 | 0.5409 | 0.7792 |
| hf_Longformer | 2 | 0.8565 | 0.8296 | 0.4206 | 0.4205 |
| functorch_dp_cifar10 | 64 | 0.9953 | 0.8396 | 0.3991 | 0.7086 |
| phlippe_resnet | 128 | 0.9881 | 0.864 | 0.3272 | 0.8517 |
| drq | 1 | 0.9877 | 0.8852 | 0.1818 | 0.6379 |
| dcgan | 32 | 0.9647 | 0.7957 | 0.1811 | 0.7821 |
| soft_actor_critic | 256 | 0.9995 | 0.9255 | 0.1109 | 0.6066 |
| lennard_jones | 1000 | 0.9996 | 0.9997 | 0.0648 | 0.7073 |
| hf_GPT2_large | 4 | 0.9663 | 0.8303 | nan | 0.8905 |
| dlrm | 1024 | 0.9995 | 0.9944 | nan | nan |
| hf_BigBird | 2 | 0.9493 | 0.9268 | nan | nan |
| hf_Reformer | 4 | 0.8004 | 0.8004 | nan | nan |
| alexnet | 128 | 0.9452 | 0.7935 | nan | nan |
| timm_vision_transformer_large | 32 | 0.9992 | nan | nan | nan |
| moco | 32 | 0.9958 | nan | nan | nan |
| doctr_det_predictor | 0 | nan | nan | nan | nan |
| doctr_reco_predictor | 0 | nan | nan | nan | nan |
| tacotron2 | 0 | nan | nan | nan | nan |
| torchrec_dlrm | 0 | nan | nan | nan | nan |
+-----------------------------------+------+--------+-----------+----------+------------------------+
Absolute latency (ms)
+-----------------------------------+------+----------+-----------+----------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
| Background_Matting | 4 | 126.0957 | 918.6078 | 107.5371 | 107.5328 |
| hf_T5_large | 2 | 269.1613 | 273.5731 | 98.1109 | 97.8314 |
| hf_T5 | 8 | 183.1776 | 210.9367 | 93.3605 | 93.504 |
| timm_nfnet | 128 | 119.6264 | 120.2675 | 80.7577 | 80.7667 |
| Super_SloMo | 6 | 79.8167 | 446.6856 | 65.4352 | 65.1566 |
| hf_Longformer | 2 | 122.75 | 193.0691 | 62.2797 | 62.0164 |
| yolov3 | 16 | 68.7648 | 84.7996 | 61.4977 | 61.413 |
| timm_regnet | 32 | 61.4621 | 71.7881 | 59.6524 | 60.1548 |
| resnet152 | 32 | 63.4651 | 87.8524 | 54.3377 | 55.1513 |
| vgg16 | 64 | 66.2892 | 66.3835 | 53.3402 | 53.3047 |
| demucs | 4 | 53.5993 | 53.4955 | 52.2167 | 52.3699 |
| hf_Bert_large | 4 | 83.6988 | 94.6253 | 50.9415 | 50.8037 |
| pytorch_unet | 1 | 39.9741 | 194.5111 | 34.0153 | 33.9792 |
| speech_transformer | 32 | 59.5544 | 84.1112 | 33.4183 | 33.0468 |
| fastNLP_Bert | 6 | 57.0606 | 60.7527 | 33.011 | 31.1786 |
| attention_is_all_you_need_pytorch | 256 | 58.3394 | 68.5091 | 32.8929 | 32.8822 |
| hf_Bart | 4 | 71.723 | 86.4912 | 32.6812 | 33.0355 |
| mobilenet_v2 | 96 | 47.1521 | 60.4154 | 31.7936 | 31.8657 |
| hf_Albert | 8 | 68.645 | 72.3887 | 29.6789 | 29.6467 |
| timm_vovnet | 32 | 28.8224 | 35.1916 | 26.8202 | 26.7869 |
| hf_GPT2 | 4 | 49.3612 | 50.6168 | 25.3235 | 25.2676 |
| timm_efficientnet | 32 | 34.564 | 51.7456 | 23.5489 | 23.2529 |
| resnet50 | 32 | 26.2812 | 37.0821 | 22.8879 | 22.8241 |
| hf_Bert | 4 | 40.7494 | 48.2982 | 22.4111 | 22.4317 |
| hf_DistilBert | 8 | 32.1005 | 35.7249 | 22.0729 | 22.0321 |
| densenet121 | 4 | 60.8842 | 86.2346 | 20.9685 | 18.8171 |
| shufflenet_v2_x1_0 | 128 | 32.1105 | 40.0698 | 19.7102 | 19.7218 |
| BERT_pytorch | 16 | 53.4104 | 66.8912 | 17.0935 | 17.0782 |
| timm_vision_transformer | 32 | 33.391 | 33.5448 | 16.7177 | 16.6983 |
| timm_resnest | 32 | 24.2609 | 28.3734 | 16.6174 | 16.6065 |
| mobilenet_v3_large | 32 | 28.9221 | 36.5796 | 13.3713 | 13.8347 |
| mnasnet1_0 | 32 | 23.6481 | 31.9652 | 13.2494 | 13.1868 |
| pytorch_stargan | 16 | 14.7275 | 18.1919 | 11.9033 | 11.8483 |
| phlippe_densenet | 128 | 23.9166 | 30.3144 | 11.9014 | 11.541 |
| resnext50_32x4d | 8 | 22.3265 | 30.6785 | 11.8135 | 11.6231 |
| nvidia_deeprecommender | 256 | 10.2265 | 10.2372 | 10.9236 | 10.9303 |
| LearningToPaint | 96 | 12.0771 | 14.3152 | 8.7481 | 8.7039 |
| pytorch_CycleGAN_and_pix2pix | 1 | 13.8376 | 15.1046 | 7.2182 | 7.1309 |
| tts_angular | 64 | 6.5685 | 6.8807 | 6.5165 | 6.3773 |
| resnet18 | 16 | 9.8001 | 12.82 | 5.7904 | 5.7424 |
| squeezenet1_1 | 32 | 10.3157 | 11.8552 | 5.4993 | 5.4342 |
| phlippe_resnet | 128 | 9.2808 | 12.0101 | 5.0985 | 5.0227 |
| functorch_dp_cifar10 | 64 | 10.5392 | 12.2784 | 2.8779 | 2.8591 |
| drq | 1 | 3.3757 | 4.345 | 2.8418 | 3.0024 |
| pytorch_struct | 200 | 5.596 | 6.1243 | 2.7861 | 2.688 |
| dcgan | 32 | 2.3663 | 3.0443 | 1.4545 | 1.4307 |
| soft_actor_critic | 256 | 2.6689 | 3.419 | 1.2898 | 1.8432 |
| lennard_jones | 1000 | 1.7473 | 2.3537 | 1.0843 | 1.0323 |
| hf_GPT2_large | 4 | 213.7892 | 214.8782 | nan | 120.237 |
| hf_BigBird | 2 | 197.2802 | 275.9741 | nan | nan |
| hf_Reformer | 4 | 81.6483 | 86.0763 | nan | nan |
| alexnet | 128 | 9.8389 | 9.8565 | nan | nan |
| dlrm | 1024 | 4.942 | 5.0729 | nan | nan |
| timm_vision_transformer_large | 32 | 465.2338 | nan | nan | nan |
| moco | 32 | 50.3693 | nan | nan | nan |
| doctr_det_predictor | 0 | nan | nan | nan | nan |
| doctr_reco_predictor | 0 | nan | nan | nan | nan |
| tacotron2 | 0 | nan | nan | nan | nan |
| torchrec_dlrm | 0 | nan | nan | nan | nan |
+-----------------------------------+------+----------+-----------+----------+------------------------+
When measuring performance, compilation latency, and memory-footprint reduction, we exclude the models that fail the accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 90%, 53/59 | 100%, 45/45 | 68%, 41/60 |
| aot_eager | 88%, 52/59 | 100%, 45/45 | 92%, 55/60 |
| inductor | 78%, 46/59 | 84%, 38/45 | 93%, 56/60 |
| inductor_no_cudagraphs | 78%, 46/59 | 84%, 38/45 | 92%, 55/60 |
+------------------------+------------+-------------+-------------+
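Each passrate cell encodes both a percentage and the raw counts (models passing over models attempted). A small formatting sketch, assuming those two counts as inputs (the helper name is hypothetical):

```python
# Sketch: render a passrate cell like "78%, 46/59" from raw counts.
def passrate_cell(passed: int, total: int) -> str:
    return f"{passed / total:.0%}, {passed}/{total}"

print(passrate_cell(46, 59))  # → 78%, 46/59
```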
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.00x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| inductor | 1.59x | 1.67x | 1.38x |
| inductor_no_cudagraphs | 1.57x | 1.68x | 1.39x |
+------------------------+------------+-------------+-------------+
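The per-suite speedups are aggregated with a geometric mean rather than an arithmetic mean, which is the usual choice for ratios: it treats a 2x speedup and a 0.5x slowdown as canceling out, and a single outlier model cannot dominate the summary. A sketch of the aggregation, assuming a list of per-model speedup ratios like those in the first table:

```python
import math

# Sketch: geometric mean of positive speedup ratios, exp(mean(log(x))).
def geomean(speedups):
    logs = [math.log(s) for s in speedups]
    return math.exp(sum(logs) / len(logs))

# Hypothetical per-model speedups; a 2x gain and a 0.5x loss cancel:
print(round(geomean([2.0, 0.5]), 4))  # → 1.0
```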
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 4.73 | 7.46 | 5.96 |
| aot_eager | 9.28 | 16.12 | 12.80 |
| inductor | 272.07 | 338.74 | 458.29 |
| inductor_no_cudagraphs | 273.46 | 324.87 | 448.96 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 0.97x | 1.00x |
| aot_eager | 0.86x | 0.89x | 0.89x |
| inductor | 0.75x | 0.90x | 0.90x |
| inductor_no_cudagraphs | 0.75x | 0.90x | 0.90x |
+------------------------+------------+-------------+-------------+
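The compression ratio is presumably the baseline run's peak memory divided by the backend run's peak memory, so values above 1.0 mean the compiled run allocated less at its high-water mark. A hedged sketch with made-up byte counts (in a real PyTorch measurement the peaks would typically come from the CUDA allocator's high-water mark, via `torch.cuda.reset_peak_memory_stats()` before the run and `torch.cuda.max_memory_allocated()` after it):

```python
# Sketch: peak-memory compression ratio from two hypothetical peaks.
def compression_ratio(baseline_peak_bytes: float, backend_peak_bytes: float) -> float:
    """> 1.0 means the backend's peak allocation is smaller than the baseline's."""
    return baseline_peak_bytes / backend_peak_bytes

# Illustrative: a backend that peaks at 1.6 GB vs a 1.2 GB baseline compresses at 0.75x.
print(compression_ratio(1.2e9, 1.6e9))  # → 0.75
```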
When measuring performance, compilation latency, and memory-footprint reduction, we exclude the models that fail the accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 94%, 59/63 | 100%, 46/46 | 100%, 60/60 |
| aot_eager | 90%, 57/63 | 100%, 46/46 | 100%, 60/60 |
| inductor | 84%, 53/63 | 100%, 46/46 | 98%, 59/60 |
| inductor_no_cudagraphs | 86%, 54/63 | 100%, 46/46 | 98%, 59/60 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.00x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| inductor | 1.49x | 1.37x | 1.33x |
| inductor_no_cudagraphs | 1.39x | 1.33x | 1.32x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 16.76 | 3.72 | 2.65 |
| aot_eager | 26.52 | 6.18 | 5.35 |
| inductor | 16.49 | 21.29 | 21.32 |
| inductor_no_cudagraphs | 15.67 | 19.13 | 20.99 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.04x | 1.01x | 1.17x |
| aot_eager | 1.01x | 1.01x | 1.18x |
| inductor | 0.92x | 1.16x | 1.09x |
| inductor_no_cudagraphs | 0.99x | 1.25x | 1.16x |
+------------------------+------------+-------------+-------------+
When measuring performance, compilation latency, and memory-footprint reduction, we exclude the models that fail the accuracy checks.
Passrate
+-----------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+-----------------------+------------+-------------+-------------+
| inductor_max_autotune | 78%, 47/60 | 91%, 41/45 | 95%, 57/60 |
+-----------------------+------------+-------------+-------------+
Geometric mean speedup
+-----------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+-----------------------+------------+-------------+-------------+
| inductor_max_autotune | 1.61x | 1.62x | 1.42x |
+-----------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+-----------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+-----------------------+------------+-------------+-------------+
| inductor_max_autotune | 348.81 | 210.04 | 497.89 |
+-----------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+-----------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+-----------------------+------------+-------------+-------------+
| inductor_max_autotune | 0.77x | 0.90x | 0.91x |
+-----------------------+------------+-------------+-------------+
When measuring performance, compilation latency, and memory-footprint reduction, we exclude the models that fail the accuracy checks.
Passrate
+-------------------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
| inductor_max_autotune_no_cudagraphs | 80%, 48/60 | 96%, 43/45 | 95%, 57/60 |
+-------------------------------------+------------+-------------+-------------+
Geometric mean speedup
+-------------------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
| inductor_max_autotune_no_cudagraphs | 1.32x | 1.57x | 1.40x |
+-------------------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+-------------------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
| inductor_max_autotune_no_cudagraphs | 360.44 | 222.95 | 497.02 |
+-------------------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+-------------------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
| inductor_max_autotune_no_cudagraphs | 0.88x | 1.02x | 1.01x |
+-------------------------------------+------------+-------------+-------------+
When measuring performance, compilation latency, and memory-footprint reduction, we exclude the models that fail the accuracy checks.
Passrate
+-------------------------------------+-------------+
| Compiler | timm_models |
+-------------------------------------+-------------+
| inductor | 100%, 60/60 |
| inductor_no_cudagraphs | 100%, 60/60 |
| inductor_max_autotune | 100%, 60/60 |
| inductor_max_autotune_no_cudagraphs | 100%, 60/60 |
+-------------------------------------+-------------+
Geometric mean speedup
+-------------------------------------+-------------+
| Compiler | timm_models |
+-------------------------------------+-------------+
| inductor | 1.42x |
| inductor_no_cudagraphs | 1.40x |
| inductor_max_autotune | 1.47x |
| inductor_max_autotune_no_cudagraphs | 1.44x |
+-------------------------------------+-------------+
Mean compilation time (seconds)
+-------------------------------------+-------------+
| Compiler | timm_models |
+-------------------------------------+-------------+
| inductor | 80.25 |
| inductor_no_cudagraphs | 44.69 |
| inductor_max_autotune | 372.93 |
| inductor_max_autotune_no_cudagraphs | 52.43 |
+-------------------------------------+-------------+
Peak memory footprint compression ratio (higher is better)
+-------------------------------------+-------------+
| Compiler | timm_models |
+-------------------------------------+-------------+
| inductor | 0.91x |
| inductor_no_cudagraphs | 1.03x |
| inductor_max_autotune | 0.90x |
| inductor_max_autotune_no_cudagraphs | 1.03x |
+-------------------------------------+-------------+
When measuring performance, compilation latency, and memory-footprint reduction, we exclude the models that fail the accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
| aot_eager | 87%, 52/60 | 100%, 45/45 | 97%, 58/60 |
| inductor | 85%, 51/60 | 91%, 41/45 | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45 | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.00x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| inductor | 1.59x | 1.58x | 1.41x |
| inductor_no_cudagraphs | 1.27x | 1.50x | 1.39x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 4.85 | 7.26 | 5.99 |
| aot_eager | 9.37 | 15.82 | 13.21 |
| inductor | 63.80 | 62.92 | 111.25 |
| inductor_no_cudagraphs | 64.01 | 72.27 | 110.32 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 1.00x | 0.99x |
| aot_eager | 0.86x | 0.90x | 0.88x |
| inductor | 0.79x | 0.91x | 0.91x |
| inductor_no_cudagraphs | 0.94x | 1.05x | 1.01x |
+------------------------+------------+-------------+-------------+
When measuring performance, compilation latency, and memory-footprint reduction, we exclude the models that fail the accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 82%, 50/61 | 100%, 46/46 | 100%, 60/60 |
| aot_eager | 77%, 47/61 | 100%, 46/46 | 100%, 60/60 |
| inductor | 74%, 45/61 | 93%, 43/46 | 100%, 60/60 |
| inductor_no_cudagraphs | 75%, 46/61 | 98%, 45/46 | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.00x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| inductor | 1.32x | 1.22x | 1.23x |
| inductor_no_cudagraphs | 1.18x | 1.22x | 1.23x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 3.64 | 4.90 | 4.08 |
| aot_eager | 7.69 | 11.32 | 9.93 |
| inductor | 59.38 | 51.27 | 100.75 |
| inductor_no_cudagraphs | 58.66 | 47.84 | 99.83 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.99x | 1.00x | 1.00x |
| aot_eager | 0.88x | 0.92x | 0.89x |
| inductor | 0.81x | 0.84x | 0.92x |
| inductor_no_cudagraphs | 0.97x | 0.98x | 1.02x |
+------------------------+------------+-------------+-------------+
When measuring performance, compilation latency, and memory-footprint reduction, we exclude the models that fail the accuracy checks.
Passrate
+-------------------------------------+-------------+
| Compiler | timm_models |
+-------------------------------------+-------------+
| inductor | 100%, 2/2 |
| inductor_no_cudagraphs | 100%, 2/2 |
| inductor_max_autotune | 100%, 2/2 |
| inductor_max_autotune_no_cudagraphs | 100%, 2/2 |
+-------------------------------------+-------------+
Geometric mean speedup
+-------------------------------------+-------------+
| Compiler | timm_models |
+-------------------------------------+-------------+
| inductor | 2.54x |
| inductor_no_cudagraphs | 2.20x |
| inductor_max_autotune | 2.72x |
| inductor_max_autotune_no_cudagraphs | 2.33x |
+-------------------------------------+-------------+
Mean compilation time (seconds)
+-------------------------------------+-------------+
| Compiler | timm_models |
+-------------------------------------+-------------+
| inductor | 106.20 |
| inductor_no_cudagraphs | 69.36 |
| inductor_max_autotune | 748.14 |
| inductor_max_autotune_no_cudagraphs | 81.81 |
+-------------------------------------+-------------+
Peak memory footprint compression ratio (higher is better)
+-------------------------------------+-------------+
| Compiler | timm_models |
+-------------------------------------+-------------+
| inductor | 0.90x |
| inductor_no_cudagraphs | 1.03x |
| inductor_max_autotune | 0.91x |
| inductor_max_autotune_no_cudagraphs | 1.04x |
+-------------------------------------+-------------+
To measure performance, compilation latency, and memory footprint reduction, we exclude the models that fail accuracy checks.
Passrate
+-------------------------------------+-------------+
| Compiler | timm_models |
+-------------------------------------+-------------+
| inductor | 100%, 2/2 |
| inductor_no_cudagraphs | 100%, 2/2 |
| inductor_max_autotune | 100%, 2/2 |
| inductor_max_autotune_no_cudagraphs | 100%, 2/2 |
+-------------------------------------+-------------+
Geometric mean speedup
+-------------------------------------+-------------+
| Compiler | timm_models |
+-------------------------------------+-------------+
| inductor | 2.31x |
| inductor_no_cudagraphs | 2.01x |
| inductor_max_autotune | 3.04x |
| inductor_max_autotune_no_cudagraphs | 2.39x |
+-------------------------------------+-------------+
Mean compilation time (seconds)
+-------------------------------------+-------------+
| Compiler | timm_models |
+-------------------------------------+-------------+
| inductor | 108.16 |
| inductor_no_cudagraphs | 68.81 |
| inductor_max_autotune | 890.39 |
| inductor_max_autotune_no_cudagraphs | 83.96 |
+-------------------------------------+-------------+
Peak memory footprint compression ratio (higher is better)
+-------------------------------------+-------------+
| Compiler | timm_models |
+-------------------------------------+-------------+
| inductor | 0.90x |
| inductor_no_cudagraphs | 1.03x |
| inductor_max_autotune | 0.91x |
| inductor_max_autotune_no_cudagraphs | 1.04x |
+-------------------------------------+-------------+
To measure performance, compilation latency, and memory footprint reduction, we exclude the models that fail accuracy checks.
Passrate
+-------------------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
| inductor | 85%, 51/60 | 91%, 41/45 | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45 | 100%, 60/60 |
| inductor_max_autotune | 78%, 47/60 | 91%, 41/45 | 98%, 59/60 |
| inductor_max_autotune_no_cudagraphs | 82%, 49/60 | 96%, 43/45 | 100%, 60/60 |
+-------------------------------------+------------+-------------+-------------+
Geometric mean speedup
+-------------------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
| inductor | 1.61x | 1.60x | 1.40x |
| inductor_no_cudagraphs | 1.29x | 1.51x | 1.39x |
| inductor_max_autotune | 1.61x | 1.63x | 1.44x |
| inductor_max_autotune_no_cudagraphs | 1.35x | 1.58x | 1.42x |
+-------------------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+-------------------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
| inductor | 56.68 | 59.65 | 79.10 |
| inductor_no_cudagraphs | 30.39 | 42.67 | 46.98 |
| inductor_max_autotune | 257.92 | 186.71 | 381.29 |
| inductor_max_autotune_no_cudagraphs | 37.42 | 56.47 | 56.80 |
+-------------------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+-------------------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
| inductor | 0.79x | 0.91x | 0.91x |
| inductor_no_cudagraphs | 1.07x | 1.06x | 1.05x |
| inductor_max_autotune | 0.76x | 0.89x | 0.91x |
| inductor_max_autotune_no_cudagraphs | 1.07x | 1.06x | 1.05x |
+-------------------------------------+------------+-------------+-------------+
To measure performance, compilation latency, and memory footprint reduction, we exclude the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 87%, 55/63 | 100%, 45/45 | 98%, 60/61 |
| aot_eager | 87%, 55/63 | 100%, 45/45 | 98%, 60/61 |
| inductor | 83%, 52/63 | 93%, 42/45 | 97%, 59/61 |
| inductor_no_cudagraphs | 84%, 53/63 | 98%, 44/45 | 98%, 60/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.00x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| inductor | 1.62x | 1.65x | 1.46x |
| inductor_no_cudagraphs | 1.30x | 1.58x | 1.40x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 4.08 | 6.46 | 5.00 |
| aot_eager | 8.77 | 14.66 | 11.60 |
| inductor | 53.21 | 53.30 | 90.52 |
| inductor_no_cudagraphs | 58.03 | 52.81 | 102.89 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.00x | 1.00x |
| aot_eager | 0.99x | 0.96x | 1.00x |
| inductor | 1.03x | 0.98x | 1.01x |
| inductor_no_cudagraphs | 1.00x | 1.01x | 1.00x |
+------------------------+------------+-------------+-------------+
(The next two comments are for the max-autotune, warm-start run.)
AMP RUN
(Geometric mean speedup, mean compilation time, and peak memory footprint compression ratio tables for this run are not included here.)