Open zfu82 opened 3 weeks ago
Tested on NV-A100
Operator repeat_interleave_self_tensor Performance Test (dtype=torch.float16, mode=cuda)
Size     Torch Latency (ms)    Gems Latency (ms)    Gems Speedup
---------------------------------------------------------------
1024     0.336896              20.0387              0.0168
6144     1.53498               20.9459              0.0733
11264    2.7648                20.4554              0.135
16384    4.12979               21.6965              0.19
21504    5.376                 21.4784              0.25
26624    7.02874               22.4543              0.313
31744    8.03123               22.8055              0.352
36864    8.09677               22.5853              0.358
41984    10.2134               23.1731              0.441
47104    11.3715               23.2704              0.489
52224    12.6669               24.6088              0.515
57344    13.7267               25.2928              0.543
62464    15.0774               25.3972              0.594
67584    15.2904               24.5217              0.624
72704    16.7752               24.8955              0.674
77824    17.6722               26.4264              0.669

Operator repeat_interleave_self_tensor Performance Test (dtype=torch.float32, mode=cuda)
Size     Torch Latency (ms)    Gems Latency (ms)    Gems Speedup
---------------------------------------------------------------
1024     0.338944              19.7806              0.0171
6144     1.54419               20.1861              0.0765
11264    3.11194               21.5511              0.144
16384    4.07859               21.1241              0.193
21504    5.98528               22.484               0.266
26624    7.27859               22.9478              0.317
31744    8.11418               22.4348              0.362
36864    8.45619               23.6575              0.357
41984    10.6609               23.8254              0.447
47104    11.732                24.4019              0.481
52224    13.4359               25.1873              0.533
57344    13.9284               25.385               0.549
62464    15.7348               26.5933              0.592
67584    15.9037               26.751               0.595
72704    17.792                27.5722              0.645
77824    19.1754               28.2491              0.679

Operator repeat_interleave_self_tensor Performance Test (dtype=torch.bfloat16, mode=cuda)
Size     Torch Latency (ms)    Gems Latency (ms)    Gems Speedup
---------------------------------------------------------------
1024     0.3328                20.1001              0.0166
6144     1.49504               20.1185              0.0743
11264    2.75968               20.3551              0.136
16384    3.95776               20.4063              0.194
21504    5.3545                21.1671              0.253
26624    6.84954               21.8993              0.313
31744    7.75066               21.7375              0.357
36864    8.07424               22.0928              0.365
41984    10.1786               22.6376              0.45
47104    11.1913               23.1301              0.484
52224    12.1201               23.682               0.512
57344    13.2219               24.4797              0.54
62464    14.9258               24.705               0.604
67584    14.8818               25.1003              0.593
72704    16.7035               25.7403              0.649
77824    17.9456               27.0879              0.662
PASSED
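
For reading the numbers above: the Gems Speedup column is consistent with Torch latency divided by Gems latency, so any value below 1 means the Gems operator is slower than the Torch one. The snippet below is a minimal sketch that recomputes the ratio for the first and last float16 rows; the `speedup` helper is a hypothetical name used only for illustration and is not part of the benchmark harness.

```python
def speedup(torch_ms: float, gems_ms: float) -> float:
    """Ratio of Torch latency to Gems latency (assumed definition of 'Gems Speedup')."""
    return torch_ms / gems_ms


# (size, Torch latency in ms, Gems latency in ms), copied from the float16 results
rows = [
    (1024, 0.336896, 20.0387),
    (77824, 17.6722, 26.4264),
]

for size, torch_ms, gems_ms in rows:
    # Three significant digits, matching the precision shown in the report
    print(size, f"{speedup(torch_ms, gems_ms):.3g}")
```

Both recomputed ratios match the reported 0.0168 and 0.669 to three significant digits.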