Closed zfu82 closed 1 month ago
Tested on NV-A100
benchmark/test_pointwise_perf.py::test_perf_repeat_interleave_self_int

Operator repeat_interleave_self_int Performance Test (torch.float16)
Size     Torch Latency (ms)   Gems Latency (ms)
--------------------------------------------------
1024     0.019456             0.012288
6144     0.05632              0.036864
11264    0.096256             0.06144
16384    0.136192             0.084992
21504    0.177152             0.169984
26624    0.216064             0.171008
31744    0.256                0.172032
36864    0.295936             0.246784
41984    0.334848             0.247808
47104    0.374784             0.249856
52224    0.41472              0.323584
57344    0.454656             0.325632
62464    0.494592             0.326656
67584    0.533504             0.400384
72704    0.57344              0.402432
77824    0.612352             0.402432

Operator repeat_interleave_self_int Performance Test (torch.float32)
Size     Torch Latency (ms)   Gems Latency (ms)
--------------------------------------------------
1024     0.017408             0.014336
6144     0.065536             0.055296
11264    0.111616             0.094208
16384    0.15872              0.134144
21504    0.2048               0.198656
26624    0.251904             0.226304
31744    0.297984             0.254976
36864    0.345088             0.295936
41984    0.392192             0.336896
47104    0.438272             0.375808
52224    0.4864               0.417792
57344    0.53248              0.457728
62464    0.57856              0.497664
67584    0.625664             0.539648
72704    0.672768             0.581632
77824    0.718848             0.61952

Operator repeat_interleave_self_int Performance Test (torch.bfloat16)
Size     Torch Latency (ms)   Gems Latency (ms)
--------------------------------------------------
1024     0.017408             0.012288
6144     0.05632              0.036864
11264    0.09728              0.060416
16384    0.137216             0.086016
21504    0.176128             0.169984
26624    0.216064             0.171008
31744    0.254976             0.172032
36864    0.294912             0.246784
41984    0.334848             0.248832
47104    0.374784             0.249856
52224    0.41472              0.323584
57344    0.453632             0.324608
62464    0.494592             0.326656
67584    0.533504             0.400384
72704    0.57344              0.402432
77824    0.612352             0.402432

PASSED
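The latencies above are easier to compare as speedup ratios. The following is a minimal sketch, not part of the benchmark suite: it uses a few float16 rows copied from the output above, and the names `FLOAT16_ROWS` and `speedup` are hypothetical helpers introduced here for illustration.

```python
# Rows copied from the float16 results above: (size, torch_ms, gems_ms).
# Only a sample of the rows is included to keep the sketch short.
FLOAT16_ROWS = [
    (1024, 0.019456, 0.012288),
    (16384, 0.136192, 0.084992),
    (21504, 0.177152, 0.169984),
    (77824, 0.612352, 0.402432),
]


def speedup(torch_ms: float, gems_ms: float) -> float:
    """Ratio of Torch latency to Gems latency (a value above 1 means Gems is faster)."""
    return torch_ms / gems_ms


for size, torch_ms, gems_ms in FLOAT16_ROWS:
    # Print the size and the speedup rounded to two decimals.
    print(f"{size:>6}  {speedup(torch_ms, gems_ms):.2f}x")
```

A ratio above 1 means the Gems kernel was faster for that size; the same calculation applies unchanged to the float32 and bfloat16 rows.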