Closed zfu82 closed 1 month ago
Tested on NV-A100
benchmark/test_special_perf.py::test_perf_cat
Operator cat Performance Test (torch.float16)
Size        Torch Latency (ms)   Gems Latency (ms)
--------------------------------------------------
1024        0.018432             0.016384
6144        0.05632              0.045056
11264       0.095232             0.069632
16384       0.134144             0.093184
21504       0.17408              0.11776
26624       0.212992             0.141312
31744       0.253952             0.165888
36864       0.293888             0.192512
41984       0.333824             0.217088
47104       0.374784             0.24064
52224       0.41472              0.264192
57344       0.45568              0.288768
62464       0.49664              0.31232
67584       0.538624             0.338944
72704       0.579584             0.36352
77824       0.620544             0.388096

Operator cat Performance Test (torch.float32)
Size        Torch Latency (ms)   Gems Latency (ms)
--------------------------------------------------
1024        0.018432             0.02048
6144        0.07168              0.072704
11264       0.120832             0.120832
16384       0.173056             0.167936
21504       0.223232             0.21504
26624       0.274432             0.262144
31744       0.324608             0.310272
36864       0.375808             0.357376
41984       0.425984             0.406528
47104       0.477184             0.45568
52224       0.52736              0.504832
57344       0.57856              0.55296
62464       0.62976              0.601088
67584       0.679936             0.65024
72704       0.731136             0.700416
77824       0.782336             0.746496

Operator cat Performance Test (torch.bfloat16)
Size        Torch Latency (ms)   Gems Latency (ms)
--------------------------------------------------
1024        0.016384             0.016384
6144        0.05632              0.045056
11264       0.095232             0.069632
16384       0.134144             0.093184
21504       0.17408              0.11776
26624       0.214016             0.141312
31744       0.253952             0.165888
36864       0.292864             0.192512
41984       0.333824             0.217088
47104       0.374784             0.24064
52224       0.41472              0.264192
57344       0.45568              0.288768
62464       0.49664              0.31232
67584       0.538624             0.338944
72704       0.579584             0.36352
77824       0.620544             0.388096
PASSED

benchmark/test_special_perf.py::test_perf_cat_int
Operator cat Performance Test (torch.int16)
Size        Torch Latency (ms)   Gems Latency (ms)
--------------------------------------------------
1024        0.017408             0.016384
6144        0.05632              0.045056
11264       0.094208             0.069632
16384       0.135168             0.093184
21504       0.17408              0.11776
26624       0.212992             0.141312
31744       0.253952             0.165888
36864       0.292864             0.193536
41984       0.333824             0.217088
47104       0.374784             0.24064
52224       0.41472              0.264192
57344       0.45568              0.288768
62464       0.49664              0.31232
67584       0.538624             0.338944
72704       0.57856              0.36352
77824       0.620544             0.388096

Operator cat Performance Test (torch.int32)
Size        Torch Latency (ms)   Gems Latency (ms)
--------------------------------------------------
1024        0.019456             0.02048
6144        0.07168              0.073728
11264       0.120832             0.120832
16384       0.172032             0.167936
21504       0.223232             0.21504
26624       0.274432             0.262144
31744       0.324608             0.310272
36864       0.375808             0.357376
41984       0.427008             0.406528
47104       0.477184             0.45568
52224       0.528384             0.504832
57344       0.579584             0.55296
62464       0.62976              0.601088
67584       0.679936             0.65024
72704       0.731136             0.700416
77824       0.782336             0.746496
PASSED
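To make the latency columns easier to compare, the ratio of Torch latency to Gems latency can be computed per row. The snippet below is only a sketch: the `speedup` helper and the `rows` dictionary are hypothetical names introduced for illustration, and the values are a few rows copied from the results above (smallest and largest size for float16 and float32).

```python
def speedup(torch_ms: float, gems_ms: float) -> float:
    """Return Torch latency divided by Gems latency (>1 means Gems is faster)."""
    return torch_ms / gems_ms


# (dtype, size) -> (Torch latency in ms, Gems latency in ms), copied from the
# benchmark output above. Only a handful of rows are included here.
rows = {
    ("float16", 1024): (0.018432, 0.016384),
    ("float16", 77824): (0.620544, 0.388096),
    ("float32", 1024): (0.018432, 0.02048),
    ("float32", 77824): (0.782336, 0.746496),
}

ratios = {key: speedup(*latencies) for key, latencies in rows.items()}

for (dtype, size), ratio in ratios.items():
    print(f"{dtype:8s} size={size:6d} speedup={ratio:.2f}x")
```

A ratio above 1 means the Gems kernel was faster for that row, and a ratio below 1 means Torch was faster. On these rows the 16-bit case shows a larger gap at the biggest size than the 32-bit case, and the smallest float32 size is the one row where Torch comes out ahead.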