Update input sizes for `test_many_segment_benchmark` to ensure kernel reuse

NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")


Update input sizes for `test_many_segment_benchmark` to ensure kernel reuse #3388

Closed Priya2698 closed 1 week ago

Priya2698 commented 1 week ago

This updates the input sizes for this benchmark, since the previous sizes select different heuristics, which leads to kernel recompilation on A100 and H100.

Before: sizes = [4, 8, 16, 32, 64, 128]

Current: sizes = [5, 7, 9, 11]

With the new sizes, kernels are reused for all cases, so the dynamic measurement has a lower standard deviation. The average measurement is ~1.8 ms, as opposed to ~80 ms (with a maximum of ~400 ms) with the earlier input sizes.
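To illustrate why kernel reuse tightens the measurement, here is a toy sketch in plain Python. It does not use nvFuser: `toy_heuristic_key`, `measure`, `COMPILE_COST`, and `RUN_COST` are hypothetical names, the costs are arbitrary units rather than measured values, and the key function is only an assumed stand-in for whatever heuristic parameters the real scheduler derives from the input size.

```python
import statistics

COMPILE_COST = 100.0  # arbitrary units, NOT a measured value
RUN_COST = 1.0        # arbitrary units, NOT a measured value


def toy_heuristic_key(size):
    # Hypothetical stand-in for a scheduler heuristic (not nvFuser's):
    # the largest power of two that divides `size`.
    return size & -size


def measure(sizes):
    """Return one toy timing per size, with kernels cached by heuristic key."""
    # A warm-up run on the first size compiles its kernel before timing starts.
    cache = {toy_heuristic_key(sizes[0])}
    timings = []
    for size in sizes:
        key = toy_heuristic_key(size)
        cost = RUN_COST
        if key not in cache:
            # Cache miss: the timed run also pays for compilation.
            cache.add(key)
            cost += COMPILE_COST
        timings.append(cost)
    return timings


old = measure([4, 8, 16, 32, 64, 128])
new = measure([5, 7, 9, 11])
print("old sizes: mean", statistics.mean(old), "stdev", statistics.pstdev(old))
print("new sizes: mean", statistics.mean(new), "stdev", statistics.pstdev(new))
```

Under this assumed key, every power-of-two size maps to a different key and triggers a compile inside the timed region, while the odd sizes all share one key and reuse the warm-up kernel, so their timings are uniform.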

Priya2698 commented 1 week ago

!build