Avoid canScheduleCompileTime for dynamic shape check

jacobhinkle commented 3 days ago

This PR plumbs through an option to skip the call to `canScheduleCompileTime` in `Schedule::canSchedule`, allowing us to avoid this check when getting heuristics for new dynamic shapes. As mentioned in https://github.com/NVIDIA/Fuser/issues/3419#issuecomment-2479956772, this gives us a sizeable speedup in most cases.
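For readers unfamiliar with the pattern, here is a minimal sketch of what "plumbing through a skip option" could look like. It is written in Python for brevity and is not the nvFuser code: every name below (`can_schedule`, `skip_compile_time_checks`, the dict-based `fusion`, the `CheckCounter` helper) is hypothetical. It also assumes that the compile-time check depends only on the fusion and not on the input shapes, so a fusion that already passed it does not need it repeated for a new shape.

```python
from dataclasses import dataclass


@dataclass
class CheckCounter:
    """Counts how often each (hypothetical) check runs."""
    compile_time_calls: int = 0
    run_time_calls: int = 0


def can_schedule_compile_time(fusion, counter):
    # Stand-in for a shape-independent check on the fusion itself.
    counter.compile_time_calls += 1
    return fusion["supported"]


def can_schedule_run_time(fusion, shapes, counter):
    # Stand-in for a check that depends on the concrete input shapes.
    counter.run_time_calls += 1
    return all(extent <= fusion["max_extent"] for extent in shapes)


def can_schedule(fusion, shapes, counter, skip_compile_time_checks=False):
    # The flag lets a caller that already validated this fusion
    # go straight to the shape-dependent check.
    if not skip_compile_time_checks:
        if not can_schedule_compile_time(fusion, counter):
            return False
    return can_schedule_run_time(fusion, shapes, counter)
```

In this sketch the first call for a fusion would use the default (full check), and later calls for new shapes of the same fusion would pass `skip_compile_time_checks=True`, so only the shape-dependent check runs again.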

Before this PR: (image not preserved)

After this PR: (image not preserved)

~~This is related to #3419, but until we address the many-segments latency I will refrain from closing that issue.~~ EDIT: the steady host latency for many-segments is 340 us, so getting dynamic latency down to 1400 us puts it at roughly 4x steady. That is in the same range as the other two tests, which are both about 3x: many pointwise ops (steady=43 us, dynamic=135 us) and adaptive layernorm (steady=71 us, dynamic=222 us). So in general we now have dynamic latency of about 3x to 4x steady latency.
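The dynamic-to-steady ratios can be recomputed directly from the latencies quoted in this comment. The short script below uses only those six numbers; the dictionary layout and function name are illustrative. Note that many-segments works out closer to 4x than to 3x, while the other two tests are both just above 3x.

```python
# (steady_us, dynamic_us) host latencies as quoted in the comment above.
latencies_us = {
    "many segments": (340, 1400),
    "many pointwise ops": (43, 135),
    "adaptive layernorm": (71, 222),
}


def dynamic_to_steady_ratio(steady_us, dynamic_us):
    """How many times slower the dynamic-shape path is than steady state."""
    return dynamic_us / steady_us


ratios = {
    name: dynamic_to_steady_ratio(steady_us, dynamic_us)
    for name, (steady_us, dynamic_us) in latencies_us.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
# many segments: 4.12x
# many pointwise ops: 3.14x
# adaptive layernorm: 3.13x
```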

Fixes #3419

jacobhinkle commented 3 days ago

!test

jacobhinkle commented 2 days ago

H100 CI runners must have failed or something. This change is not arch-specific though and all other tests are passing.

jacobhinkle commented 2 days ago

!build

jacobhinkle commented 1 day ago

!build

Avoid canScheduleCompileTime for dynamic shape check #3436