NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Remove MatmulParams::rotate_ldmatrix_out_of_main_loop #3337

Closed · jacobhinkle closed this 2 weeks ago

jacobhinkle commented 2 weeks ago

I can't find any commit in which this option was ever actually used. This is the PR where the option was originally introduced: https://github.com/csarofeen/pytorch/pull/2488/files#diff-e7a5a84a2cfeddeb15669f07105bdb3722a796600ea9e1f2eb25afb29283457eR22 We've gone this long without the ability to disable loop rotation, so we should either change the condition in the schedulers to respect the option or just remove it.

jacobhinkle commented 2 weeks ago

!test

jacobhinkle commented 2 weeks ago

> IIUC, we always rotate the main loop if circular buffering is enabled. Or is that separate from ldmatrix?

Yeah, I think the idea was that we'd rotate the main loop whenever circular buffering is enabled, as you said, and the option was probably introduced for extra flexibility. As an alternative to this PR, we could just as easily start respecting this option and add the condition to the heuristic. That would let us measure the effect of loop rotation.
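A minimal sketch of the two conditions being weighed here, assuming a simplified stand-in for `MatmulParams`. The class `MatmulParamsSketch`, its `circular_buffering` field, and both function names are hypothetical; only `rotate_ldmatrix_out_of_main_loop` comes from this PR. This is not nvFuser's scheduler code, just an illustration of "rotate whenever circular buffering is on" versus "also respect the option".

```python
from dataclasses import dataclass


@dataclass
class MatmulParamsSketch:
    """Hypothetical stand-in for MatmulParams; only two flags are modeled."""

    circular_buffering: bool = True  # assumed field name, for illustration only
    rotate_ldmatrix_out_of_main_loop: bool = True  # the option discussed in this PR


def should_rotate_current(params: MatmulParamsSketch) -> bool:
    # Behavior described in the thread: rotation is tied to circular
    # buffering alone, so the option has no effect.
    return params.circular_buffering


def should_rotate_respecting_option(params: MatmulParamsSketch) -> bool:
    # Alternative: rotate only when the heuristic also sets the option,
    # which would allow benchmarking with and without rotation.
    return params.circular_buffering and params.rotate_ldmatrix_out_of_main_loop
```

With `MatmulParamsSketch(rotate_ldmatrix_out_of_main_loop=False)`, the first function still returns `True` while the second returns `False`; that difference is exactly what would let the effect of loop rotation be measured.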

rdspring1 commented 2 weeks ago

What is the benefit of loop rotation?

I'm not sure ldmatrix is a high priority for Hopper, since it is mostly an Ampere feature. I wouldn't mind having fewer flags, so removing it is fine by me.
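On the question of what loop rotation buys, here is a minimal sketch of the general idea, not nvFuser's implementation. It assumes rotation means hoisting the first iteration's load into a prologue and issuing each next load before the current iteration's compute. `load` and `compute` are hypothetical placeholders standing in for something like ldmatrix and the MMA that consumes its fragment. In Python nothing actually overlaps; the point is the reordered trace, where on a GPU the early-issued load could overlap with the compute that follows it.

```python
from typing import Callable, List, Optional, Tuple

Load = Callable[[int], int]
Compute = Callable[[int], None]


def naive_loop(n: int, load: Load, compute: Compute) -> None:
    for i in range(n):
        frag = load(i)  # load and use in the same iteration: compute waits on it
        compute(frag)


def rotated_loop(n: int, load: Load, compute: Compute) -> None:
    if n == 0:
        return
    frag = load(0)  # prologue: the first load is hoisted out of the loop
    for i in range(n):
        # Issue the next iteration's load before computing on the current data.
        nxt: Optional[int] = load(i + 1) if i + 1 < n else None
        compute(frag)  # uses data loaded in the previous iteration (or prologue)
        if nxt is not None:
            frag = nxt


def trace(loop_fn: Callable[[int, Load, Compute], None], n: int) -> List[Tuple[str, int]]:
    """Record the order of load/compute events for a given loop shape."""
    events: List[Tuple[str, int]] = []

    def load(i: int) -> int:
        events.append(("load", i))
        return i * 10  # dummy fragment value

    def compute(frag: int) -> None:
        events.append(("compute", frag))

    loop_fn(n, load, compute)
    return events
```

For `n = 3`, `trace(naive_loop, 3)` alternates load and compute, while `trace(rotated_loop, 3)` issues `load 1` before `compute 0`. Both loops perform the same computes in the same order; only the placement of the loads changes.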