Closed: oplavsic closed this 3 months ago.
This PR introduces an AMD-specific scheduling pass. For now, it has two main purposes: hoisting the Q tensor out of the loop in the FA (FlashAttention) forward pass, and scheduling the instructions produced by the dot slicing pass.
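To illustrate why Q can be hoisted, here is a minimal, hypothetical Python sketch of a blocked attention forward pass with an online softmax. It is not the pass itself, which rewrites compiler IR; `prepare_q`, `attention_fwd`, and the `hoist` flag are illustrative names, and `prepare_q` merely stands in for whatever per-iteration work on Q the real pass moves (here it just applies the softmax scale). The point it shows is that the Q tile is loop-invariant across the K/V blocks, so work on Q can run once before the loop without changing the result.

```python
import math


def prepare_q(q_block, scale):
    # Hypothetical stand-in for per-iteration work on Q (e.g. a layout
    # conversion in a real kernel); here it only applies the softmax scale.
    return [[x * scale for x in row] for row in q_block]


def attention_fwd(q, k_blocks, v_blocks, hoist=True):
    """Toy blocked attention forward pass using an online softmax.

    Returns (output, number_of_prepare_q_calls).
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    calls = 0

    def prep():
        nonlocal calls
        calls += 1
        return prepare_q(q, scale)

    # Hoisted: Q is loop-invariant, so prepare it once before the loop.
    q_ready = prep() if hoist else None

    n = len(q)
    dv = len(v_blocks[0][0])
    m = [-math.inf] * n            # running row maximum of the scores
    l = [0.0] * n                  # running softmax denominator
    acc = [[0.0] * dv for _ in range(n)]  # unnormalized output accumulator

    for kb, vb in zip(k_blocks, v_blocks):
        # Not hoisted: the same Q preparation is redone on every iteration.
        qb = q_ready if hoist else prep()
        for i in range(n):
            scores = [sum(a * b for a, b in zip(qb[i], krow)) for krow in kb]
            m_new = max(m[i], max(scores))
            alpha = math.exp(m[i] - m_new)  # exp(-inf) == 0.0 on the first block
            p = [math.exp(s - m_new) for s in scores]
            l[i] = l[i] * alpha + sum(p)
            acc[i] = [a * alpha + sum(pj * vrow[c] for pj, vrow in zip(p, vb))
                      for c, a in enumerate(acc[i])]
            m[i] = m_new

    out = [[a / l[i] for a in acc[i]] for i in range(n)]
    return out, calls
```

Running both variants on the same inputs gives identical outputs, but the hoisted version prepares Q once instead of once per K/V block, which is the kind of redundant in-loop work a scheduling pass can eliminate.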