NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Fix stride-order based allocation domain computation when output has reduction axis #3445

Open Priya2698 opened 5 days ago

Priya2698 commented 5 days ago

`fd.add_output(out_tv, stride_order)` lets us set a stride order for an output through the fusion definition. The current implementation errors out if `out_tv` has any reduction axis. This PR:

  1. Accounts for the presence of reduction axes and keeps their positions in the allocation domain the same as in the logical domain.
  2. ~Sets contiguity to false if the stride_order is not trivial.~

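The first item can be illustrated with a minimal pure-Python sketch. It is not the code from this PR and does not call nvFuser. The helper name `allocation_order` is hypothetical, and the sketch assumes that `stride_order` has one entry per non-reduction axis and that entry `k` is the stride rank of the `k`-th non-reduction axis, with 0 meaning the smallest stride (innermost).

```python
def allocation_order(is_reduction, stride_order):
    """Return logical-axis indices in allocation (outermost-to-innermost) order.

    is_reduction[i]: True if logical axis i is a reduction axis.
    stride_order[k]: stride rank of the k-th non-reduction axis
                     (assumed convention: 0 = smallest stride).
    """
    non_reduction = [i for i, red in enumerate(is_reduction) if not red]
    rank = len(non_reduction)
    if sorted(stride_order) != list(range(rank)):
        raise ValueError("stride_order must be a permutation of range(rank)")

    # Place each non-reduction axis by stride rank: the largest rank goes first.
    permuted = [None] * rank
    for k, axis in enumerate(non_reduction):
        permuted[rank - 1 - stride_order[k]] = axis

    # Reduction axes keep their logical position; the remaining slots are
    # filled with the permuted non-reduction axes in order.
    remaining = iter(permuted)
    return [i if red else next(remaining) for i, red in enumerate(is_reduction)]


# Logical axes: 0 (iteration), 1 (reduction), 2 (iteration), 3 (iteration).
print(allocation_order([False, True, False, False], [0, 2, 1]))  # [2, 1, 3, 0]
```

In this example the reduction axis stays at position 1, while the three iteration axes are reordered around it according to `stride_order`.
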
Priya2698 commented 5 days ago

!test

Priya2698 commented 5 days ago

!test

Priya2698 commented 2 days ago

!test