NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Other
260 stars 51 forks source link

RoPE-like slice-concat patterns may result in cyclic exact graphs #3072

Open naoyam opened 2 weeks ago

naoyam commented 2 weeks ago

Example (same as #3068):

t0: [i0]
t1 = t0 // [i1]
t2 = t1 // [i2]
t3 = t2[0:2] // [i3]
t4 = pad(t23 {0, i0 - 2}) // [i4]
t5 = t1 + t4 // [i5]

Image

The exact graph groups i0, i1, i2, i4 and i5 together as they are just used together with the pointwise ops. As you can see in the above diagram, this results in a cycle between the two groups.

Being acyclic is one of the basic assumptions we have in the loop promotion analysis. For example, ValGraphStmtSort won't work with cyclic graphs, which means, for example, computeCoveredGroups wont't work since it relies on ValGraphStmtSort. The propagation of promotion info through the IEL graph (intersection of exact and loop graphs) is also done with ValGraphStmtSort, so that may not work either, although IEL might not necessarily be cyclic even when the exact graph is cyclic.

jacobhinkle commented 2 weeks ago

Can't you also have cycles in the ValGraph with reshape?

tv0 // [ iS0{12} ]
tv1 = reshape(tv0, {4, 3})  // [ iS1{4}, iS2{3} ]
tv2 = reshape(tv1, {12})  // [ iS3{12} ]
tv3 = add(tv0, tv2)  // [ iS4{12} ]

IDs {0, 3, 4} are exact mapped, and there's a split from that to {1} and {2} and a merge back.

naoyam commented 2 weeks ago

That is true too. It's just we haven't been hit by actual use cases (yet...).