NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Print all outputs in `SdpaFwdOp` #3328

Closed: Priya2698 closed this 2 weeks ago

Priya2698 commented 2 weeks ago

Closes Issue #3037.

Printed ops for the test at https://github.com/NVIDIA/Fuser/blob/f08bd5182df894019cfd07f04f7b2125750af713/tests/cpp/test_sdpa_node.cpp#L691:

SdpaFwdOp::toString()

T3_l___half[ iS12{16}, iS13{32}, iS14{64}, iS15{64} ],
T4_l_float[ iS16{16}, iS17{32}, iS18{64} ],
T5_l_int64_t[ ],
T6_l_int64_t[ ]
   = sdpa(T0_g___half[ iS0{16}, iS1{32}, iS2{64}, iS3{64} ],
            T1_g___half[ iS4{16}, iS5{32}, iS6{128}, iS7{64} ],
            T2_g___half[ iS8{16}, iS9{32}, iS10{128}, iS11{64} ],
            dropout_p = double(0),
            is_causal = false  )
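
For readers unfamiliar with the layout above, here is a minimal sketch of the printing pattern: every output on its own line, comma-separated, followed by `= op(inputs...)`. It is not the nvFuser implementation. `ToyOp` and the free function `toString` are hypothetical stand-ins that hold already-printed operand names as strings, and the indentation widths are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR expression; NOT an nvFuser type.
struct ToyOp {
  std::string name;                  // op name, e.g. "sdpa"
  std::vector<std::string> outputs;  // printed form of every output
  std::vector<std::string> inputs;   // printed form of every input/attribute
};

// Print all outputs (one per line), then "= name(inputs...)".
std::string toString(const ToyOp& op) {
  // Assumption for this sketch: an op always has at least one output.
  assert(!op.outputs.empty());

  std::ostringstream ss;
  for (std::size_t i = 0; i < op.outputs.size(); ++i) {
    // A comma follows every output except the last one.
    ss << op.outputs[i] << (i + 1 < op.outputs.size() ? ",\n" : "\n");
  }

  ss << "   = " << op.name << "(";
  for (std::size_t i = 0; i < op.inputs.size(); ++i) {
    if (i > 0) {
      // Put each subsequent input on its own indented line.
      ss << ",\n          ";
    }
    ss << op.inputs[i];
  }
  ss << ")\n";
  return ss.str();
}
```

Calling `toString` on a `ToyOp` with four outputs yields four output lines before the `= sdpa(` line, mirroring the shape of the `SdpaFwdOp` print above. Iterating over the whole output list, rather than printing only the first element, is the point of the sketch.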

SdpaBwdOp::toString()

T8_g___half[ iS23{16}, iS24{32}, iS25{64}, iS26{64} ],
T9_g___half[ iS27{16}, iS28{32}, iS29{128}, iS30{64} ],
T10_g___half[ iS31{16}, iS32{32}, iS33{128}, iS34{64} ]
   = sdpa_bwd(T7_g___half[ iS19{16}, iS20{32}, iS21{64}, iS22{64} ],
            T0_g___half[ iS0{16}, iS1{32}, iS2{64}, iS3{64} ],
            T1_g___half[ iS4{16}, iS5{32}, iS6{128}, iS7{64} ],
            T2_g___half[ iS8{16}, iS9{32}, iS10{128}, iS11{64} ],
            T3_g___half[ iS12{16}, iS13{32}, iS14{64}, iS15{64} ],
            logsum_exp = T4_l_float[ iS16{16}, iS17{32}, iS18{64} ],
            dropout_p = double(0),
            is_causal = false,
            philox_seed = T5_l_int64_t[ ],
            philox_offset = T6_l_int64_t[ ],
  )

Priya2698 commented 2 weeks ago

!build

wujingyue commented 2 weeks ago

!test