ROCm / triton

Development repository for the Triton language and compiler
MIT License

[Triton] [PyTorch UT] `tl.reshape` cherry-pick support #454

Closed jataylo closed 7 months ago

jataylo commented 7 months ago

This PR https://github.com/pytorch/pytorch/pull/116079#issuecomment-1885206171 introduced a failure on ROCm. Some of its new unit tests rely on `tl.reshape`, which produces the following output:

2024-01-09T22:04:00.6930835Z torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
2024-01-09T22:04:00.6931364Z CompilationError: at 24:171:        rindex = roffset + rbase
2024-01-09T22:04:00.6931749Z         rmask = rindex < rnumel
2024-01-09T22:04:00.6932088Z         r2 = rindex
2024-01-09T22:04:00.6932620Z         tmp0 = tl.load(in_ptr0 + (x1 + (512*x0) + (262144*r2)), rmask, eviction_policy='evict_last', other=0.0)
2024-01-09T22:04:00.6933397Z         tmp1 = tl.load(block_ptr0, boundary_check=[1], padding_option='zero', eviction_policy='evict_first')
2024-01-09T22:04:00.6933983Z         block_ptr0 = tl.advance(block_ptr0, [0, RBLOCK])
2024-01-09T22:04:00.6934392Z         tmp2 = tmp0 * tmp1
2024-01-09T22:04:00.6934715Z         tmp3 = tl.broadcast_to(tmp2, [XBLOCK, RBLOCK])
2024-01-09T22:04:00.6935081Z         tmp5 = _tmp4 + tmp3
2024-01-09T22:04:00.6935387Z         _tmp4 = tl.where(rmask, tmp5, _tmp4)
2024-01-09T22:04:00.6935712Z     tmp4 = tl.sum(_tmp4, 1)[:, None]
2024-01-09T22:04:00.6936586Z     tl.store(tl.make_block_ptr(out_ptr0, shape=[262144], strides=[1], block_shape=[XBLOCK], order=[0], offsets=[xoffset]), tl.reshape(tl.broadcast_to(tmp4, [XBLOCK, 1]), [XBLOCK]).to(tl.float32), boundary_check=[])
2024-01-09T22:04:00.6937566Z                                                                                                                                                                            ^
2024-01-09T22:04:00.6938565Z ValueError('`reshape` is not supported yet. Please use `view` instead if applicable. Note that view may reorder elements in an implementation- and context- dependent way.')

Based on @pragupta's work, we may be able to use `tl.view` as a drop-in replacement for `tl.reshape`, but it sounds like this comes with a caveat:

Note that view may reorder elements in an implementation- and context- dependent way
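To make the caveat concrete, here is a minimal pure-Python sketch (not Triton code) of the difference between an order-preserving reshape and a view that only guarantees the same elements in the new shape. The helper names `reshape_row_major` and `view_like` and the permutation are hypothetical, chosen purely for illustration; whether `tl.view` actually reorders elements for the `[XBLOCK, 1]` to `[XBLOCK]` case in the failing kernel is not established in this thread.

```python
from itertools import chain


def reshape_row_major(rows):
    """Flatten a 2-D nested list in row-major order (order is preserved)."""
    return list(chain.from_iterable(rows))


def view_like(rows, permutation):
    """Hypothetical stand-in for a 'view': same elements, order not guaranteed.

    The permutation is an assumption used only to model a possible reordering.
    """
    flat = reshape_row_major(rows)
    return [flat[i] for i in permutation]


XBLOCK = 4
tmp4 = [[float(i)] for i in range(XBLOCK)]  # shape [XBLOCK, 1]

reshaped = reshape_row_major(tmp4)                   # shape [XBLOCK], original order
viewed = view_like(tmp4, permutation=[2, 0, 3, 1])   # shape [XBLOCK], arbitrary order

print(reshaped)
print(viewed)
```

If the store that follows depends on element `i` landing at offset `xoffset + i`, only the order-preserving variant is safe, which is why a blind swap to `tl.view` would need to be checked per kernel.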

jayfurmanek commented 7 months ago

@zhanglx13 to look at cherry-picking

zhanglx13 commented 7 months ago

We'll port this op during phase 2 of the AMD backend migration.