Closed ex-rzr closed 1 year ago
@ex-rzr could you point back to the specific source lines which are being miscompiled and describe the changes you made to avoid the problem. Anything you can do to help us quickly zero in on the the problem without having to try to understand the entire application and rocPRIM library will save some time.
@b-sumner I've created a branch with the workaround disabled and added some instructions. I hope, it's more clear now.
This issue is not fixed in ROCm 2.3.
But good news: I managed to create a small test case without external dependencies: https://gist.github.com/ex-rzr/98e429dad13aa17ba1e67fc7145ba827
hipcc vcc.cpp -o vcc && ./vcc
The last 10 items are a "canary" area, they must be 0 (not modified by the kernel). Like this (MI25, gfx900):
...
995 995
996 996
997 997
998 998
999 999
1000 0
1001 0
1002 0
1003 0
1004 0
1005 0
1006 0
1007 0
1008 0
1009 0
However, on Fiji (gfx803)
...
995 995
996 996
997 997
998 998
999 999
1000 1000
1001 1001
1002 1002
1003 1003
1004 1004
1005 1005
1006 1006
1007 1007
1008 1008
1009 1009
(So it's just a good luck that there is no "Memory access fault").
The kernel looks weird and stupid, but the problem with this issue is that any change in the kernel, types, etc. may lead to correct instructions.
Here are ISAs: vcc.zip
Again, v_add*
modifies vcc
, s_cbranch_vcc*
uses this irrelevant value (v_cmp*
is missing):
v_add_u32_e32 v3, vcc, v3, v0 // 000000001140: 32060103
v_mov_b32_e32 v2, s1 // 000000001144: 7E040201
v_add_u32_e64 v1, s[0:1], s0, v1 // 000000001148: D1190001 00020200
s_and_b64 vcc, exec, vcc // 000000001150: 86EA6A7E
v_addc_u32_e64 v2, s[0:1], 0, v2, s[0:1] // 000000001154: D11C0002 00020480
s_cbranch_vccnz BB0_2 // 00000000115C: BF870007
gfx900 has correct instructions v_cmp_eq_u32_e32 vcc
-> s_cbranch_vccnz
.
v_cmp_eq_u32_e32 vcc, s8, v3 // 0000000011E4: 7D940608
s_and_b64 vcc, exec, vcc // 0000000011E8: 86EA6A7E
v_add_u32_e32 v3, v4, v0 // 0000000011EC: 68060104
s_cbranch_vccnz BB0_2 // 0000000011F0: BF870007
Another example (a test case from rocPRIM):
gfx803:
v_add_u32_e32 v1, vcc, s1, v1 // 00000000111C: 32020201
s_and_b64 vcc, exec, vcc // 000000001120: 86EA6A7E
s_cbranch_vccnz BB0_2 // 000000001124: BF870073
gfx900:
v_cmp_eq_u32_e32 vcc, s8, v1 // 0000000011B8: 7D940208
s_and_b64 vcc, exec, vcc // 0000000011BC: 86EA6A7E
v_add_u32_e32 v1, s2, v2 // 0000000011C0: 68020402
s_cbranch_vccnz BB0_2 // 0000000011C4: BF870069
Thank you for the helpful new standalone code. And let me apologize for the delay on getting to this.
https://github.com/llvm-mirror/llvm/commit/aa2f57df85af2056eafbef94380664a55570fa83 should fix this particular case
@b-sumner, can you comment on when HCC might pickup this fix?
On Jun 4, 2019, at 8:56 PM, Sean Keely notifications@github.com wrote:
@b-sumner https://github.com/b-sumner or @arsenm https://github.com/arsenm: It looks like the compiler fix is in. Can we close this issue?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/RadeonOpenCompute/hcc/issues/1024?email_source=notifications&email_token=AABBYY7YL3Y3Y334CSZLANTPY4FK5A5CNFSM4GUK4ZJ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODW6JCMA#issuecomment-498897200, or mute the thread https://github.com/notifications/unsubscribe-auth/AABBYY75EX4NKQO2VKFGS5TPY4FK5ANCNFSM4GUK4ZJQ.
I don’t have the option of closing it
Currently the fix is only available in HCC's development branch (clang_tot_upgrade), you'll have to build the compiler from source to get it.
This issue is related to https://github.com/RadeonOpenCompute/ROCR-Runtime/issues/52 https://github.com/ROCmSoftwarePlatform/rocPRIM/issues/15
HCC 1.3.18482 (ROCm 2.0)
https://github.com/ROCmSoftwarePlatform/rocPRIM/tree/for_hcc_issue_1024
This branch has the workaround disabled and only one active test for simplicity (https://github.com/ROCmSoftwarePlatform/rocPRIM/commit/e9753c4ca92eb3d8b88a4bdb0e81ce763c86cea5).
https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/for_hcc_issue_1024/rocprim/include/rocprim/device/detail/device_scan_lookback.hpp#L279
Both branches of if statement (
if(flat_block_id == 0)
) can be executed for a thread block with flat_block_id == 0. I checked it by disabling some parts of the code and writing intermediate values to buffers, and indeed, both if-branch and else-branch are executed leading to incorrect memory access.So this "workaround" separates two branches of the if statement. syncthreads and the second if are not required for the algorithm. This change makes the kernel work correctly, however, without any guarantee (I mean it's possible that for other types and scan op, for example, the workaround won't help).
ISA for this test: isa_memory_access_fault.zip
test_hip_device_scan-gfx803-workaround.isa
andtest_hip_device_scan-gfx900-workaround.isa
, correct results:test_hip_device_scan-gfx900.isa
(without workaround), correct results:test_hip_device_scan-gfx803.isa
(without workaround), incorrect results:Here
v4
andv5
areunsinged int flat_block_id
, i.e. the variable in the if.Another kernel with a similar problem:
tabulate-gfx900.isa
, correct results:tabulate-gfx803.isa
, incorrect results:In both cases, for gfx803, the compiler overwrites
vcc
register inv_add
, sov_cmp_* vcc
is completely removed (I suppose it was there just like for gfx900),s_cbranch_vccz
proceeds with the ruined value ofvcc
.