GPUOpen-Drivers / llpc

LLVM-Based Pipeline Compiler
MIT License
166 stars 115 forks source link

lgc: Move handling of GroupMemcpy for mesh/task shaders to MeshTaskShader #2814

Closed xazhangAMD closed 9 months ago

amdvlk-admin commented 10 months ago

Test summary for commit e5ba891624d85ed3ccbb33ae470b13f9239c6c9e

CTS tests (Failed: 0/138362)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 35163/69147 (50.9%)
    • Failed: 0/69147 (0.0%)
    • Not Supported: 33984/69147 (49.1%)
    • Warnings: 0/69147 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 35242/69215 (50.9%)
    • Failed: 0/69215 (0.0%)
    • Not Supported: 33973/69215 (49.1%)
    • Warnings: 0/69215 (0.0%)
amdvlk-admin commented 10 months ago

Test summary for commit e64fcfe31555d10de2863a3172c6b763bd67cbab

CTS tests (Failed: 0/138443)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 35211/69228 (50.9%)
    • Failed: 0/69228 (0.0%)
    • Not Supported: 34017/69228 (49.1%)
    • Warnings: 0/69228 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 35241/69215 (50.9%)
    • Failed: 0/69215 (0.0%)
    • Not Supported: 33974/69215 (49.1%)
    • Warnings: 0/69215 (0.0%)
nhaehnle commented 10 months ago

I didn't see the other discussion early enough. Is there really no better way of doing this? For example, if you do move this into its own pass, could use a generic implementation that uses lgc.input.import.builtin or some other pre-existing ops?

I don't particularly fancy a further proliferation of passes, but I think a number of very similar tasks of "lowering operations" like this could be combined into one pass. Though this doesn't necessarily have to happen in this PR.

xazhangAMD commented 10 months ago

I didn't see the other discussion early enough. Is there really no better way of doing this? For example, if you do move this into its own pass, could use a generic implementation that uses lgc.input.import.builtin or some other pre-existing ops?

I don't particularly fancy a further proliferation of passes, but I think a number of very similar tasks of "lowering operations" like this could be combined into one pass. Though this doesn't necessarily have to happen in this PR.

Rex is going to refactor this for the CS part. My current change is just to make it a little easier for mesh/tash shaders. BTW, I need another PR first because that solves several random mesh/task shader test failures. 2812

amdvlk-admin commented 10 months ago

Test summary for commit 98f6e63d43bf6280a32313f34543b6a55d378ba8

CTS tests (Failed: 0/138362)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 35163/69147 (50.9%)
    • Failed: 0/69147 (0.0%)
    • Not Supported: 33984/69147 (49.1%)
    • Warnings: 0/69147 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 35242/69215 (50.9%)
    • Failed: 0/69215 (0.0%)
    • Not Supported: 33973/69215 (49.1%)
    • Warnings: 0/69215 (0.0%)
amdrexu commented 10 months ago

Rex is going to refactor this for the CS part.

Yes, I will revisit this.

I don't particularly fancy a further proliferation of passes, but I think a number of very similar tasks of "lowering operations" like this could be combined into one pass.

The pass PatchInitializeWorkgroupMemory does similar thing but is not very generic. I will think about it and do some refactoring work. Currently, task/mesh lowering of memory copy requested by Xavi is added to MeshTaskShader only CS need to be handled particularly.

nhaehnle commented 10 months ago

So just to be clear, do we wait for the revised version on this?

xazhangAMD commented 10 months ago

The fix for random thread IDs has been merged so I have a working version for LLPCFE. This PR can be reviewed as is now or I can also wait for Rex's refactor, too.

xazhangAMD commented 9 months ago

Close this PR for now