ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

DirectToVgpr + packing support, increase extended test timeout #1872

Closed by nakajee 9 months ago

nakajee commented 9 months ago

I encountered a mismatch issue with small MT + DTV + PGR2. It was caused by incorrect instruction scheduling between the DTV load and the packing code. I made a change to generate all packing code before generating the DTV load, so that vreg data is not overwritten by the next DTV load.
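
The ordering hazard described above can be illustrated with a toy model. This is only a sketch: the `run` function, the single shared register, and the `("load", i)` / `("pack", i)` schedule entries are hypothetical and are not Tensile's kernel-writer code. It assumes a DTV (DirectToVgpr) load writes straight into a vreg that the packing code later reads.

```python
# Toy model only: all names here are hypothetical, not Tensile's API.

def run(schedule, tiles):
    """Execute ("load", i) / ("pack", i) ops against one shared vreg."""
    vreg = None      # the register a DTV load writes into
    packed = []      # values the packing code produced, in order
    for op, i in schedule:
        if op == "load":
            vreg = tiles[i]       # the load overwrites the register in place
        else:
            packed.append(vreg)   # packing reads whatever the register holds now
    return packed

tiles = ["iter0", "iter1"]

# Hazard: the load for iteration 1 is scheduled before iteration 0 is packed.
bad = [("load", 0), ("load", 1), ("pack", 0), ("pack", 1)]

# Ordering from the fix: all packing for an iteration precedes the next load.
good = [("load", 0), ("pack", 0), ("load", 1), ("pack", 1)]

bad_result = run(bad, tiles)
good_result = run(good, tiles)
```

In the `bad` schedule the packing step for iteration 0 reads data that belongs to iteration 1, which is the kind of mismatch reported; in the `good` schedule each iteration is packed before its register is reused.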

nakajee commented 9 months ago

The extended test keeps failing with a timeout... I added test cases, but not very many. I might need to extend the timeout.

nakajee commented 9 months ago

Increased the extended test timeout from 600 to 720 min. I hope the extended tests pass this time.

nakajee commented 9 months ago

Looks like we're hitting timeouts on extended tests with the new tests added after the last few changes. We probably need to trim down extended test coverage to fit the time limit. Are all tests passing locally?

I ran precheckin+extended on gfx942 and all passed. Let me extend the timeout for now. I saw the 908 timeout even before my change. The new cases I added have nothing to do with 908 (all skipped), but we still see the timeout.

I think it is better to trim down some test cases, but probably not in this PR.

nakajee commented 9 months ago

Anyway, I am running precheckin+extended on 90a and 942 again now.

nakajee commented 9 months ago

Finally, the extended CI test passed on both 908 and 90a. 908 extended test took around 11 hours.

Also, I confirmed that the precheckin+extended tests passed on 90a and 942 on my side.

AlexBrownAMD commented 9 months ago

908 extended test took around 11 hours

We should figure out some tests to trim there, that's a pretty long run time.

TorreZuk commented 9 months ago

@AlexBrownAMD You may want to consider a "weekly" job like rocBLAS has, and move all long tests that check obscure features into it. rocBLAS can add the label to any PR where we want to check these obscure things more often than weekly... discuss with Eiden.
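
A rough sketch of how such a label-gated weekly tier could be selected is below. Everything in it is an assumption for illustration: the tier names, the trigger strings, and the `run-weekly` label are hypothetical and are not taken from the Tensile or rocBLAS CI configuration.

```python
# Hypothetical sketch: tier names, trigger names, and the label are
# illustrative, not Tensile's or rocBLAS's actual CI settings.

WEEKLY_LABEL = "run-weekly"  # assumed label name

def tiers_to_run(trigger, pr_labels=()):
    """Return the list of test tiers a CI run should execute."""
    tiers = ["precheckin", "extended"]   # tiers that ran on this PR in the thread
    if trigger == "weekly" or WEEKLY_LABEL in pr_labels:
        tiers.append("weekly")           # long tests for obscure features
    return tiers

default_pr = tiers_to_run("pull_request")
labeled_pr = tiers_to_run("pull_request", ["run-weekly"])
scheduled = tiers_to_run("weekly")
```

With this split, an ordinary PR would skip the long-running tier, while a PR carrying the label would run it on demand rather than waiting for the weekly job.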