ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Fix LLVM crash issue #1840

Closed AlexBrownAMD closed 10 months ago

AlexBrownAMD commented 10 months ago

The LLVM crash was caused by make running the build step on embed files before they had finished generating. Embed files can be very large C++ source files and may take more than a couple of seconds to write to disk. The make target depends on the .cpp file, but in parallel builds it seems to start as soon as it notices the file is present, not necessarily once Python has finished writing it and closed the handle.

To work around the issue, the embed file is now written to a .temp file which is renamed to .cpp when it is finished and closed. This prevents make from starting the build step early.
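The write-then-rename workaround described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name `write_then_rename`, the file name `embed.cpp`, and the choice to swap the `.cpp` suffix for `.temp` are all assumptions made for the example.

```python
import os
import tempfile


def write_then_rename(final_path, chunks):
    """Write chunks to a .temp sibling of final_path, then rename it.

    Hypothetical helper: the name and the .temp suffix handling are
    assumptions for illustration, not taken from the Tensile sources.
    """
    temp_path = os.path.splitext(final_path)[0] + ".temp"
    with open(temp_path, "w") as handle:
        for chunk in chunks:
            handle.write(chunk)
    # The handle is closed at this point. Only now does a file with the
    # final .cpp name appear, so a build tool watching for that name
    # never sees a partially written file.
    os.replace(temp_path, final_path)
    return final_path


# Usage: generate a small stand-in "embed" file in a scratch directory.
out_dir = tempfile.mkdtemp()
cpp_path = write_then_rename(
    os.path.join(out_dir, "embed.cpp"),
    ["// generated\n", "int x = 1;\n"],
)
```

`os.replace` is used here because, when the source and destination are on the same filesystem, the rename either happens completely or not at all, so the `.cpp` file is never visible in a half-written state.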

There should be a way to accomplish this with a custom CMake target and dependency instead, but my experiments with setting that up caused other build issues, such as missing build files.

This PR also reverts the recent changes that reduced the number of build threads and increased the test timeout, to see whether this change alone is enough to fix the CI build issues.