ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Add logic to remove temporary build artifacts #2007

Closed (ellosel closed 1 month ago)

ellosel commented 1 month ago

Summary:

Tensile's disk usage can reach hundreds of GB in some cases. This change removes temporary build artifacts and generated code during the build, which keeps the peak (high-water mark) disk usage down. The old behavior can be restored with the new option --keep-build-tmp.

Outcomes:

Tensile hard disk usage is reduced by as much as 80%.

Notable changes:

--keep-build-tmp option was added.
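
For readers who want a picture of the pattern described above, here is a minimal Python sketch of "delete intermediates as soon as each step finishes, unless a keep flag is set". It is not the code from this PR: `build_step`, `build`, the `build_tmp_` directory prefix, and the placeholder file names are all hypothetical, and only the `--keep-build-tmp` flag name comes from the description.

```python
import argparse
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional


def build_step(tmp_dir: Path, out_dir: Path, name: str) -> Path:
    """Hypothetical build step: write an intermediate file, then the final output."""
    intermediate = tmp_dir / f"{name}.s"
    intermediate.write_text("; generated code placeholder\n")
    output = out_dir / f"{name}.co"
    output.write_text("final artifact placeholder\n")
    return output


def build(out_dir: Path, names: List[str], keep_build_tmp: bool = False) -> List[Path]:
    """Run every step, removing each step's temporary directory right after it finishes."""
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for name in names:
        tmp_dir = Path(tempfile.mkdtemp(prefix="build_tmp_", dir=out_dir))
        outputs.append(build_step(tmp_dir, out_dir, name))
        if not keep_build_tmp:
            # Cleaning up per step (not once at the end) is what lowers the peak usage.
            shutil.rmtree(tmp_dir, ignore_errors=True)
    return outputs


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of a build with temp cleanup")
    parser.add_argument(
        "--keep-build-tmp",
        action="store_true",
        help="keep temporary build artifacts (old behavior)",
    )
    return parser.parse_args(argv)
```

The point of the sketch is the placement of the cleanup: removing intermediates inside the loop bounds disk usage by the largest single step plus the final outputs, whereas a single cleanup at the end would leave the peak unchanged.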

Testing and Environment:

Routine CI pipelines and equivalent testing run locally

bstefanuk commented 1 month ago

Tensile hard disk usage is reduced by as much as 80%.

@ellosel how difficult would it be to quantify this in MB/GB for a specific architecture, say, gfx942?

ellosel commented 1 month ago

Not too difficult. Here are some disk usage samples for gfx900 before the change. They measure the size of the Tensile project directory, including build_tmp and the Tensile output directory; the initial size is about 90 MB:

TensileCreateLibrary start
88M  .
92M  .
282M  .
475M  .
661M  .
665M  .
676M  .
688M  .
697M  .
703M  .
767M  .
770M  .
TensileCreateLibrary finished

and after:

TensileCreateLibrary start
90M  .
93M  .
116M  .
140M  .
206M  .
208M  .
211M  .
216M  .
217M  .
220M  .
225M  .
TensileCreateLibrary finished

That is a 70% reduction in peak size (770M to 225M) if you include the initial 90M, and closer to 80% if you don't.
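
The arithmetic behind those two percentages can be checked with a few lines of Python. This is only a worked calculation on the peak values quoted above (770M before, 225M after, about 90M initial size); the function name is made up for illustration.

```python
def reduction_pct(before_mb: float, after_mb: float, baseline_mb: float = 0.0) -> float:
    """Percent reduction in peak size, optionally ignoring a fixed baseline."""
    before = before_mb - baseline_mb
    after = after_mb - baseline_mb
    return 100.0 * (before - after) / before


# Peak sizes from the gfx900 samples, in MB.
peak_before, peak_after, initial = 770.0, 225.0, 90.0

print(f"{reduction_pct(peak_before, peak_after):.1f}%")           # 70.8%
print(f"{reduction_pct(peak_before, peak_after, initial):.1f}%")  # 80.1%
```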

The same measurements for gfx90a, before:

119M    .
119M    .
119M    .
119M    .
119M    .
119M    .
119M    .
119M    .
119M    .
543M    .
2.4G    .
3.9G    .
5.7G    .
8.0G    .
11G    .
13G    .
13G    .
15G    .

after:

119M  .
185M  .
327M  .
462M  .
603M  .
797M  .
946M  .
1.2G  .
1.2G  .
1.2G  .
2.6G  .
2.7G  .
2.8G  .
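
To put the gfx90a samples on the same footing as the gfx900 percentages, the sketch below parses the human-readable sizes and compares the peaks. It assumes the samples use `du -h` style suffixes with binary multiples (M = MiB, G = GiB); that is an assumption about how the numbers were collected, and the helper names are hypothetical. Because the input is already rounded, the result is approximate.

```python
# Multipliers to MiB, assuming du -h style binary units (an assumption).
UNITS_TO_MIB = {"K": 1.0 / 1024.0, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024.0}


def parse_size_mib(text: str) -> float:
    """Convert a string such as '2.8G' or '119M' to MiB."""
    text = text.strip()
    return float(text[:-1]) * UNITS_TO_MIB[text[-1].upper()]


def peak_mib(samples):
    """Largest sample in MiB, i.e. the disk usage high-water mark."""
    return max(parse_size_mib(s) for s in samples)


def reduction_pct(before: float, after: float, baseline: float = 0.0) -> float:
    """Percent reduction in peak size, optionally ignoring a fixed baseline."""
    return 100.0 * ((before - baseline) - (after - baseline)) / (before - baseline)


# gfx90a samples copied from the listings above (repeated 119M readings collapsed).
before = ["119M", "543M", "2.4G", "3.9G", "5.7G", "8.0G", "11G", "13G", "13G", "15G"]
after = ["119M", "185M", "327M", "462M", "603M", "797M", "946M",
         "1.2G", "1.2G", "1.2G", "2.6G", "2.7G", "2.8G"]

initial = parse_size_mib("119M")
print(f"including initial size: {reduction_pct(peak_mib(before), peak_mib(after)):.0f}%")
print(f"excluding initial size: {reduction_pct(peak_mib(before), peak_mib(after), initial):.0f}%")
```

Under those assumptions the gfx90a peak drops from 15G to 2.8G, which works out to roughly 81% including the initial 119M and roughly 82% excluding it.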