ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Remove deepcopy in TensileCreateLibrary #1973

Closed (ellosel closed 1 month ago)

ellosel commented 2 months ago

The deepcopy function is used throughout Tensile and is one of its primary bottlenecks. This PR removes the deep copies of Solutions, specifically those made in TensileCreateLibrary. This reduces the turnaround time of TensileCreateLibrary by up to 40%, depending on the input given to main.
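As a rough illustration of why removing deep copies helps, the sketch below contrasts duplicating every solution with sharing references. The dictionary layout and the helper names (make_solution, collect_with_deepcopy, collect_by_reference) are assumptions made for this example, not Tensile's actual code.

```python
import copy


def make_solution(index):
    # Hypothetical stand-in for a solution: a dict with nested state.
    return {
        "SolutionIndex": index,
        "ProblemType": {"DataType": "s", "TransposeA": False},
        "MacroTile": [64, 64],
    }


def collect_with_deepcopy(solutions):
    # Every solution and all of its nested containers are duplicated.
    return [copy.deepcopy(s) for s in solutions]


def collect_by_reference(solutions):
    # Only a new list is built; the solution objects themselves are shared.
    return list(solutions)


solutions = [make_solution(i) for i in range(3)]
deep = collect_with_deepcopy(solutions)
shared = collect_by_reference(solutions)

# Deep copies are equal in value but are distinct objects at every level.
deep_is_independent = all(d is not s and d == s for d, s in zip(deep, solutions))
# Shared entries are the very same objects, so nothing was duplicated.
shared_is_same_object = all(r is s for r, s in zip(shared, solutions))
```

Sharing references is only safe when later code does not mutate a solution that another consumer still relies on, which is why behaviour needs to be re-verified after such a change.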

Ideally, we would also make the Solution class immutable as part of these changes, and we attempted to, but a Solution is modified at runtime in ways that would require a major refactor to eliminate.
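For context, an immutable design could look roughly like the following sketch, which uses a frozen dataclass. FrozenSolution and its fields are hypothetical and far simpler than Tensile's Solution class; the point is only that updates produce a new object instead of mutating a shared one, so no defensive deep copy is needed.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenSolution:
    # Hypothetical fields; a tuple is used so the nested state is immutable too.
    index: int
    macro_tile: tuple


original = FrozenSolution(index=0, macro_tile=(64, 64))

# In-place mutation is rejected by the frozen dataclass.
try:
    original.index = 1
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True

# An "update" builds a new instance; untouched fields are reused, not copied.
updated = replace(original, index=1)
```

Code that currently modifies a solution in place would have to be rewritten to pass the new instance along, which is consistent with the refactoring cost mentioned above.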

To ensure that the program still behaves correctly, we ran the following tests:

nakajee commented 1 month ago

How did you verify that this works the same as before?

ellosel commented 1 month ago

How did you verify that this works the same as before?