ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Don't redundantly remove duplicates #1981

Closed ellosel closed 1 month ago

ellosel commented 1 month ago

This PR removes a few pessimizations from Tensile. In particular, fromkeys is called on a kernels list to deduplicate it, but the input to that function has already been deduplicated, so the call is redundant. The operation is expensive because it hashes every solution, hashing a solution requires computing its name, and computing solution names is a bottleneck. Further, an existence check was cleaned up from:

key in list(mydict.keys())

to

key in mydict

which has a small but measurable impact on runtime, since the first form copies every key into a temporary list and scans it linearly, while the second is a single hash lookup. Lastly, the return value of Solution.getKernels() was changed from a single-element list (presumably to support list concatenation with operator +) to the kernel itself, and callers now use list.append().
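The three changes above can be illustrated with a small standalone sketch. It is not Tensile code: `CountingKey`, `dedupe`, `get_kernel_as_list`, and `get_kernel` are hypothetical names made up for this example, and the class only imitates a solution whose hash is derived from its name.

```python
class CountingKey:
    """Hypothetical stand-in for a solution whose hash depends on its name."""

    hash_calls = 0  # counts how often __hash__ runs across all instances

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.name == other.name


def dedupe(items):
    # dict.fromkeys keeps the first occurrence of each item in order,
    # but it has to hash every item to do so.
    return list(dict.fromkeys(items))


# 1. Deduplicating an already-deduplicated list still pays for every hash.
kernels = [CountingKey("a"), CountingKey("b"), CountingKey("a")]
CountingKey.hash_calls = 0
unique = dedupe(kernels)
first_pass_hashes = CountingKey.hash_calls

CountingKey.hash_calls = 0
unique_again = dedupe(unique)  # same contents as `unique`, hashes paid again
second_pass_hashes = CountingKey.hash_calls

# 2. Both membership tests give the same answer; the first builds a
#    temporary list of all keys and scans it, the second is one lookup.
mydict = {"k%d" % i: i for i in range(1000)}
slow = "k999" in list(mydict.keys())
fast = "k999" in mydict


# 3. Returning the item and appending avoids the throwaway one-element list.
def get_kernel_as_list(solution):  # hypothetical "before" shape
    return [solution]


def get_kernel(solution):  # hypothetical "after" shape
    return solution


via_concat = []
for s in unique:
    via_concat = via_concat + get_kernel_as_list(s)

via_append = []
for s in unique:
    via_append.append(get_kernel(s))
```

In this sketch the second `dedupe` call changes nothing about the list yet still invokes `__hash__` once per element, which is the kind of redundant work the PR removes.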