This PR removes a few pessimizations from Tensile. In particular, it drops a call to fromkeys that deduplicated a kernels list, since that work is already done on the input to the function. The call is expensive because it hashes every solution, hashing a solution requires computing its name, and computing solution names is a bottleneck. Further, an existence check was simplified from:
key in list(mydict.keys())
to
key in mydict
which has a small but measurable impact on runtime. Lastly, Solution.getKernels() now returns the kernel itself instead of a single-element list (the list was presumably there to support list concatenation with operator +), so callers use list.append() instead.
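
For reviewers who want to see the three effects in isolation, here is a minimal sketch. It does not use Tensile code: `FakeSolution`, its `name()` method, the parameter names, and the `get_kernels_old` / `get_kernels_new` helpers are hypothetical stand-ins, and `dict.fromkeys` is assumed to be the deduplication call in question.

```python
import timeit


class FakeSolution:
    """Hypothetical stand-in for a solution whose hash depends on its name."""

    hash_calls = 0  # counts how often a name had to be built for hashing

    def __init__(self, params):
        self.params = params

    def name(self):
        # Building the name is the (simulated) expensive step.
        return "_".join(f"{k}{v}" for k, v in sorted(self.params.items()))

    def __hash__(self):
        FakeSolution.hash_calls += 1
        return hash(self.name())

    def __eq__(self, other):
        return isinstance(other, FakeSolution) and self.name() == other.name()


# 1) Deduplicating an already-unique list still hashes every element.
kernels = [FakeSolution({"MT0": i, "MT1": i * 2}) for i in range(100)]
deduped = list(dict.fromkeys(kernels))
hashes_spent = FakeSolution.hash_calls  # one hash (and one name) per kernel

# 2) Membership test: building a list of keys forces a linear scan,
#    while `key in mydict` is a single hash lookup.
mydict = {i: str(i) for i in range(10_000)}
key = 9_999
slow_result = key in list(mydict.keys())
fast_result = key in mydict

slow_time = timeit.timeit(lambda: key in list(mydict.keys()), number=200)
fast_time = timeit.timeit(lambda: key in mydict, number=200)
print(f"list(keys()) scan: {slow_time:.6f}s, dict lookup: {fast_time:.6f}s")


# 3) Returning the kernel directly lets callers append instead of
#    allocating a throwaway one-element list for concatenation.
def get_kernels_old(solution):
    return [solution]  # hypothetical old shape: single-element list


def get_kernels_new(solution):
    return solution  # hypothetical new shape: the kernel itself


old_style = []
new_style = []
for s in kernels:
    old_style = old_style + get_kernels_old(s)  # copies the list each time
    new_style.append(get_kernels_new(s))        # amortized constant time
```

Both membership forms return the same answer and both accumulation styles produce the same list; only the cost differs. The printed timings depend on the machine, so no specific numbers are claimed here.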