Don't redundantly remove duplicates

This PR removes a few pessimizations from Tensile. In particular, it drops a call to fromkeys that deduplicated a kernels list, because the input to the function is already deduplicated. The call is expensive: it hashes each solution, hashing a solution requires computing the solution name, and computing solution names is a bottleneck. Further, a membership test was changed from:

key in list(mydict.keys())

to:

key in mydict

which has a small but measurable impact on runtime. Lastly, Solution.getKernels() now returns the kernel itself instead of a single-element list (the list was presumably there to support list concatenation with operator +), and callers use list.append() instead.
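For reviewers, here is a minimal sketch of the three patterns touched by this PR. It is not Tensile code: the Kernel class, its name attribute, and the hash_calls counter are hypothetical stand-ins, used only to make the hashing cost of fromkeys visible on a list that is already unique.

```python
class Kernel:
    """Hypothetical stand-in for a Tensile kernel/solution; not the real class."""

    hash_calls = 0  # counts how often __hash__ runs (illustration only)

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # In this sketch, hashing goes through the name, mirroring the
        # PR's point that hashing requires the (costly) solution name.
        Kernel.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, Kernel) and self.name == other.name


# Assume the input is already deduplicated, as the PR states.
kernels = [Kernel("a"), Kernel("b"), Kernel("c")]

# 1. Redundant dedup: every element is hashed again, yet nothing is removed.
deduped = list(dict.fromkeys(kernels))
redundant_hashes = Kernel.hash_calls

# 2. Membership test: the first form copies all keys into a list and scans it;
#    the second is a single hash lookup on the dict.
mydict = {"x": 1, "y": 2}
slow = "y" in list(mydict.keys())
fast = "y" in mydict

# 3. Collecting results: append the kernel directly instead of wrapping it
#    in a single-element list and concatenating with +.
collected = []
collected.append(kernels[0])
```

Both membership forms give the same answer, so the change affects only cost, not behavior.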

ROCm / Tensile

Don't redundantly remove duplicates #1981