glistering96 closed this issue 1 year ago.
```python
import numpy as np

def run_test(dtype):
    # Two random 1024 x 1024 matrices in the requested dtype, then a matrix product
    a = np.random.rand(1024, 1024).astype(dtype)
    b = np.random.rand(1024, 1024).astype(dtype)
    return a.dot(b)
```

```python
%%timeit
run_test(np.float32)
```
34.3 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

```python
%%timeit
run_test(np.float16)
```
6.51 s ± 131 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
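So float16 is a poor choice for NumPy: the same 1024×1024 matrix product takes about 6.5 s instead of 34 ms in float32, roughly 190× slower. This is likely because NumPy sends float32/float64 matrix products to BLAS, which has no float16 routines, so float16 falls back to a much slower generic loop.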
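`%%timeit` is an IPython/Jupyter cell magic, so it won't run in a plain script. A minimal sketch of the same comparison using the standard-library `timeit` module is below. The `n` parameter and the `benchmark` helper are additions for illustration, not part of the original code; `n` defaults to the original 1024, but the demo uses 256 so the float16 run finishes quickly.

```python
import timeit

import numpy as np


def run_test(dtype, n=1024):
    # Build two random n x n matrices in the requested dtype and multiply them
    # (generation and casting are timed too, as in the original cell).
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    return a.dot(b)


def benchmark(dtype, n=1024, number=1, repeat=3):
    # Best-of-`repeat` wall-clock seconds for a single run_test(dtype, n) call.
    times = timeit.repeat(lambda: run_test(dtype, n), number=number, repeat=repeat)
    return min(times) / number


if __name__ == "__main__":
    for dtype in (np.float32, np.float16):
        print(f"{np.dtype(dtype).name}: {benchmark(dtype, n=256):.4f} s per call")
```

Absolute numbers will depend on the CPU and on which BLAS library NumPy is linked against; the float32 vs float16 gap is what matters.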
I switched the 16-bit NumPy code to 32-bit and left the 16-bit PyTorch code as it was.
With that change, performance improved.
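A minimal sketch of that fix, assuming some data may arrive as float16: upcast to float32 before any NumPy math, and cast down to float16 only where the data enters the 16-bit PyTorch part. On the PyTorch side that could be something like `torch.from_numpy(...).half()`, which is left out here so the sketch only needs NumPy. `matmul_in_fp32` and `to_half_for_model` are hypothetical names.

```python
import numpy as np


def matmul_in_fp32(a, b):
    # Upcast to float32 so the product takes NumPy's fast float32 path
    # instead of the slow float16 one; the result stays float32.
    return np.asarray(a, dtype=np.float32) @ np.asarray(b, dtype=np.float32)


def to_half_for_model(x):
    # Hypothetical hand-off point: cast to float16 only at the boundary
    # to the half-precision (e.g. PyTorch fp16) part of the pipeline.
    return np.asarray(x, dtype=np.float16)


rng = np.random.default_rng(0)
a16 = rng.random((256, 256)).astype(np.float16)
b16 = rng.random((256, 256)).astype(np.float16)

out32 = matmul_in_fp32(a16, b16)  # all NumPy work done in float32
out16 = to_half_for_model(out32)  # one cast right before the fp16 stage
```

The upcast costs one extra copy per input, which is cheap next to the float16 slowdown in the timings above.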