Closed Munksgaard closed 3 years ago
This yields faster performance on almost all datasets and GPUs I have access to, at the cost of using a bit more memory:
$ ~/src/futhark/tools/cmp-bench-json.py gpu04-{32,64}.json
nw.fut
  data/tiny.in: 0.96x
  data/large.in: 1.56x (mem: 1.01x@device)
  data/small.in: 1.40x (mem: 1.11x@device)
  data/medium.in: 1.42x (mem: 1.02x@device)

$ ~/src/futhark/tools/cmp-bench-json.py gpu03-{32,64}.json
nw.fut
  data/tiny.in: 1.06x
  data/large.in: 1.13x (mem: 1.01x@device)
  data/small.in: 1.10x (mem: 1.11x@device)
  data/medium.in: 1.14x (mem: 1.02x@device)

$ ~/src/futhark/tools/cmp-bench-json.py phi-{32,64}.json
nw.fut
  data/tiny.in: 1.00x
  data/large.in: 1.11x (mem: 1.01x@device)
  data/small.in: 1.05x (mem: 1.11x@device)
  data/medium.in: 1.14x (mem: 1.02x@device)
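For anyone reading the ratios above, here is a rough sketch of how a speedup figure like `1.56x` can be derived from two sets of runtimes. It assumes the ratio is the baseline runtime divided by the new runtime, so values above 1.00x mean the second file is faster. That is my reading of the output, not something confirmed from the tool itself. The function names and the input shape (a dict from dataset name to a list of runtimes) are hypothetical and do not reflect the actual JSON layout or the internals of `cmp-bench-json.py`.

```python
from statistics import mean


def speedup(baseline_runtime, candidate_runtime):
    """Return baseline / candidate; > 1.0 means the candidate is faster.

    Assumption: this is the direction the ratios in the output use.
    """
    if candidate_runtime <= 0:
        raise ValueError("candidate runtime must be positive")
    return baseline_runtime / candidate_runtime


def compare(baseline, candidate):
    """Compare mean runtimes per dataset.

    Both arguments are hypothetical dicts: dataset name -> list of runtimes.
    Datasets missing from either side are skipped.
    """
    return {
        name: speedup(mean(baseline[name]), mean(candidate[name]))
        for name in baseline
        if name in candidate
    }


def format_ratio(ratio):
    """Format a ratio in the same style as the output above, e.g. '1.56x'."""
    return f"{ratio:.2f}x"
```

With made-up runtimes, `compare({"data/large.in": [1560, 1560]}, {"data/large.in": [1000, 1000]})` gives a ratio that `format_ratio` renders as `1.56x`.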
Well then!