Set the default block size of nw to 64

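This is faster on almost all datasets and GPUs I have access to (the only slowdown is data/tiny.in on gpu04, at 0.96x), at the cost of slightly more device memory (at most 1.11x). Speedup of block size 64 relative to 32: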

$ ~/src/futhark/tools/cmp-bench-json.py gpu04-{32,64}.json

nw.fut
  data/tiny.in:                                                         0.96x
  data/large.in:                                                        1.56x (mem: 1.01x@device)
  data/small.in:                                                        1.40x (mem: 1.11x@device)
  data/medium.in:                                                       1.42x (mem: 1.02x@device)

$ ~/src/futhark/tools/cmp-bench-json.py gpu03-{32,64}.json

nw.fut
  data/tiny.in:                                                         1.06x
  data/large.in:                                                        1.13x (mem: 1.01x@device)
  data/small.in:                                                        1.10x (mem: 1.11x@device)
  data/medium.in:                                                       1.14x (mem: 1.02x@device)

$ ~/src/futhark/tools/cmp-bench-json.py phi-{32,64}.json

nw.fut
  data/tiny.in:                                                         1.00x
  data/large.in:                                                        1.11x (mem: 1.01x@device)
  data/small.in:                                                        1.05x (mem: 1.11x@device)
  data/medium.in:                                                       1.14x (mem: 1.02x@device)
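
For readers unfamiliar with the output format, a figure such as 1.40x can be read as the old runtime divided by the new runtime, so values above 1.00x favour block size 64. The following is a minimal sketch of that calculation, assuming mean runtimes are compared. It is not the code of cmp-bench-json.py; the function name `speedups` and the runtimes are hypothetical and chosen only for illustration.

```python
from statistics import mean


def speedups(baseline, candidate):
    """Map each dataset present in both runs to baseline_mean / candidate_mean.

    A result above 1.0 means the candidate run was faster than the baseline.
    """
    return {
        name: mean(baseline[name]) / mean(candidate[name])
        for name in baseline
        if name in candidate
    }


# Made-up runtimes in microseconds, for illustration only (not measurements).
block32 = {"data/example.in": [200.0, 220.0, 210.0]}
block64 = {"data/example.in": [150.0, 150.0, 150.0]}

for name, ratio in speedups(block32, block64).items():
    print(f"{name}: {ratio:.2f}x")
```

With these invented numbers the baseline mean is 210 and the candidate mean is 150, so the sketch reports a 1.40x speedup.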

diku-dk/futhark-benchmarks#18