Work out why 32-bit digits are slower than 64-bit digits with the same fixnum size

For example:

Function: mul_lo, #elts: 600e3
fixnum digit  total data   time       Kops/s
 bits  bits     (MiB)    (seconds)
[...]
   64    32       4.6     0.000     6122449.0
  128    32       9.2     0.000     3592814.4
  256    32      18.3     0.000     1583113.5
  512    32      36.6     0.001      493827.2
 1024    32      73.2     0.004      136549.8

   64    64       4.6     0.000     7792207.8
  128    64       9.2     0.000     4918032.8
  256    64      18.3     0.000     2135231.3
  512    64      36.6     0.001      657894.7
 1024    64      73.2     0.003      181928.4
[...]
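To quantify the gap, the Kops/s column can be turned back into elapsed time. This is a minimal sketch that assumes Kops/s = #elts / time / 1000 (an assumption, though the recovered times round to the printed time column, e.g. 4394 µs → 0.004 s and 3298 µs → 0.003 s). It then reports how much longer the 32-bit-digit run takes at each fixnum size. The function names are hypothetical helpers, not part of cuda-fixnum.

```python
# Rows copied from the mul_lo output above (#elts = 600e3).
# Maps fixnum bits -> Kops/s for 32-bit and 64-bit digits.
N_ELTS = 600e3

KOPS_32 = {64: 6122449.0, 128: 3592814.4, 256: 1583113.5, 512: 493827.2, 1024: 136549.8}
KOPS_64 = {64: 7792207.8, 128: 4918032.8, 256: 2135231.3, 512: 657894.7, 1024: 181928.4}


def implied_time_us(kops: float, n_elts: float = N_ELTS) -> float:
    """Elapsed time in microseconds, assuming Kops/s = n_elts / time / 1000."""
    return n_elts / (kops * 1e3) * 1e6


def summarize():
    """Return (fixnum_bits, t32_us, t64_us, t32/t64) for each fixnum size."""
    rows = []
    for bits in sorted(KOPS_32):
        t32 = implied_time_us(KOPS_32[bits])
        t64 = implied_time_us(KOPS_64[bits])
        rows.append((bits, t32, t64, t32 / t64))
    return rows


if __name__ == "__main__":
    print(f"{'fixnum':>6} {'t32 (us)':>9} {'t64 (us)':>9} {'t32/t64':>8}")
    for bits, t32, t64, ratio in summarize():
        print(f"{bits:>6} {t32:>9.0f} {t64:>9.0f} {ratio:>8.2f}")
```

On these numbers the 32-bit-digit runs take roughly 1.27x to 1.37x as long as the 64-bit-digit runs, and the individual timings are on the order of 100 µs for the smaller fixnums, which is worth keeping in mind when judging measurement noise.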

The total data is different, so it's possible that this is a measurement mistake rather than a code generation problem.
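One way to check the measurement side is to see what the "total data" column actually counts. Assuming it is the size of a single array of #elts fixnums (an assumption; the benchmark may account for inputs and outputs differently), it would be #elts × fixnum_bits / 8 bytes. The sketch below, using a hypothetical `total_mib` helper, compares that formula with the values printed in the excerpt.

```python
N_ELTS = 600_000
MIB = 2**20  # bytes per MiB


def total_mib(fixnum_bits: int, n_elts: int = N_ELTS) -> float:
    """Size in MiB of one array of n_elts fixnums of the given width (assumed meaning of the column)."""
    return n_elts * (fixnum_bits // 8) / MIB


# "total data (MiB)" values from the table; the excerpt shows the same
# numbers for 32-bit and 64-bit digits.
REPORTED = {64: 4.6, 128: 9.2, 256: 18.3, 512: 36.6, 1024: 73.2}

if __name__ == "__main__":
    for bits, reported in REPORTED.items():
        print(bits, round(total_mib(bits), 1), reported)
```

In the rows shown, the formula reproduces the reported column for both digit widths, so any difference in data volume between the two configurations would have to come from something this column does not capture.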

(data61/cuda-fixnum, issue #48)