etmc / tmLQCD

tmLQCD is a freely available software suite providing a set of tools to be used in lattice QCD simulations. This is mainly a HMC implementation (including PHMC and RHMC) for Wilson, Wilson Clover and Wilson twisted mass fermions and inverter for different versions of the Dirac operator. The code is fully parallelised and ships with optimisations for various modern architectures, such as commodity PC clusters and the Blue Gene family.
http://www.itkp.uni-bonn.de/~urbach/software.html
GNU General Public License v3.0
32 stars 47 forks source link

QPhiX::half is an unsigned short... #387

Open kostrzewa opened 7 years ago

kostrzewa commented 7 years ago

@martin-ueding

As a result, https://github.com/kostrzewa/tmLQCD/blob/3285769b5b279a31d15265d0bd07ac1e8485ca5e/qphix_base_classes.hpp#L45 is unlikely to work correctly...

In fact, the fourth argument in https://github.com/kostrzewa/tmLQCD/blob/3285769b5b279a31d15265d0bd07ac1e8485ca5e/qphix_base_classes.hpp#L129 is converted to int because of the sign...

Similarly, our packers probably fail (silently) because of this and we might need to use rep<FT,double> for this.

kostrzewa commented 7 years ago

In practice one can simply disable half precision.

martin-ueding commented 7 years ago

I have realized the complication of the half type only a week ago, therefore one cannot just have scalar values as FT because the conversion is not the one that we would expect.

It has just struck me what we might be able to do instead: C++ allows for implicit conversion operators. So I would throw in the following to the discussion: An actually new type that is not just a typedef. This means that implicit conversions from double to Half are not possible without going through the constructor that we write. Then all the intermediate functions on the QPhiX side would use this Half class, the kernels still take half. Arrays will not be converted automatically, so that is an issue, but double * is not converted into float * either. This is a complete example that compiles and works:

#include <iostream>

typedef unsigned short half;

// Mock of the `rep` function to see whether it works.
template <typename A, typename B>
A rep(B b) {
    return b + 10;
}

class Half {
  public:
    // Implicit constructors.
    Half(half const x) : x_(x) {}
    Half(float const x) : x_(rep<half, float>(x)) {}
    Half(double const x) : x_(rep<half, double>(x)) {}

    // Conversion operator that allows using the `Half` class in places where
    // `half` is required.
    operator half() { return x_; }

  private:
    half x_;
};

void func(half x) {
    std::cout << x << std::endl;
}

void intermediate(Half x) {
    func(x);
}

int main() {
    double a = 1.0;

    intermediate(a); // Program will print 11 here.
}

The double is converted to half via the Half class implicitly in the chain of function calls. The bad thing is that the built-in conversion (which is not what we want) will also happen implicitly without an error. One should just have not abused the unsigned short in the code, it is bad enough that the Intel intrinsics use that type.

And while we are at it, the sizeof(unsigned short) is probably not guaranteed to be sizeof(float) / 2, right? So it would be better to use uint16_t in order to get the correct number of bits.

This class Half would require to change pretty much all points in QPhiX where numbers are passed. Perhaps just passing FT = Half as the template parameter will make it magically happen as we want. Probably it is just easier to use rep<> in the locations where needed and have test cases to check that one did not miss anything.