buildaworldnet / IrrlichtBAW

Build A World fork of Irrlicht
http://www.buildaworld.net
Apache License 2.0

Reduce code duplication in SIMD classes #363

devshgraphicsprogramming closed 1 year ago

devshgraphicsprogramming commented 5 years ago

See #209 as well

The idea is to move a lot of stuff from vectorSIMD.h to many headers (and finally have them sitting in include/irr/core/math)

Allowed tech: SSE4.2 and NEON.

Changes to do:

Loose spec:

  1. Align (alignas) vector class to nearest power of two
  2. Don't use AVX yet
  3. Try to avoid SSE/NEON intrinsics proliferation in code

Ideas for implementation:

template<typename T, size_t dimension, typename CRTP>
class IRR_FORCE_EBO vectorBase
{
   protected:
      vectorBase() {} // not meant to be instantiated directly
   public:
      template<typename INT_TYPE>
      inline const T& operator[](INT_TYPE ix) const
      {
#ifdef _IRR_DEBUG
         assert(ix<dimension);
#endif
         return reinterpret_cast<const T*>(this)[ix]; // const method, so cast to const T*
      }

      // then all the common operators for T={bool,int,float} in non-SIMD plain C++
};

Note that the base class is "zero sized" and depends on a CRTP parameter.
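
To make the "zero sized base plus CRTP" point concrete, here is a minimal sketch. The names `ComponentAccess` and `Float4` are hypothetical and not part of the engine. Unlike the snippet above, it reaches the storage through the derived class instead of a `reinterpret_cast` on `this`, which is an assumption made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names throughout: ComponentAccess and Float4 are illustrative only.
template<typename T, std::size_t Dimension, typename Derived>
class ComponentAccess
{
protected:
    ComponentAccess() = default; // only usable as a base class

public:
    const T& operator[](std::size_t ix) const
    {
        assert(ix < Dimension);
        // The derived class owns the storage; the base only knows how to index it.
        return static_cast<const Derived*>(this)->data[ix];
    }

    T& operator[](std::size_t ix)
    {
        assert(ix < Dimension);
        return static_cast<Derived*>(this)->data[ix];
    }
};

struct Float4 final : public ComponentAccess<float, 4u, Float4>
{
    Float4(float x, float y, float z, float w) : data{x, y, z, w} {}
    float data[4];
};

// The empty base adds no bytes, so the derived type is exactly its storage.
static_assert(sizeof(Float4) == 4u * sizeof(float), "empty base should be optimized away");
```

Because the base has no data members, every layer of operators added this way should cost nothing in size; only the final class declares storage.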

We will need a basic bit-field class, like the one we have right now, for operations that do not depend on how the bits are interpreted as a number, such as &, ^, ~ and |, but not bitshifts (those depend on signedness)

#include <type_traits>
#include "irr/core/math/vectorBase.h"

template<typename T, size_t dimension, typename CRTP>
class IRR_FORCE_EBO vectorBitfieldBase : public vectorBase<T,dimension,vectorBitfieldBase<T,dimension,CRTP>>
{
      static_assert(std::is_integral<T>::value,"T should be an int type!");
   protected:
      vectorBitfieldBase() {} // not meant to be instantiated directly

   public:
     // operators that only make sense for bitfields just like here https://github.com/buildaworldnet/IrrlichtBAW/blob/master/include/vectorSIMD.h#L46
};
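
As a rough illustration of what the bitfield layer could contain, the sketch below implements &, |, ^ and ~ once in plain C++ and lets a derived type inherit them through CRTP. `BitwiseOps` and `UInt4` are hypothetical names, and the scalar loops are an assumption about the non-SIMD fallback, not the engine's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical sketch: BitwiseOps and UInt4 are illustrative names, not engine types.
template<typename T, std::size_t Dimension, typename Derived>
class BitwiseOps
{
    static_assert(std::is_integral<T>::value, "T should be an integral type");

protected:
    BitwiseOps() = default; // only usable as a base class

public:
    // Plain scalar loops; a SIMD specialization could replace them later.
    Derived operator&(const Derived& o) const { return apply(o, [](T a, T b) { return T(a & b); }); }
    Derived operator|(const Derived& o) const { return apply(o, [](T a, T b) { return T(a | b); }); }
    Derived operator^(const Derived& o) const { return apply(o, [](T a, T b) { return T(a ^ b); }); }

    Derived operator~() const
    {
        const Derived& self = static_cast<const Derived&>(*this);
        Derived result;
        for (std::size_t i = 0; i < Dimension; ++i)
            result.data[i] = T(~self.data[i]);
        return result;
    }

private:
    // Applies a binary functor component by component.
    template<typename Op>
    Derived apply(const Derived& other, Op op) const
    {
        const Derived& self = static_cast<const Derived&>(*this);
        Derived result;
        for (std::size_t i = 0; i < Dimension; ++i)
            result.data[i] = op(self.data[i], other.data[i]);
        return result;
    }
};

struct UInt4 final : public BitwiseOps<std::uint32_t, 4u, UInt4>
{
    std::uint32_t data[4];
};
```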

Then certain T-specific operators and member functions could be implemented in derivations, like T=some integer type or T=some floating point type

#include "irr/core/math/vectorBitfieldBase.h"

template<typename T, size_t dimension, typename CRTP>
class IRR_FORCE_EBO vectorIntBase : public vectorBitfieldBase<T,dimension,vectorIntBase<T,dimension,CRTP>>
{
   protected:
      vectorIntBase() {} // not meant to be instantiated directly

   public:
     // operators that only make sense for ints
};

This could get even more specific, for example for T=unsigned vs T=signed, or T=float vs T=double

#include "irr/core/math/vectorIntBase.h"

template<typename T, size_t dimension, typename CRTP>
class IRR_FORCE_EBO vectorUnsignedBase : public vectorIntBase<T,dimension,vectorUnsignedBase<T,dimension,CRTP>>
{
      static_assert(std::is_unsigned<T>::value,"T should be an unsigned int type!");
   protected:
      vectorUnsignedBase() {} // not meant to be instantiated directly

   public:
     // operators that have to be implemented differently for unsigned and signed (right bitshift operators)
};
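
The reason right shifts need separate signed and unsigned implementations can be sketched as follows. `shiftRight` is a hypothetical free function (the proposal would put this in member operators), and the signed overload emulates an arithmetic shift explicitly, since the result of >> on a negative value is implementation-defined before C++20.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical free functions; the proposal would put these in member operators.

// Unsigned: a plain >> is a logical shift and shifts in zeros.
template<typename T>
typename std::enable_if<std::is_unsigned<T>::value, T>::type
shiftRight(T value, unsigned count)
{
    return static_cast<T>(value >> count);
}

// Signed: emulate an arithmetic shift (sign bit replicated) without relying
// on what >> does to negative values. count must be less than the bit width.
template<typename T>
typename std::enable_if<std::is_integral<T>::value && std::is_signed<T>::value, T>::type
shiftRight(T value, unsigned count)
{
    return value < 0 ? static_cast<T>(~(~value >> count))
                     : static_cast<T>(value >> count);
}
```

The same bit pattern therefore gives different results depending on T, which is why this operator cannot live in the shared bitfield base.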

Another class can be used to ensure that the vector's chosen dimension and type fit a hardware register size.

#include <type_traits>
#include "irr/core/AlignedBase.h"

template<typename T, size_t dimension>
class IRR_FORCE_EBO vectorHWAccelBase : public AlignedBase<core::roundUpToPoT(dimension*sizeof(T))>
{
      static_assert(
         dimension*sizeof(T) == 8u || // NEON DWORD
         dimension*sizeof(T) == 16u || // NEON QWORD / SSE4.2
         dimension*sizeof(T) == 32u || // AVX
         dimension*sizeof(T) == 64u ,  // AVX512
         "No hardware acceleration at all possible"
      );
   protected:
      vectorHWAccelBase() {} // not meant to be instantiated directly
};

You will have to provide a constexpr template specialization of findMSB.
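
A minimal sketch of what constexpr versions of these helpers might look like, so that the rounded-up size can be used as a template argument or inside `alignas`. The names follow the ones used in this issue, but the signatures here are assumptions (plain constexpr functions, not template specializations) and the engine's real ones may differ. `AlignedStorage` is hypothetical, and the final check assumes a 4-byte `float`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the helpers the issue mentions (findMSB, roundUpToPoT);
// the engine's real signatures may differ.

// Index of the most significant set bit, or -1 for zero.
constexpr int findMSB(std::size_t value)
{
    return value == 0u ? -1 : 1 + findMSB(value >> 1u);
}

// Smallest power of two that is >= value (treating 0 as 1).
constexpr std::size_t roundUpToPoT(std::size_t value)
{
    return value <= 1u ? std::size_t(1)
                       : std::size_t(1) << (findMSB(value - 1u) + 1);
}

// Hypothetical storage type aligned to the rounded-up byte size.
template<typename T, std::size_t Dimension>
struct alignas(roundUpToPoT(Dimension * sizeof(T))) AlignedStorage
{
    T data[Dimension];
};

// Three 4-byte floats occupy 12 bytes, which rounds up to a 16-byte alignment.
static_assert(alignof(AlignedStorage<float, 3u>) == 16u, "expected 16-byte alignment");
```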

Finally specific "instantiable" vectors could be forward declared

template<typename T, size_t dimension, typename CRTP>
class gvec;

and the methods (but not the members) implemented like this with partial-template-specialization (note: in this example "single float" means single precision float, as opposed to a double)

// split the methods
template<typename CRTP>
class IRR_FORCE_EBO gvec<float,2u,CRTP> : public vectorHWAccelBase<float,2u>, public vectorSingleFloatBase<2u,gvec<float,2u,CRTP> >
{
   public:
#ifdef __IRR_COMPILE_WITH_ARM_SIMD_
     // reimplement all the operators of `vectorSingleFloatBase` with NEON intrinsics
#endif
    // can't use MMX so no x86 SIMD implementation for types with dimension < 4
};

Doing it this way saves a lot of duplicate code: since members are declared separately, we don't need a separate class and union declaration for every T, just for every dimension count.

template<typename T>
class gvec2 final : public gvec<T,2u,gvec2<T> >
{
   public:
      union
        {
            struct{
                T X; T Y;
            };
            struct{
                T x; T y;
            };
            struct{
                T r; T g;
            };
            struct{
                T s; T t;
            };
            T pointer[2u];
      };
};

typedef gvec2<float> vec2;

typedef gvec2<double> dvec2;

typedef gvec2<int32_t> ivec2;

typedef gvec2<uint32_t> uvec2;

The upside of implementing it all like this is that the SSE4.2 or NEON intrinsics code does not need to be complete for every function/operator and can be filled in (implemented) over time.
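
The incremental approach can be sketched like this: a generic plain C++ implementation covers every type, and an explicit specialization adds intrinsics only where someone has written them. `Adder` is a hypothetical name, and the `__SSE__` check is an assumption that matches GCC and Clang on x86; other compilers simply take the scalar path.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical sketch: Adder is an illustrative name, not part of the engine.

// Generic path: plain C++ that works for every T and dimension.
template<typename T, std::size_t Dimension>
struct Adder
{
    static void add(const T* a, const T* b, T* out)
    {
        for (std::size_t i = 0; i < Dimension; ++i)
            out[i] = a[i] + b[i];
    }
};

#if defined(__SSE__)
// Accelerated path, present only where it has been written.
// Unaligned loads and stores keep the sketch safe for arbitrary pointers.
template<>
struct Adder<float, 4u>
{
    static void add(const float* a, const float* b, float* out)
    {
        _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    }
};
#endif
```

Callers use `Adder<T, N>::add` either way, so adding a specialization later does not touch any call site.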

All the GLSL functions like fract, mix, etc. can be implemented as templates, i.e.

template<class VECTOR_TYPE, typename INTERPOLANT_TYPE>
VECTOR_TYPE mix(const VECTOR_TYPE& x, const VECTOR_TYPE& y, const INTERPOLANT_TYPE& a); 

With explicit specializations as definitions later on that contain as little SSE/NEON as possible.
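
For reference, a generic definition of that `mix` declaration could look like the sketch below, using the GLSL definition x*(1-a) + y*a. `Float2` and its operators are hypothetical and exist only so the template has a type to act on. This version assumes a scalar interpolant; a per-component interpolant would need a component-wise multiply.

```cpp
#include <cassert>

// Hypothetical two-component type, only here so the template has something to act on.
struct Float2
{
    float x;
    float y;
};

inline Float2 operator+(const Float2& a, const Float2& b) { return Float2{a.x + b.x, a.y + b.y}; }
inline Float2 operator*(const Float2& v, float s) { return Float2{v.x * s, v.y * s}; }

// GLSL defines mix(x, y, a) as x*(1-a) + y*a.
// Generic fallback: needs only operator* by the interpolant and operator+ on the vector.
template<class VECTOR_TYPE, typename INTERPOLANT_TYPE>
VECTOR_TYPE mix(const VECTOR_TYPE& x, const VECTOR_TYPE& y, const INTERPOLANT_TYPE& a)
{
    return x * (INTERPOLANT_TYPE(1) - a) + y * a;
}
```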

Types with a non-power-of-two dimension, such as vec3, are actually going to be defined as restricted versions of the next power-of-two sized vector (for alignment reasons). This mirrors GLSL with std140 packing, where a vec3 is aligned like a vec4 but is still a different type.

template<typename T>
class gvec3 final : public gvec<T,4u,gvec3<T> >
{
   public:
      union
        {
            struct{
                T X; T Y; T Z;
            };
            struct{
                T x; T y; T z;
            };
            struct{
                T r; T g; T b;
            };
            struct{
                T s; T t; T p;
            };
            T pointer[3u];
      };
};

typedef gvec3<float> vec3;

typedef gvec3<double> dvec3;

typedef gvec3<int32_t> ivec3;

typedef gvec3<uint32_t> uvec3;
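
The layout consequence of treating vec3 as a restricted vec4 can be shown with a small sketch. `PaddedFloat3` and `isSimdAligned` are hypothetical, and the size checks assume a 4-byte `float`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type: three usable floats stored in a 16-byte, vec4-sized slot.
struct alignas(16) PaddedFloat3
{
    float x;
    float y;
    float z;
    // 4 bytes of tail padding follow, so the type fills a whole 128-bit register.
};

static_assert(sizeof(PaddedFloat3) == 16u, "three floats plus padding should fill 16 bytes");
static_assert(alignof(PaddedFloat3) == 16u, "the type should be 16-byte aligned");

// Returns true if the object sits on a 16-byte boundary, as aligned SIMD loads require.
inline bool isSimdAligned(const PaddedFloat3& v)
{
    return reinterpret_cast<std::uintptr_t>(&v) % 16u == 0u;
}
```

Every element of an array of such values stays on a 16-byte boundary, at the cost of one unused component per element.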

Lastly the {i,u,d}vec[5,6,7,8] vectors should just be defined as

template<typename T, size_t dimension>
class gvecN final : public gvec<T,core::roundUpToPoT(dimension),gvecN<T,dimension>>
{
      static_assert(dimension>4u,"This specialization is only for very large types");
   public:
      union
        {
            struct{
                T X; T Y; T Z; T W;
            };
            struct{
                T x; T y; T z; T w;
            };
            struct{
                T r; T g; T b; T a;
            };
            struct{
                T s; T t; T p; T q;
            };
            T pointer[dimension];
      };
};

typedef gvecN<float,5u> vec5;
typedef gvecN<float,6u> vec6;
typedef gvecN<float,7u> vec7;

typedef gvecN<double,5u> dvec5;
typedef gvecN<double,6u> dvec6;
typedef gvecN<double,7u> dvec7;

typedef gvecN<int32_t,5u> ivec5;
typedef gvecN<int32_t,6u> ivec6;
typedef gvecN<int32_t,7u> ivec7;

typedef gvecN<uint32_t,5u> uvec5;
typedef gvecN<uint32_t,6u> uvec6;
typedef gvecN<uint32_t,7u> uvec7;

devshgraphicsprogramming commented 5 years ago

You can use the following typedefs to quickly test the engine without replacing occurrences in every file

#define _IRR_LEGACY_VEC_TYPE_
#ifdef _IRR_LEGACY_VEC_TYPE_
    typedef uvec4 vectorSIMDu32;
    typedef ivec4 vectorSIMDi32;

    //! Typedef for an integer 3d vector.
    typedef vectorSIMDu32 vector4du32_SIMD;
    typedef vectorSIMDu32 vector3du32_SIMD;
    typedef vectorSIMDu32 vector2du32_SIMD;

    typedef vectorSIMDi32 vector4di32_SIMD;
    typedef vectorSIMDi32 vector3di32_SIMD;
    typedef vectorSIMDi32 vector2di32_SIMD;

    typedef vec4 vector4df_SIMD;
    typedef vec4 vector3df_SIMD;
    typedef vec4 vector2df_SIMD;
#endif

In a later phase of the PR you can use the following (but not yet, because I fear that some code may assume that vec2=vec3=vec4)

#define _IRR_LEGACY_VEC_TYPE_
#ifdef _IRR_LEGACY_VEC_TYPE_
    //! Typedef for an integer 3d vector.
    typedef uvec4 vector4du32_SIMD;
    typedef uvec3 vector3du32_SIMD;
    typedef uvec2 vector2du32_SIMD;

    typedef ivec4 vector4di32_SIMD;
    typedef ivec3 vector3di32_SIMD;
    typedef ivec2 vector2di32_SIMD;

    typedef vec4 vector4df_SIMD;
    typedef vec3 vector3df_SIMD;
    typedef vec2 vector2df_SIMD;
#endif

devshgraphicsprogramming commented 5 years ago

Lib MIME has public domain implementations of logarithmic and trigonometric functions (if we want parity with GLSL)

devshgraphicsprogramming commented 2 years ago

best approach https://t0rakka.silvrback.com/simd-scalar-accessor

devshgraphicsprogramming commented 1 year ago

going to use https://github.com/redorav/hlslpp

Then the following for missing free functions (transcendentals) etc: