dmccloskey / EvoNet

MIT License

GPU compliant Tensor operation classes #79

Closed dmccloskey closed 5 years ago

dmccloskey commented 5 years ago

Description

A number of implementations that work fine on the CPU break when moved to GPU code as implemented in CUDA.

Objectives:

Validation

dmccloskey commented 5 years ago

See custom implementation of clip operator https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/eigen_activations.h

/**
  * \ingroup CXX11_NeuralNetworks_Module
  * \brief Template functor to clip the magnitude of the first scalar.
  *
  * \sa class CwiseBinaryOp, MatrixBase::Clip
  */
template <typename Scalar>
struct scalar_clip_op {
  EIGEN_EMPTY_STRUCT_CTOR(scalar_clip_op)
  EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar
  operator()(const Scalar& a, const Scalar& b) const {
    return numext::mini(numext::maxi(a, -b), b);
  }
  template <typename Packet>
  EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet
  packetOp(const Packet& a, const Packet& b) const {
    return internal::pmin(internal::pmax(a, internal::pnegate(b)), b);
  }
};

namespace internal {
template <typename Scalar>
struct functor_traits<scalar_clip_op<Scalar> > {
  enum {
    Cost = NumTraits<Scalar>::AddCost * 3,
    PacketAccess = packet_traits<Scalar>::HasMax &&
                   packet_traits<Scalar>::HasMin &&
                   packet_traits<Scalar>::HasNegate
  };
};
}  // namespace internal

}  // end namespace Eigen

#endif  // EIGEN_CXX11_NEURAL_NETWORKS_ACTIVATIONS_H
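The functor above clamps each coefficient `a` to the range `[-b, b]`. A minimal CPU-side sketch of the same semantics in plain C++ (no Eigen dependency; the helper names here are illustrative, not part of any library):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative stand-in for scalar_clip_op: clamp a to [-b, b],
// matching numext::mini(numext::maxi(a, -b), b) in the functor above.
template <typename Scalar>
Scalar clip_scalar(const Scalar& a, const Scalar& b) {
  return std::min(std::max(a, -b), b);
}

// Element-wise application over a buffer, analogous to mapping the
// functor across a tensor expression.
template <typename Scalar>
std::vector<Scalar> clip_all(const std::vector<Scalar>& in, Scalar b) {
  std::vector<Scalar> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = clip_scalar(in[i], b);
  return out;
}
```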
dmccloskey commented 5 years ago

See spatial convolution example https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/eigen_spatial_convolutions.h and its implementation https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/conv_2d.h

pad, inflate, and shuffle:

template <typename Device, typename T, int Dims, typename IndexType>
struct InflatePadAndShuffle {
  void operator()(
      const Device& d, typename TTypes<T, Dims, IndexType>::ConstTensor input,
      const Eigen::DSizes<IndexType, Dims>& strides,
      const Eigen::array<Eigen::IndexPair<IndexType>, Dims>& pad_dims,
      const Eigen::DSizes<IndexType, Dims>& order,
      typename TTypes<T, Dims, IndexType>::Tensor output) {
    output.device(d) = input.inflate(strides).pad(pad_dims).shuffle(order);
  }
};
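Of the three operations chained above, `inflate` is probably the least familiar: it upsamples a tensor by inserting zeros between consecutive coefficients along each dimension. A 1-D sketch of that semantics in plain C++ (illustrative only, not the Eigen code):

```cpp
#include <cstddef>
#include <vector>

// 1-D "inflate": insert (stride - 1) zeros between consecutive
// elements, as Eigen's .inflate(strides) does per dimension.
// Output length for n inputs is (n - 1) * stride + 1.
std::vector<float> inflate1d(const std::vector<float>& in, std::size_t stride) {
  if (in.empty()) return {};
  std::vector<float> out((in.size() - 1) * stride + 1, 0.0f);
  for (std::size_t i = 0; i < in.size(); ++i) out[i * stride] = in[i];
  return out;
}
```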

Spatial convolution call

template <typename Device, typename Input, typename Filter, typename Output>
void SpatialConvolutionFunc(const Device& d, Output output, Input input,
                            Filter filter, int row_stride, int col_stride,
                            int row_dilation, int col_dilation,
                            const Eigen::PaddingType& padding) {
  // Need to swap row/col when calling Eigen.
  output.device(d) =
      Eigen::SpatialConvolution(input, filter, col_stride, row_stride, padding,
                                col_dilation, row_dilation);
}
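To make the delegated computation concrete, a naive single-channel VALID convolution with strides can be sketched in plain C++ (dilation fixed at 1; this is a reference sketch of the math, not Eigen's implementation):

```cpp
#include <vector>

// Naive single-channel VALID 2-D convolution with strides; a reference
// sketch of the computation SpatialConvolutionFunc hands off to Eigen
// (row/col dilation omitted, i.e. dilation == 1).
std::vector<std::vector<float>> conv2d_valid(
    const std::vector<std::vector<float>>& input,
    const std::vector<std::vector<float>>& kernel,
    int row_stride, int col_stride) {
  const int in_rows = static_cast<int>(input.size());
  const int in_cols = static_cast<int>(input[0].size());
  const int k_rows = static_cast<int>(kernel.size());
  const int k_cols = static_cast<int>(kernel[0].size());
  const int out_rows = (in_rows - k_rows) / row_stride + 1;
  const int out_cols = (in_cols - k_cols) / col_stride + 1;
  std::vector<std::vector<float>> out(out_rows,
                                      std::vector<float>(out_cols, 0.0f));
  for (int r = 0; r < out_rows; ++r)
    for (int c = 0; c < out_cols; ++c)
      for (int i = 0; i < k_rows; ++i)
        for (int j = 0; j < k_cols; ++j)
          out[r][c] +=
              input[r * row_stride + i][c * col_stride + j] * kernel[i][j];
  return out;
}
```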
dmccloskey commented 5 years ago

See softmax example https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/eigen_softmax.h
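The kernel there uses the standard numerically stable formulation: subtract the per-row maximum before exponentiating so large logits cannot overflow. A plain C++ sketch of that idea (illustrative, not the Eigen expression tree from the linked header):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over one row: shift by the maximum so
// std::exp never sees a large positive argument, then normalize.
std::vector<float> softmax(const std::vector<float>& x) {
  const float m = *std::max_element(x.begin(), x.end());
  std::vector<float> out(x.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = std::exp(x[i] - m);
    sum += out[i];
  }
  for (float& v : out) v /= sum;
  return out;
}
```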

dmccloskey commented 5 years ago

See pooling example https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/eigen_pooling.h
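As a reference for the pooling semantics, naive non-overlapping max pooling over a single 2-D channel can be sketched as (illustrative only, not the Eigen kernel):

```cpp
#include <algorithm>
#include <vector>

// Naive max pooling with a square window and matching stride
// (non-overlapping windows), over a single 2-D channel.
std::vector<std::vector<float>> max_pool2d(
    const std::vector<std::vector<float>>& input, int window) {
  const int out_rows = static_cast<int>(input.size()) / window;
  const int out_cols = static_cast<int>(input[0].size()) / window;
  std::vector<std::vector<float>> out(out_rows, std::vector<float>(out_cols));
  for (int r = 0; r < out_rows; ++r)
    for (int c = 0; c < out_cols; ++c) {
      float m = input[r * window][c * window];
      for (int i = 0; i < window; ++i)
        for (int j = 0; j < window; ++j)
          m = std::max(m, input[r * window + i][c * window + j]);
      out[r][c] = m;
    }
  return out;
}
```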

dmccloskey commented 5 years ago

Alternative clipping using the built-in .max and .min calls: https://forum.kde.org/viewtopic.php?f=74&t=117349
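For reference, composing coefficient-wise max and min this way (e.g. something along the lines of `x.max(-b).min(b)`; the exact Eigen expression is per the thread) is equivalent to clamping each coefficient to `[-b, b]`. In scalar C++ terms:

```cpp
#include <algorithm>

// max-then-min composition: identical to clamping a to [-b, b],
// i.e. std::clamp(a, -b, b) in C++17.
float clip_via_max_min(float a, float b) {
  return std::min(std::max(a, -b), b);
}
```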