gogyzzz / iip_sph_pp

C library for speech pre-processing.
Mozilla Public License 2.0

Progress guide from 7/17 onward (temporarily put here) #37

Closed gogyzzz closed 6 years ago

gogyzzz commented 6 years ago

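I'm supposed to be the one directing and supervising this project, but I've let it run loosely, and I feel a heavy responsibility for that. Here is a clear guide.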

Schedule (please read from the bottom up)

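These are estimated durations only and don't mean much in practice. I'm putting subrange operations on hold. (I'm not sure we will ever truly need them. After I asked 본혁 to try submatmul, I realized that extending subranges to every other function would make the number of combinations explode... Let's revisit this when we actually run into the need while writing real signal-processing algorithms.)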

180xxx / refinement

180xxx / backend performance tests

180xxx / MKL, OpenBLAS, and cuBLAS backends (honestly, I'd like cuBLAS to be in before 08/31, but for now...)

180831 / deadline for the first milestone

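180xxx / exception handling (does ASSERT only warn, or also abort? Which cases only warn, which abort, etc.)

180xxx / functional tests (compare against MATLAB output; define the criterion for values to count as matching)

2 days / settle function signatures and naming rules -> unify names -> documentation, bring README.md up to date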

7 days / implement features (evd, svd, inverse, diagonal, trace, determinant)

180725 / CMake support by 규래

Now / implement every function in plain, unoptimized native C (just get them implemented)
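For the native-C pass on the 7-day items, trace and determinant need no external library. The sketch below is minimal and rests on a few assumptions. The matrix is square and stored column-major (MATLAB order) as a plain `double` array. `iip_trace` and `iip_det` are hypothetical names. The determinant uses Gaussian elimination with partial pivoting, which is the same idea as an LU factorization.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: square n x n matrix, column-major, element (i, j) at a[i + j * n].
 * iip_trace / iip_det are hypothetical names for illustration. */

double iip_trace(const double *a, int n)
{
    double t = 0.0;
    for (int i = 0; i < n; i++)
        t += a[i + i * n];
    return t;
}

/* Determinant via Gaussian elimination with partial pivoting.
 * Works on a copy, so the input matrix is left untouched. */
double iip_det(const double *a, int n)
{
    assert(n > 0);
    double *m = malloc((size_t)n * n * sizeof *m);
    assert(m != NULL);
    memcpy(m, a, (size_t)n * n * sizeof *m);

    double det = 1.0;
    for (int k = 0; k < n; k++) {
        /* pick the row with the largest |m(i, k)| as the pivot row */
        int p = k;
        double best = m[k + k * n] < 0 ? -m[k + k * n] : m[k + k * n];
        for (int i = k + 1; i < n; i++) {
            double v = m[i + k * n] < 0 ? -m[i + k * n] : m[i + k * n];
            if (v > best) { best = v; p = i; }
        }
        if (best == 0.0) { free(m); return 0.0; } /* singular matrix */
        if (p != k) {
            for (int j = 0; j < n; j++) { /* swap rows p and k */
                double tmp = m[k + j * n];
                m[k + j * n] = m[p + j * n];
                m[p + j * n] = tmp;
            }
            det = -det; /* each row swap flips the sign */
        }
        double piv = m[k + k * n];
        det *= piv;
        /* eliminate entries below the pivot */
        for (int i = k + 1; i < n; i++) {
            double f = m[i + k * n] / piv;
            for (int j = k; j < n; j++)
                m[i + j * n] -= f * m[k + j * n];
        }
    }
    free(m);
    return det;
}
```

You can continue the same elimination on an augmented identity matrix (Gauss-Jordan) to get the inverse. For evd and svd, a native version is considerably more involved.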

Function list

matrix

| name | representation | note |
|---|---|---|
| alloc_MAT | UINT,UINT,UINT | |
| alloc_CMAT | UINT,UINT,UINT | |
| zeros | UINT,(UINT,UINT) | |
| czeros | UINT,(UINT,UINT) | |
| set | MAT*,UINT,(UINT,UINT),DTYPE | |
| cset | CMAT*,UINT,(UINT,UINT),DTYPE | |
| fill | MAT*, DTYPE | |
| cfill | CMAT*, DTYPE, DTYPE | re, im |
| get | MAT*,UINT,(UINT,UINT) | |
| cget | CMAT*,UINT,UINT,UINT | |
| submat | MAT*, MAT*, ITER,ITER, (ITER,ITER, ITER,ITER) | |
| csubmat | CMAT*, CMAT*, ITER,ITER, (ITER,ITER, ITER,ITER) | |
| free_MAT | MAT* | |
| free_CMAT | CMAT* | |
| print | MAT* | |
| cprint | CMAT* | |
| print_sub | MAT*, ITER,ITER, (ITER,ITER, ITER,ITER) | see submat for how ITER is used |
| cprint_sub | CMAT*, ITER,ITER, (ITER,ITER, ITER,ITER) | |
| ele_prod | | C = alpha * A .* B + beta * C (element-wise) |
| ele_div | | |
| ele_pow | MAT*, n | C = alpha * pow(A,n) + C |
| ele_sqrt | | C = alpha * sqrt(A) + C |
| bro_prod | | C = alpha * A ?* B + beta * C (element-wise) |
| permute | | [see MATLAB permute, reshape](http://21go.blogspot.com/2011/04/matlab-reshape-permute.html) |
| reshape | | |
| inverse | | C = alpha * A^-1 + beta * C |
| trace | | |
| diagonal | | |
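The ele_prod row defines C = alpha * A .* B + beta * C. Here is a minimal sketch of that contract under these assumptions:

- `MAT` is a hypothetical stand-in with three dimensions and contiguous `double` storage.
- The `(alpha, A, B, beta, C)` argument order is borrowed from aABpbC in the BLAS lv. 3 table.
- Broadcasting (bro_prod) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's MAT: d0 x d1 x d2 doubles stored
 * contiguously. Element-wise ops do not depend on the index order. */
typedef struct {
    double *data;
    unsigned d0, d1, d2;
} MAT;

static size_t mat_len(const MAT *m) { return (size_t)m->d0 * m->d1 * m->d2; }

/* C = alpha * A .* B + beta * C, element-wise, no broadcasting.
 * With beta == 0, C is overwritten without being read (BLAS convention). */
void ele_prod(double alpha, const MAT *A, const MAT *B, double beta, MAT *C)
{
    assert(A->d0 == B->d0 && A->d1 == B->d1 && A->d2 == B->d2);
    assert(A->d0 == C->d0 && A->d1 == C->d1 && A->d2 == C->d2);
    size_t n = mat_len(A);
    for (size_t i = 0; i < n; i++) {
        double prod = alpha * A->data[i] * B->data[i];
        C->data[i] = (beta == 0.0) ? prod : prod + beta * C->data[i];
    }
}
```

The same loop shape covers ele_div, ele_pow and ele_sqrt by changing the per-element expression.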
wave (iip_wav.h)

| name | representation | note |
|---|---|---|
| read_WAV | char* -> WAV* | |
| write_WAV | WAV*, char* | |
| WAV2MAT | WAV* -> MAT* | |
| MAT2WAV | MAT* -> UINT | |
| WAV_BUF2MAT | WAV_BUF* -> MAT* | |
| print_WAV | WAV* | |
| free_WAV | WAV* | |
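read_WAV has to locate the format fields and the sample data inside a RIFF file. The sketch below shows that header parsing on an in-memory buffer, assuming PCM files. `wav_info` and `parse_wav_header` are hypothetical and are not the WAV struct from iip_wav.h. The parser walks the chunks instead of assuming a fixed 44-byte header, because some files carry extra chunks (for example LIST) before `data`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header summary; the real WAV struct in iip_wav.h may differ. */
typedef struct {
    uint16_t audio_format;    /* 1 = integer PCM */
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
    uint32_t data_size;       /* bytes of sample data */
    size_t data_offset;       /* where the samples start in the buffer */
} wav_info;

/* WAV fields are little-endian regardless of the host byte order. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static uint16_t le16(const unsigned char *p) { return (uint16_t)(p[0] | p[1] << 8); }

/* Walk the RIFF chunks until "fmt " and then "data" are found.
 * Returns 0 on success, -1 on malformed or unsupported input. */
int parse_wav_header(const unsigned char *buf, size_t len, wav_info *out)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;
    int have_fmt = 0;
    size_t pos = 12; /* first chunk follows "RIFF" <size> "WAVE" */
    while (pos + 8 <= len) {
        const unsigned char *id = buf + pos;
        uint32_t size = le32(buf + pos + 4);
        const unsigned char *body = buf + pos + 8;
        if (memcmp(id, "fmt ", 4) == 0) {
            if (size < 16 || pos + 8 + 16 > len) return -1;
            out->audio_format = le16(body);
            out->channels = le16(body + 2);
            out->sample_rate = le32(body + 4);
            /* body + 8: byte rate, body + 12: block align (not needed here) */
            out->bits_per_sample = le16(body + 14);
            have_fmt = 1;
        } else if (memcmp(id, "data", 4) == 0) {
            if (!have_fmt) return -1;
            out->data_size = size;
            out->data_offset = pos + 8;
            return 0;
        }
        pos += 8 + (size_t)size + (size & 1); /* chunks are padded to even length */
    }
    return -1;
}
```

With the header parsed, WAV2MAT reduces to converting `data_size / (bits_per_sample / 8)` little-endian samples starting at `data_offset`.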
binary/text save/load

| name | representation | note |
|---|---|---|
| load_mat | const char*, MAT*, bool | see this [repo](https://github.com/gogyzzz/complex_double_binary_to_matlab) |
| save_mat | const char*, MAT*, bool | |
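load_mat and save_mat take a filename, a `MAT*` and a `bool`. The sketch below is minimal and assumes the `bool` selects binary versus text, the same convention as the `bool binary` argument of the Kaldi Read/Write functions in the reference listings. The on-disk format is also hypothetical: two `uint32` dimensions followed by column-major doubles. The actual format in the linked repo may differ. The functions take a `FILE*`, so a filename wrapper only needs `fopen`/`fclose`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical MAT: d0 x d1 doubles, column-major as in MATLAB. */
typedef struct {
    double *data;
    uint32_t d0, d1;
} MAT;

/* Write a "d0 d1" header, then the data: raw binary (host byte order)
 * or one value per line as text. Returns 0 on success, -1 on error. */
int save_mat_fp(FILE *fp, const MAT *m, bool binary)
{
    size_t n = (size_t)m->d0 * m->d1;
    if (binary) {
        uint32_t dims[2] = {m->d0, m->d1};
        if (fwrite(dims, sizeof dims[0], 2, fp) != 2) return -1;
        return fwrite(m->data, sizeof *m->data, n, fp) == n ? 0 : -1;
    }
    if (fprintf(fp, "%u %u\n", (unsigned)m->d0, (unsigned)m->d1) < 0) return -1;
    for (size_t i = 0; i < n; i++)
        if (fprintf(fp, "%.17g\n", m->data[i]) < 0) return -1; /* 17 digits round-trip a double */
    return 0;
}

/* Read what save_mat_fp wrote. m->data must already hold d0 * d1 doubles,
 * and the stored dimensions must equal m's ("dimension must be equal"). */
int load_mat_fp(FILE *fp, MAT *m, bool binary)
{
    uint32_t d0, d1;
    if (binary) {
        uint32_t dims[2];
        if (fread(dims, sizeof dims[0], 2, fp) != 2) return -1;
        d0 = dims[0];
        d1 = dims[1];
    } else {
        unsigned a, b;
        if (fscanf(fp, "%u %u", &a, &b) != 2) return -1;
        d0 = a;
        d1 = b;
    }
    if (d0 != m->d0 || d1 != m->d1) return -1;
    size_t n = (size_t)d0 * d1;
    if (binary)
        return fread(m->data, sizeof *m->data, n, fp) == n ? 0 : -1;
    for (size_t i = 0; i < n; i++)
        if (fscanf(fp, "%lf", &m->data[i]) != 1) return -1;
    return 0;
}
```

In MATLAB, the binary layout can be read with `fread(fid, 2, 'uint32')` followed by `fread(fid, [d0 d1], 'double')`, as long as the byte order matches.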
BLAS lv. 1

| name | representation | note |
|---|---|---|
| axpy | DTYPE, MAT*, MAT* | |
| caxpy | DTYPE, CMAT*, CMAT* | |
| copy | MAT*, MAT* | |
| ccopy | CMAT*, CMAT* | |
| asum | MAT*, UINT inc | Weren't sum and asum different? |
| sum | MAT*, int idx_dimension | |
| csum | CMAT*, int idx_dimension | |
| acsum | CMAT*, UINT inc | |
| sum | | Why did this disappear? |
| csum | | |
| dot | MAT*, UINT xinc, MAT*, UINT yinc -> DTYPE | Must not return the value. The result must not be a scalar (it must be a matrix with the reduced dimension merged). |
| sdot | | |
| cdot | | |
| udot | | ??? What is this? |
| nrm2 | | |
| rot | OpenBLAS has no csrot or zdrot | Plane rotation of points (??) |
| scal | | |
| swap | | |
| iamax | | |
| iamin | In OpenBLAS, idamin and isamin are only declared, not implemented | |
| cabs1 | | |
| rotm | | Not sure what this is (1) |
| rotmg | | Not sure what this is (2) |
| rotg | | Not sure what this is (3) |

```c
// from kaldi/src
typedef struct dimension dimension; // opaque for now; lets "dimension*" be used without "struct"

void read_mat(const char* filename, MAT*); // dimension must be equal
void read_cmat(const char* filename, CMAT*);
void write_mat(const char* filename, MAT*);
void write_cmat(const char* filename, CMAT*);

#define IIP_ASSERT(condition) // assert function

void iip_timer(void); // needed

// note: sqrt, pow, round, floor, ceil, log, exp and abs clash with C standard
// library names and will need a prefix once the naming rules are decided
void sqrt(MAT*);
void csqrt(CMAT*);
void pow(MAT*, DTYPE n); // exponent can be a real value
void cpow(CMAT*, DTYPE n);
// void upow(CMAT*, CTYPE n);

void randu(MAT*, DTYPE a, DTYPE b); // uniform distribution
void crandu(CMAT*, DTYPE ar, DTYPE br, DTYPE ai, DTYPE bi);
void randn(MAT*, DTYPE mean, DTYPE var); // var or std? decide after checking other libraries
void crandn(CMAT*, CTYPE mean, CTYPE var);

// C = AB + C
// size(C) = size(A) >= size(B)
void mul_elements(MAT* a, MAT* b, MAT* c);     // with broadcasting
void umul_elements(CMAT* a, CMAT* b, CMAT* c); // with broadcasting
void div_elements(MAT* a, MAT* b, MAT* c);     // with broadcasting
void udiv_elements(CMAT* a, CMAT* b, CMAT* c); // with broadcasting
void inv_elements(MAT*);   // (1 / element)
void cinv_elements(CMAT*); // (1 / element)

void max(dimension*, MAT*);
void cmax(dimension*, CMAT*);
void amax(dimension*, MAT*); // absolute max dimension
void camax(dimension*, CMAT*);
// void min

void round(MAT*);
void floor(MAT*);
void ceil(MAT*);
void log(MAT*, UINT base); // look into what type base should be, then change it
void exp(MAT*, DTYPE exponent);
void abs(MAT*);

void repmat(MAT*, dimension*); // repeat matrix: https://www.mathworks.com/help/matlab/ref/repmat.html
void reshape(MAT*, dimension*);
void shiftdim(MAT*, SINT n);   // shift: https://www.mathworks.com/help/matlab/ref/shiftdim.html
// void ???(MAT*, UINT dst0, UINT dst1, UINT dst2); // 3 x 2 x 1 -> 2 x 3 x 1 (name TBD)

// void svd(...); // singular value decomposition
// void evd(...); // eigenvalue decomposition; not found in Numerical Recipes yet, so use the BLAS one as-is for now
```
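On the open question in the asum row: yes, the two differ. In reference BLAS, asum is the sum of absolute values. A plain signed sum is not a reference BLAS routine at all. Below is a minimal strided sketch of asum, sum and iamax on plain `double` arrays. The `iip_` names and signatures are hypothetical, and only positive strides are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Strided Level-1 helpers: n elements, stride inc (> 0).
 * Names and signatures are hypothetical. */

/* BLAS-style asum: sum of absolute values, sum_i |x_i|. */
double iip_asum(size_t n, const double *x, size_t inc)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        double v = x[i * inc];
        s += v < 0 ? -v : v;
    }
    return s;
}

/* Plain signed sum (not a reference BLAS routine). */
double iip_sum(size_t n, const double *x, size_t inc)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i * inc];
    return s;
}

/* BLAS-style iamax: position of the first element with the largest |x_i|.
 * Fortran BLAS returns a 1-based index; CBLAS and this sketch are 0-based. */
size_t iip_iamax(size_t n, const double *x, size_t inc)
{
    assert(n > 0);
    size_t best = 0;
    double bestv = x[0] < 0 ? -x[0] : x[0];
    for (size_t i = 1; i < n; i++) {
        double v = x[i * inc];
        if (v < 0) v = -v;
        if (v > bestv) { bestv = v; best = i; }
    }
    return best;
}
```

For example, for x = {1, -4, 2, -3, 0} the signed sum is -4, asum is 10, and iamax picks index 1.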
BLAS lv. 2, 3

### BLAS lv. 2

| name | representation | note |
|---|---|---|
| gemv | char transA,DTYPE,MAT*,MAT*,DTYPE,MAT* | Y = alpha * A * x + beta * Y |
| cgemv | char transA, CTYPE,CMAT*,CMAT*,CTYPE,CMAT* | |

### BLAS lv. 3

| name | representation | note |
|---|---|---|
| matmul | MAT* A, MAT* B, MAT* C | A * B -> C |
| matmul_range | MAT* A, MAT* B, MAT* C, range* ra, range* rb, range* rc | |
| aABpbC | DTYPE alpha, MAT* A, MAT* B, DTYPE beta, MAT* C | |
| aABpbC | DTYPE alpha, MAT* A, MAT* B, DTYPE beta, MAT* C, range* ra, range* rb, range* rc | |
| aAtBpbC | | |
| aABtpbC | | sample |
| caABpbC | | |
| caAtBpbC | | |
| caABtpbC | | |
| caABhpbC | | |
| caAhBpbC | | |
| gemm | char transA, char transB, DTYPE, MAT*,MAT*,DTYPE,MAT* | C = alpha * A * B + beta * C |
| cgemm | char, char, DTYPE re, DTYPE im, CMAT*,CMAT*,CTYPE,CMAT* | updated |
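The lv. 3 table defines gemm as C = alpha * A * B + beta * C with transpose flags, and the schedule lists MKL/OpenBLAS backends. The sketch below combines both ideas: a native column-major gemm that can be swapped for `cblas_dgemm` at compile time. `USE_CBLAS` and `iip_gemm` are hypothetical. `cblas_dgemm` is the standard CBLAS entry point that OpenBLAS and MKL export. cuBLAS uses a different, handle-based API and would need its own branch.

```c
#include <assert.h>
#include <stddef.h>
#ifdef USE_CBLAS
#include <cblas.h> /* OpenBLAS header; MKL provides the same API via mkl_cblas.h */
#endif

/* C = alpha * op(A) * op(B) + beta * C, op(X) = X ('N') or X^T ('T').
 * Column-major (BLAS/MATLAB order): op(A) is m x k, op(B) is k x n,
 * C is m x n; lda/ldb/ldc are leading dimensions.
 * iip_gemm and its argument order are assumptions modeled on dgemm. */
void iip_gemm(char transA, char transB, size_t m, size_t n, size_t k,
              double alpha, const double *A, size_t lda,
              const double *B, size_t ldb,
              double beta, double *C, size_t ldc)
{
    assert(transA == 'N' || transA == 'T');
    assert(transB == 'N' || transB == 'T');
#ifdef USE_CBLAS
    /* hypothetical build flag: hand the work to an optimized backend */
    cblas_dgemm(CblasColMajor,
                transA == 'N' ? CblasNoTrans : CblasTrans,
                transB == 'N' ? CblasNoTrans : CblasTrans,
                (int)m, (int)n, (int)k, alpha, A, (int)lda,
                B, (int)ldb, beta, C, (int)ldc);
#else
    /* native fallback: straightforward triple loop */
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            double acc = 0.0;
            for (size_t p = 0; p < k; p++) {
                double a = (transA == 'N') ? A[i + p * lda] : A[p + i * lda];
                double b = (transB == 'N') ? B[p + j * ldb] : B[j + p * ldb];
                acc += a * b;
            }
            double *c = &C[i + j * ldc];
            /* beta == 0: C is not read, so it may start uninitialized */
            *c = (beta == 0.0) ? alpha * acc : alpha * acc + beta * *c;
        }
    }
#endif
}
```

If that is the intended reading of the names, aABpbC, aAtBpbC and aABtpbC from the table could become thin wrappers that fix the flags. For example, aAtBpbC would call `iip_gemm('T', 'N', ...)`.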

reference

kaldi/src/base/

```c
base/io-funcs-inl.h:template void WriteBasicType(std::ostream &os,
base/io-funcs-inl.h:template inline void ReadBasicType(std::istream &is,
base/io-funcs-inl.h:inline void WriteIntegerPairVector(std::ostream &os, bool binary,
base/io-funcs-inl.h:inline void ReadIntegerPairVector(std::istream &is, bool binary,
base/io-funcs-inl.h:template inline void WriteIntegerVector(std::ostream &os, bool binary,
base/io-funcs-inl.h:template inline void ReadIntegerVector(std::istream &is,
base/io-funcs-inl.h:inline void InitKaldiOutputStream(std::ostream &os, bool binary) {
base/io-funcs.h:// They were put in util/ in order to avoid making the Matrix library
base/io-funcs.h: void WriteIntegerVector(std::ostream &os, bool binary, const std::vector &v);
base/io-funcs.h: void ReadIntegerVector(std::istream &is, bool binary, std::vector *v);
base/io-funcs.h: type WriteMyTypedefName. This is to avoid introducing confusing templated functions;
base/io-funcs.h: void WriteToken(std::ostream &os, bool binary, const char*);
base/io-funcs.h: void WriteToken(std::ostream &os, bool binary, const std::string & token);
base/io-funcs.h: void ReadToken(std::istream &is, bool binary, std::string *str);
base/io-funcs.h: void PeekToken(std::istream &is, bool binary, std::string *str);
base/io-funcs.h:template void WriteBasicType(std::ostream &os, bool binary, T t);
base/io-funcs.h:template void ReadBasicType(std::istream &is, bool binary, T *t);
base/io-funcs.h:void WriteBasicType(std::ostream &os, bool binary, bool b);
base/io-funcs.h:void ReadBasicType(std::istream &is, bool binary, bool *b);
base/io-funcs.h:void WriteBasicType(std::ostream &os, bool binary, float f);
base/io-funcs.h:void WriteBasicType(std::ostream &os, bool binary, double f);
base/io-funcs.h:void ReadBasicType(std::istream &is, bool binary, float *f);
base/io-funcs.h:void ReadBasicType(std::istream &is, bool binary, double *f);
base/io-funcs.h:inline void ReadBasicType(std::istream &is, bool binary, T *t, bool add) {
base/io-funcs.h:template inline void WriteIntegerVector(std::ostream &os, bool binary,
base/io-funcs.h:template inline void ReadIntegerVector(std::istream &is, bool binary,
base/io-funcs.h:inline void WriteIntegerPairVector(std::ostream &os, bool binary,
base/io-funcs.h:inline void ReadIntegerPairVector(std::istream &is, bool binary,
base/io-funcs.h:void WriteToken(std::ostream &os, bool binary, const char *token);
base/io-funcs.h:void WriteToken(std::ostream &os, bool binary, const std::string & token);
base/io-funcs.h:void ReadToken(std::istream &is, bool binary, std::string *token);
base/io-funcs.h:void ExpectToken(std::istream &is, bool binary, const char *token);
base/io-funcs.h:void ExpectToken(std::istream &is, bool binary, const std::string & token);
base/io-funcs.h:void ExpectPretty(std::istream &is, bool binary, const char *token);
base/io-funcs.h:void ExpectPretty(std::istream &is, bool binary, const std::string & token);
base/io-funcs.h:inline void InitKaldiOutputStream(std::ostream &os, bool binary);
base/kaldi-error.h:inline void SetVerboseLevel(int32 i) { g_kaldi_verbose_level = i; }
base/kaldi-error.h:// Note: we avoid using std::cerr for thread safety issues.
base/kaldi-error.h: static void HandleMessage(const LogMessageEnvelope &env, const char *msg);
base/kaldi-error.h:void KaldiAssertFailure_(const char *func, const char *file,
base/kaldi-error.h:#define KALDI_ASSERT(cond) do { if (cond) (void)0; else \
base/kaldi-error.h:#define KALDI_ASSERT(cond) (void)0
base/kaldi-error.h:#define KALDI_PARANOID_ASSERT(cond) do { if (cond) (void)0; else \
base/kaldi-error.h:#define KALDI_PARANOID_ASSERT(cond) (void)0
base/kaldi-error.h:typedef void (*LogHandler)(const LogMessageEnvelope &envelope,
base/kaldi-math.h:void RandGauss2(float *a, float *b, RandomState *state = NULL);
base/kaldi-math.h:void RandGauss2(double *a, double *b, RandomState *state = NULL);
base/kaldi-math.h:static inline void AssertEqual(float a, float b,
base/kaldi-math.h:template void Factorize(I m, std::vector *factors) {
base/kaldi-utils.h:void Sleep(float seconds);
base/kaldi-utils.h: void operator = (const type&)
base/kaldi-utils.h: static inline void Check() { }
base/timer.h: void Reset() {
base/timer.h: void Reset() { gettimeofday(&this->time_start_, &time_zone_); }
```
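The schedule asks whether ASSERT should only warn or also abort. In the listing above, KALDI_ASSERT is a `do { if (cond) (void)0; else ...` macro that collapses to `(void)0` in its other configuration. Below is a minimal sketch of an IIP_ASSERT along those lines. It uses a hypothetical compile-time switch, `IIP_ASSERT_ABORT`, that picks abort over warn-and-continue. The sketch also includes a hypothetical `iip_close` helper as one candidate criterion for the MATLAB comparison in the functional-test item. The tolerance values are left open.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical build switch: with IIP_ASSERT_ABORT defined a failed check
 * aborts; otherwise it only prints a warning and execution continues. */
#ifdef IIP_ASSERT_ABORT
#define IIP_ASSERT_FAIL() abort()
#else
#define IIP_ASSERT_FAIL() ((void)0)
#endif

/* do { ... } while (0) keeps the macro a single statement, so it is safe
 * inside an unbraced if/else. */
#define IIP_ASSERT(cond)                                              \
    do {                                                              \
        if (!(cond)) {                                                \
            fprintf(stderr, "IIP_ASSERT failed: %s (%s:%d)\n", #cond, \
                    __FILE__, __LINE__);                              \
            IIP_ASSERT_FAIL();                                        \
        }                                                             \
    } while (0)

/* One candidate "values match" rule for comparing against MATLAB output:
 * |a - b| <= atol + rtol * |b|, where b is the MATLAB reference value. */
int iip_close(double a, double b, double rtol, double atol)
{
    double diff = a > b ? a - b : b - a;
    double mag = b < 0 ? -b : b;
    return diff <= atol + rtol * mag;
}
```

The relative term keeps the check meaningful for large values, and the absolute term handles reference values near zero, where a purely relative test would reject tiny rounding noise.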
kaldi matrix

```c
cblas-wrappers.h:inline void cblas_Xcopy(const int N, const float *X, const int incX, float *Y,
cblas-wrappers.h:inline void cblas_Xcopy(const int N, const double *X, const int incX, double *Y,
cblas-wrappers.h:inline void cblas_Xrot(const int N, float *X, const int incX, float *Y,
cblas-wrappers.h:inline void cblas_Xrot(const int N, double *X, const int incX, double *Y,
cblas-wrappers.h:inline void cblas_Xaxpy(const int N, const float alpha, const float *X,
cblas-wrappers.h:inline void cblas_Xaxpy(const int N, const double alpha, const double *X,
cblas-wrappers.h:inline void cblas_Xscal(const int N, const float alpha, float *data,
cblas-wrappers.h:inline void cblas_Xscal(const int N, const double alpha, double *data,
cblas-wrappers.h:inline void cblas_Xspmv(const float alpha, const int num_rows, const float *Mdata,
cblas-wrappers.h:inline void cblas_Xspmv(const double alpha, const int num_rows, const double *Mdata,
cblas-wrappers.h:inline void cblas_Xtpmv(MatrixTransposeType trans, const float *Mdata,
cblas-wrappers.h:inline void cblas_Xtpmv(MatrixTransposeType trans, const double *Mdata,
cblas-wrappers.h:inline void cblas_Xtpsv(MatrixTransposeType trans, const float *Mdata,
cblas-wrappers.h:inline void cblas_Xtpsv(MatrixTransposeType trans, const double *Mdata,
cblas-wrappers.h:inline void cblas_Xspmv(MatrixIndexT dim, float alpha, const float *Mdata,
cblas-wrappers.h:inline void cblas_Xspmv(MatrixIndexT dim, double alpha, const double *Mdata,
cblas-wrappers.h:inline void cblas_Xspr2(MatrixIndexT dim, float alpha, const float *Xdata,
cblas-wrappers.h:inline void cblas_Xspr2(MatrixIndexT dim, double alpha, const double *Xdata,
cblas-wrappers.h:inline void cblas_Xspr(MatrixIndexT dim, float alpha, const float *Xdata,
cblas-wrappers.h:inline void cblas_Xspr(MatrixIndexT dim, double alpha, const double *Xdata,
cblas-wrappers.h:inline void cblas_Xgemv(MatrixTransposeType trans, MatrixIndexT num_rows,
cblas-wrappers.h:inline void cblas_Xgemv(MatrixTransposeType trans, MatrixIndexT num_rows,
cblas-wrappers.h:inline void cblas_Xgbmv(MatrixTransposeType trans, MatrixIndexT num_rows,
cblas-wrappers.h:inline void cblas_Xgbmv(MatrixTransposeType trans, MatrixIndexT num_rows,
cblas-wrappers.h:inline void Xgemv_sparsevec(MatrixTransposeType trans, MatrixIndexT num_rows,
cblas-wrappers.h:inline void cblas_Xgemm(const float alpha,
cblas-wrappers.h:inline void cblas_Xgemm(const double alpha,
cblas-wrappers.h:inline void cblas_Xsymm(const float alpha,
cblas-wrappers.h:inline void cblas_Xsymm(const double alpha,
cblas-wrappers.h:inline void cblas_Xger(MatrixIndexT num_rows, MatrixIndexT num_cols, float alpha,
cblas-wrappers.h:inline void cblas_Xger(MatrixIndexT num_rows, MatrixIndexT num_cols, double alpha,
cblas-wrappers.h:inline void cblas_Xsyrk (
cblas-wrappers.h:inline void cblas_Xsyrk(
cblas-wrappers.h:inline void cblas_Xsbmv1(
cblas-wrappers.h:inline void cblas_Xsbmv1(
cblas-wrappers.h:inline void mul_elements(
cblas-wrappers.h:inline void mul_elements(
cblas-wrappers.h:inline void clapack_Xtptri(KaldiBlasInt *num_rows, float *Mdata, KaldiBlasInt *result) {
cblas-wrappers.h:inline void clapack_Xtptri(KaldiBlasInt *num_rows, double *Mdata, KaldiBlasInt *result) {
cblas-wrappers.h:inline void clapack_Xgetrf2(KaldiBlasInt *num_rows, KaldiBlasInt *num_cols,
cblas-wrappers.h:inline void clapack_Xgetrf2(KaldiBlasInt *num_rows, KaldiBlasInt *num_cols,
cblas-wrappers.h:inline void clapack_Xgetri2(KaldiBlasInt *num_rows, float *Mdata, KaldiBlasInt *stride,
cblas-wrappers.h:inline void clapack_Xgetri2(KaldiBlasInt *num_rows, double *Mdata, KaldiBlasInt *stride,
cblas-wrappers.h:inline void clapack_Xgesvd(char *v, char *u, KaldiBlasInt *num_cols,
cblas-wrappers.h:inline void clapack_Xgesvd(char *v, char *u, KaldiBlasInt *num_cols,
cblas-wrappers.h:void inline clapack_Xsptri(KaldiBlasInt *num_rows, float *Mdata,
cblas-wrappers.h:void inline clapack_Xsptri(KaldiBlasInt *num_rows, double *Mdata,
cblas-wrappers.h:void inline clapack_Xsptrf(KaldiBlasInt *num_rows, float *Mdata,
cblas-wrappers.h:void inline clapack_Xsptrf(KaldiBlasInt *num_rows, double *Mdata,
cblas-wrappers.h:inline void clapack_Xgetrf(MatrixIndexT num_rows, MatrixIndexT num_cols,
cblas-wrappers.h:inline void clapack_Xgetrf(MatrixIndexT num_rows, MatrixIndexT num_cols,
cblas-wrappers.h:inline void clapack_Xgetri(MatrixIndexT num_rows, float *Mdata, MatrixIndexT stride,
cblas-wrappers.h:inline void clapack_Xgetri(MatrixIndexT num_rows, double *Mdata, MatrixIndexT stride,
compressed-matrix.h: void *Data() const { return this->data_; }
compressed-matrix.h: void CopyFromMat(const MatrixBase &mat,
compressed-matrix.h: void CopyToMat(MatrixBase *mat,
compressed-matrix.h: void Write(std::ostream &os, bool binary) const;
compressed-matrix.h: void Read(std::istream &is, bool binary);
compressed-matrix.h: void CopyRowToVec(MatrixIndexT row, VectorBase *v) const;
compressed-matrix.h: void CopyColToVec(MatrixIndexT col, VectorBase *v) const;
compressed-matrix.h: void CopyToMat(int32 row_offset,
compressed-matrix.h: void Swap(CompressedMatrix *other) { std::swap(data_, other->data_); }
compressed-matrix.h: void Clear();
compressed-matrix.h: void Scale(float alpha);
compressed-matrix.h: static void *AllocateData(int32 num_bytes);
compressed-matrix.h: static inline void ComputeGlobalHeader(const MatrixBase &mat,
compressed-matrix.h: static void CompressColumn(const GlobalHeader &global_header,
compressed-matrix.h: static void ComputeColHeader(const GlobalHeader &global_header,
compressed-matrix.h: void *data_; // first GlobalHeader, then PerColHeader (repeated), then
jama-eig.h: void GetV(MatrixBase *V_out) { // V is what we call P externally; it's the matrix of
jama-eig.h: void GetRealEigenvalues(VectorBase *r_out) {
jama-eig.h: void GetImagEigenvalues(VectorBase *i_out) {
jama-eig.h: inline static void cdiv(Real xr, Real xi, Real yr, Real yi, Real *cdivr, Real *cdivi) {
jama-eig.h: void Hqr2 ();
jama-eig.h: void Tred2 ();
jama-eig.h: void Tql2 ();
jama-eig.h: void Orthes ();
jama-eig.h:template void EigenvalueDecomposition::Tred2() {
jama-eig.h: // Scale to avoid under/overflow.
jama-eig.h:template void EigenvalueDecomposition::Tql2() {
jama-eig.h:void EigenvalueDecomposition::Orthes() {
jama-eig.h: // Double division avoids possible underflow
jama-eig.h:template void EigenvalueDecomposition::Hqr2() {
kaldi-gpsr.h: void Register(OptionsItf *opts);
kaldi-gpsr.h:inline void GpsrConfig::Register(OptionsItf *opts) {
kaldi-matrix-inl.h:void MatrixBase::AddVecVec(const float alpha, const VectorBase &ra, const VectorBase &rb);
kaldi-matrix-inl.h:void MatrixBase::AddVecVec(const double alpha, const VectorBase &ra, const VectorBase &rb);
kaldi-matrix.h: void SetZero();
kaldi-matrix.h: void Set(Real);
kaldi-matrix.h: void SetUnit();
kaldi-matrix.h: void SetRandn();
kaldi-matrix.h: void SetRandUniform();
kaldi-matrix.h: void CopyFromMat(const MatrixBase & M,
kaldi-matrix.h: void CopyFromMat(const CompressedMatrix &M);
kaldi-matrix.h: void CopyFromSp(const SpMatrix &M);
kaldi-matrix.h: void CopyFromTp(const TpMatrix &M,
kaldi-matrix.h: void CopyFromMat(const CuMatrixBase &M,
kaldi-matrix.h: void CopyRowsFromVec(const VectorBase &v);
kaldi-matrix.h: void CopyRowsFromVec(const CuVectorBase &v);
kaldi-matrix.h: void CopyRowsFromVec(const VectorBase &v);
kaldi-matrix.h: void CopyColsFromVec(const VectorBase &v);
kaldi-matrix.h: void CopyColFromVec(const VectorBase &v, const MatrixIndexT col);
kaldi-matrix.h: void CopyRowFromVec(const VectorBase &v, const MatrixIndexT row);
kaldi-matrix.h: void CopyDiagFromVec(const VectorBase &v);
kaldi-matrix.h: void MulElements(const MatrixBase &A);
kaldi-matrix.h: void DivElements(const MatrixBase &A);
kaldi-matrix.h: void Scale(Real alpha);
kaldi-matrix.h: void Max(const MatrixBase &A);
kaldi-matrix.h: void Min(const MatrixBase &A);
kaldi-matrix.h: void MulColsVec(const VectorBase &scale);
kaldi-matrix.h: void MulRowsVec(const VectorBase &scale);
kaldi-matrix.h: void MulRowsGroupMat(const MatrixBase &src);
kaldi-matrix.h: void Invert(Real *log_det = NULL, Real *det_sign = NULL,
kaldi-matrix.h: void InvertDouble(Real *LogDet = NULL, Real *det_sign = NULL,
kaldi-matrix.h: void InvertElements();
kaldi-matrix.h: void Transpose();
kaldi-matrix.h: void CopyCols(const MatrixBase &src,
kaldi-matrix.h: void CopyRows(const MatrixBase &src,
kaldi-matrix.h: void AddCols(const MatrixBase &src,
kaldi-matrix.h: void CopyRows(const Real *const *src);
kaldi-matrix.h: void CopyToRows(Real *const *dst) const;
kaldi-matrix.h: void AddRows(Real alpha,
kaldi-matrix.h: void AddRows(Real alpha, const Real *const *src);
kaldi-matrix.h: void AddToRows(Real alpha, Real *const *dst) const;
kaldi-matrix.h: void AddToRows(Real alpha,
kaldi-matrix.h: void ApplyFloor(Real floor_val);
kaldi-matrix.h: void ApplyCeiling(Real ceiling_val);
kaldi-matrix.h: void ApplyLog();
kaldi-matrix.h: void ApplyExp();
kaldi-matrix.h: void ApplyExpSpecial();
kaldi-matrix.h: void ApplyPow(Real power);
kaldi-matrix.h: void ApplyPowAbs(Real power, bool include_sign=false);
kaldi-matrix.h: void ApplyHeaviside();
kaldi-matrix.h: void Eig(MatrixBase *P,
kaldi-matrix.h: void DestructiveSvd(VectorBase *s, MatrixBase *U,
kaldi-matrix.h: void Svd(VectorBase *s, MatrixBase *U,
kaldi-matrix.h: void Svd(VectorBase *s) const { Svd(s, NULL, NULL); }
kaldi-matrix.h: void TestUninitialized() const; // This function is designed so that if any element
kaldi-matrix.h: void Sigmoid(const MatrixBase &src);
kaldi-matrix.h: void Heaviside(const MatrixBase &src);
kaldi-matrix.h: void SoftHinge(const MatrixBase &src);
kaldi-matrix.h: void GroupPnorm(const MatrixBase &src, Real power);
kaldi-matrix.h: void GroupPnormDeriv(const MatrixBase &input, const MatrixBase &output,
kaldi-matrix.h: void GroupMax(const MatrixBase &src);
kaldi-matrix.h: void GroupMaxDeriv(const MatrixBase &input, const MatrixBase &output);
kaldi-matrix.h: void Tanh(const MatrixBase &src);
kaldi-matrix.h: void DiffSigmoid(const MatrixBase &value,
kaldi-matrix.h: void DiffTanh(const MatrixBase &value,
kaldi-matrix.h: void SymPosSemiDefEig(VectorBase *s, MatrixBase *P,
kaldi-matrix.h: void Add(const Real alpha);
kaldi-matrix.h: void AddToDiag(const Real alpha);
kaldi-matrix.h: void AddVecVec(const Real alpha, const VectorBase &a,
kaldi-matrix.h: void AddVecToRows(const Real alpha, const VectorBase &v);
kaldi-matrix.h: void AddVecToCols(const Real alpha, const VectorBase &v);
kaldi-matrix.h: void AddMat(const Real alpha, const MatrixBase &M,
kaldi-matrix.h: void AddSmat(Real alpha, const SparseMatrix &A,
kaldi-matrix.h: void AddSmatMat(Real alpha, const SparseMatrix &A,
kaldi-matrix.h: void AddMatSmat(Real alpha, const MatrixBase &A,
kaldi-matrix.h: void SymAddMat2(const Real alpha, const MatrixBase &M,
kaldi-matrix.h: void AddDiagVecMat(const Real alpha, const VectorBase &v,
kaldi-matrix.h: void AddMatDiagVec(const Real alpha,
kaldi-matrix.h: void AddMatMatElements(const Real alpha,
kaldi-matrix.h: void AddSp(const Real alpha, const SpMatrix &S);
kaldi-matrix.h: void AddMatMat(const Real alpha,
kaldi-matrix.h: void SetMatMatDivMat(const MatrixBase& A,
kaldi-matrix.h: void AddMatSmat(const Real alpha,
kaldi-matrix.h: void AddSmatMat(const Real alpha,
kaldi-matrix.h: void AddMatMatMat(const Real alpha,
kaldi-matrix.h: void AddSpMat(const Real alpha,
kaldi-matrix.h: void AddTpMat(const Real alpha,
kaldi-matrix.h: void AddMatSp(const Real alpha,
kaldi-matrix.h: void AddSpMatSp(const Real alpha,
kaldi-matrix.h: void AddMatTp(const Real alpha,
kaldi-matrix.h: void AddTpTp(const Real alpha,
kaldi-matrix.h: void AddSpSp(const Real alpha,
kaldi-matrix.h: void CopyLowerToUpper();
kaldi-matrix.h: void CopyUpperToLower();
kaldi-matrix.h: void OrthogonalizeRows();
kaldi-matrix.h: void Read(std::istream & in, bool binary, bool add = false);
kaldi-matrix.h: void Write(std::ostream & out, bool binary) const;
kaldi-matrix.h: void LapackGesvd(VectorBase *s, MatrixBase *U,
kaldi-matrix.h: void Swap(Matrix *other);
kaldi-matrix.h: void Swap(CuMatrix *mat);
kaldi-matrix.h: /// Same as above, but need to avoid default copy constructor.
kaldi-matrix.h: void Read(std::istream & in, bool binary, bool add = false);
kaldi-matrix.h: void RemoveRow(MatrixIndexT i);
kaldi-matrix.h: void Transpose();
kaldi-matrix.h: void Resize(const MatrixIndexT r,
kaldi-matrix.h: void Destroy();
kaldi-matrix.h: void Init(const MatrixIndexT r,
kaldi-matrix.h:inline void AssertEqual(const MatrixBase &A, const MatrixBase &B,
kaldi-matrix.h:template void SortSvd(VectorBase *s, MatrixBase *U,
kaldi-matrix.h:void CreateEigenvalueMatrix(const VectorBase &real, const VectorBase &imag,
kaldi-vector-inl.h:void VectorBase::AddVec(const float alpha, const VectorBase &rv);
kaldi-vector-inl.h:void VectorBase::AddVec(const double alpha,
kaldi-vector.h: void SetZero();
kaldi-vector.h: void Set(Real f);
kaldi-vector.h: void SetRandn();
kaldi-vector.h: void SetRandUniform();
kaldi-vector.h: void CopyFromVec(const VectorBase &v);
kaldi-vector.h: void CopyFromPacked(const PackedMatrix &M);
kaldi-vector.h: void CopyFromVec(const VectorBase &v);
kaldi-vector.h: void CopyFromVec(const CuVectorBase &v);
kaldi-vector.h: void ApplyLog();
kaldi-vector.h: void ApplyLogAndCopy(const VectorBase &v);
kaldi-vector.h: void ApplyExp();
kaldi-vector.h: void ApplyAbs();
kaldi-vector.h: void ApplyFloor(Real floor_val, MatrixIndexT *floored_count = nullptr);
kaldi-vector.h: void ApplyCeiling(Real ceil_val, MatrixIndexT *ceiled_count = nullptr);
kaldi-vector.h: void Tanh(const VectorBase &src);
kaldi-vector.h: void Sigmoid(const VectorBase &src);
kaldi-vector.h: void ApplyPow(Real power);
kaldi-vector.h: void ApplyPowAbs(Real power, bool include_sign=false);
kaldi-vector.h: void InvertElements();
kaldi-vector.h: void AddVec(const Real alpha, const VectorBase &v);
kaldi-vector.h: void AddVec2(const Real alpha, const VectorBase &v);
kaldi-vector.h: void AddVec2(const Real alpha, const VectorBase &v);
kaldi-vector.h: void AddMatVec(const Real alpha, const MatrixBase &M,
kaldi-vector.h: void AddMatSvec(const Real alpha, const MatrixBase &M,
kaldi-vector.h: void AddSpVec(const Real alpha, const SpMatrix &M,
kaldi-vector.h: void AddTpVec(const Real alpha, const TpMatrix &M,
kaldi-vector.h: void ReplaceValue(Real orig, Real changed);
kaldi-vector.h: void MulElements(const VectorBase &v);
kaldi-vector.h: void MulElements(const VectorBase &v);
kaldi-vector.h: void DivElements(const VectorBase &v);
kaldi-vector.h: void DivElements(const VectorBase &v);
kaldi-vector.h: void Add(Real c);
kaldi-vector.h: void AddVecVec(Real alpha, const VectorBase &v,
kaldi-vector.h: void AddVecDivVec(Real alpha, const VectorBase &v,
kaldi-vector.h: void Scale(Real alpha);
kaldi-vector.h: void MulTp(const TpMatrix &M, const MatrixTransposeType trans);
kaldi-vector.h: void Solve(const TpMatrix &M, const MatrixTransposeType trans);
kaldi-vector.h: void CopyRowsFromMat(const MatrixBase &M);
kaldi-vector.h: void CopyRowsFromMat(const MatrixBase &M);
kaldi-vector.h: void CopyRowsFromMat(const CuMatrixBase &M);
kaldi-vector.h: void CopyColsFromMat(const MatrixBase &M);
kaldi-vector.h: void CopyRowFromMat(const MatrixBase &M, MatrixIndexT row);
kaldi-vector.h: void CopyRowFromMat(const MatrixBase &M, MatrixIndexT row);
kaldi-vector.h: void CopyRowFromSp(const SpMatrix &S, MatrixIndexT row);
kaldi-vector.h: void CopyColFromMat(const MatrixBase &M , MatrixIndexT col);
kaldi-vector.h: void CopyDiagFromMat(const MatrixBase &M);
kaldi-vector.h: void CopyDiagFromPacked(const PackedMatrix &M);
kaldi-vector.h: inline void CopyDiagFromSp(const SpMatrix &M) { CopyDiagFromPacked(M); }
kaldi-vector.h: inline void CopyDiagFromTp(const TpMatrix &M) { CopyDiagFromPacked(M); }
kaldi-vector.h: void AddRowSumMat(Real alpha, const MatrixBase &M, Real beta = 1.0);
kaldi-vector.h: void AddColSumMat(Real alpha, const MatrixBase &M, Real beta = 1.0);
kaldi-vector.h: void AddDiagMat2(Real alpha, const MatrixBase &M,
kaldi-vector.h: void AddDiagMatMat(Real alpha, const MatrixBase &M, MatrixTransposeType transM,
kaldi-vector.h: void Read(std::istream & in, bool binary, bool add = false);
kaldi-vector.h: void Write(std::ostream &Out, bool binary) const;
kaldi-vector.h: void CopyFromPtr(const Real* Data, MatrixIndexT sz);
kaldi-vector.h: void Swap(Vector *other);
kaldi-vector.h: void Read(std::istream & in, bool binary, bool add = false);
kaldi-vector.h: void Resize(MatrixIndexT length, MatrixResizeType resize_type = kSetZero);
kaldi-vector.h: void RemoveElement(MatrixIndexT i);
kaldi-vector.h: void Init(const MatrixIndexT dim);
kaldi-vector.h: void Destroy();
kaldi-vector.h:inline void AssertEqual(VectorBase &a, VectorBase &b,
matrix-functions-inl.h:template inline void ComplexMul(const Real &a_re, const Real &a_im,
matrix-functions-inl.h:template inline void ComplexAddProduct(const Real &a_re, const Real &a_im,
matrix-functions-inl.h:template inline void ComplexImExp(Real x, Real *a_re, Real *a_im) {
matrix-functions.h:template void ComplexFft (VectorBase *v, bool forward, Vector *tmp_work = NULL);
matrix-functions.h:template void ComplexFt (const VectorBase &in,
matrix-functions.h:template void RealFft (VectorBase *v, bool forward);
matrix-functions.h:template void RealFftInefficient (VectorBase *v, bool forward);
matrix-functions.h:template void ComputeDctMatrix(Matrix *M);
matrix-functions.h:template inline void ComplexMul(const Real &a_re, const Real &a_im,
matrix-functions.h:template inline void ComplexAddProduct(const Real &a_re, const Real &a_im,
matrix-functions.h:template inline void ComplexImExp(Real x, Real *a_re, Real *a_im);
matrix-functions.h:void ComputePca(const MatrixBase &X,
matrix-functions.h:void AddOuterProductPlusMinus(Real alpha,
matrix-functions.h:inline void AssertSameDim(const MatrixBase &mat1, const MatrixBase &mat2) {
optimization.h: void DoStep(Real function_value,
optimization.h: void DoStep(Real function_value,
optimization.h: void Restart(const VectorBase &x,
optimization.h: void ComputeNewDirection(Real function_value,
optimization.h: void ComputeHifNeeded(const VectorBase &gradient);
optimization.h: void StepSizeIteration(Real function_value,
optimization.h: void RecordStepLength(Real s);
packed-matrix.h: void SetZero(); /// < Set to zero
packed-matrix.h: void SetUnit(); /// < Set to unit matrix.
packed-matrix.h: void SetRandn(); /// < Set to random values of a normal distribution
packed-matrix.h: void Resize(MatrixIndexT nRows, MatrixResizeType resize_type = kSetZero);
packed-matrix.h: void AddToDiag(const Real r); // Adds r to diaginal
packed-matrix.h: void ScaleDiag(const Real alpha); // Scales diagonal by alpha.
packed-matrix.h: void SetDiag(const Real alpha); // Sets diagonal to this value.
packed-matrix.h: void CopyFromPacked(const PackedMatrix &orig);
packed-matrix.h: void CopyFromVec(const SubVector &orig);
packed-matrix.h: // This code is duplicated in child classes to avoid extra levels of calls.
packed-matrix.h: // This code is duplicated in child classes to avoid extra levels of calls.
packed-matrix.h: void Scale(Real c);
packed-matrix.h: void Read(std::istream &in, bool binary, bool add = false);
packed-matrix.h: void Write(std::ostream &out, bool binary) const;
packed-matrix.h: void Destroy();
packed-matrix.h: void Swap(PackedMatrix *other);
packed-matrix.h: void Swap(Matrix *other);
packed-matrix.h: void AddPacked(const Real alpha, const PackedMatrix& M);
packed-matrix.h: void Init(MatrixIndexT dim);
sp-matrix.h: void Swap(SpMatrix *other);
sp-matrix.h: inline void Resize(MatrixIndexT nRows, MatrixResizeType resize_type = kSetZero) {
sp-matrix.h: void CopyFromSp(const SpMatrix &other) {
sp-matrix.h: void CopyFromSp(const SpMatrix &other) {
sp-matrix.h: void CopyFromMat(const MatrixBase &orig,
sp-matrix.h: void CopyFromMat(const MatrixBase &orig,
sp-matrix.h: void Invert(Real *logdet = NULL, Real *det_sign= NULL,
sp-matrix.h: void InvertDouble(Real *logdet = NULL, Real *det_sign = NULL,
sp-matrix.h: void ApplyPow(Real exponent);
sp-matrix.h: void SymPosSemiDefEig(VectorBase *s, MatrixBase *P,
sp-matrix.h: void Eig(VectorBase *s, MatrixBase *P = NULL) const;
sp-matrix.h: void TopEigs(VectorBase *s, MatrixBase *P,
sp-matrix.h: void PrintEigs(const char *name) {
sp-matrix.h: void AddSp(const Real alpha, const SpMatrix &Ma) {
sp-matrix.h: void AddVec2(const Real alpha, const VectorBase &v);
sp-matrix.h: void AddVecVec(const Real alpha, const VectorBase &v,
sp-matrix.h: void AddVec2Sp(const Real alpha, const VectorBase &v,
sp-matrix.h: void AddDiagVec(const Real alpha, const VectorBase &v);
sp-matrix.h: void AddMat2(const Real alpha, const MatrixBase &M,
sp-matrix.h: void AddMat2Sp(const Real alpha, const MatrixBase &M,
sp-matrix.h: void AddSmat2Sp(const Real alpha, const MatrixBase &M,
sp-matrix.h: void AddTp2Sp(const Real alpha, const TpMatrix &T,
sp-matrix.h: void AddTp2(const Real alpha, const TpMatrix &T,
sp-matrix.h: void AddMat2Vec(const Real alpha, const MatrixBase &M,
sp-matrix.h: void Tridiagonalize(MatrixBase *Q);
sp-matrix.h: void Qr(MatrixBase *Q);
sp-matrix.h: void EigInternal(VectorBase *s, MatrixBase *P,
sp-matrix.h:inline void AssertEqual(const SpMatrix &A,
sp-matrix.h: void Check() const;
sparse-matrix.h: void CopyElementsToVec(VectorBase *vec) const;
sparse-matrix.h: void AddToVec(Real alpha,
sparse-matrix.h: void CopyFromSvec(const SparseVector &other);
sparse-matrix.h: void Swap(SparseVector *other);
sparse-matrix.h: void SetRandn(BaseFloat zero_prob);
sparse-matrix.h: void Resize(MatrixIndexT dim, MatrixResizeType resize_type = kSetZero);
sparse-matrix.h: void Write(std::ostream &os, bool binary) const;
sparse-matrix.h: void Read(std::istream &os, bool binary);
sparse-matrix.h: void Scale(Real alpha);
sparse-matrix.h: void CopyToMat(MatrixBase *other,
sparse-matrix.h: void CopyElementsToVec(VectorBase *other) const;
sparse-matrix.h: void CopyFromSmat(const SparseMatrix &other,
sparse-matrix.h: void AddToMat(BaseFloat alpha, MatrixBase *other,
sparse-matrix.h: void Swap(SparseMatrix *other);
sparse-matrix.h: void SetRandn(BaseFloat zero_prob);
sparse-matrix.h: void Write(std::ostream &os, bool binary) const;
sparse-matrix.h: void Read(std::istream &os, bool binary);
sparse-matrix.h: void SetRow(int32 r, const SparseVector &vec);
sparse-matrix.h: void SelectRows(const std::vector &row_indexes,
sparse-matrix.h: void AppendSparseMatrixRows(std::vector > *inputs);
sparse-matrix.h: void Resize(MatrixIndexT rows, MatrixIndexT cols,
sparse-matrix.h: void Scale(Real alpha);
sparse-matrix.h: void Compress(); // If it was a full matrix, compresses, changing Type() to
sparse-matrix.h: void Uncompress(); // If it was a compressed matrix, uncompresses, changing
sparse-matrix.h: void Write(std::ostream &os, bool binary) const;
sparse-matrix.h: void Read(std::istream &is, bool binary);
sparse-matrix.h: void SwapSparseMatrix(SparseMatrix *smat);
sparse-matrix.h: void SwapCompressedMatrix(CompressedMatrix *cmat);
sparse-matrix.h: void GetMatrix(Matrix *mat) const;
sparse-matrix.h: void SwapFullMatrix(Matrix *mat);
sparse-matrix.h: void CopyToMat(MatrixBase *mat,
sparse-matrix.h: void CopyToMat(CuMatrixBase *cu_mat,
sparse-matrix.h: void AddToMat(BaseFloat alpha, MatrixBase *mat,
sparse-matrix.h: void AddToMat(BaseFloat alpha, CuMatrixBase *cu_mat,
sparse-matrix.h: void Scale(BaseFloat alpha);
sparse-matrix.h: void Clear();
sparse-matrix.h: void Swap(GeneralMatrix *other);
sparse-matrix.h:void AppendGeneralMatrixRows(const std::vector &src,
sparse-matrix.h:void FilterSparseMatrixRows(const SparseMatrix &in,
sparse-matrix.h:void FilterMatrixRows(const Matrix &in,
sparse-matrix.h:void FilterCompressedMatrixRows(const CompressedMatrix &in,
sparse-matrix.h:void FilterGeneralMatrixRows(const GeneralMatrix &in,
sparse-matrix.h:void ExtractRowRangeWithPadding(
srfft.h: void Compute(Real *xr, Real *xi, bool forward) const;
srfft.h: void Compute(Real *x, bool forward);
srfft.h: void Compute(Real *x, bool forward, std::vector *temp_buffer) const;
srfft.h: void ComputeTables();
srfft.h: void ComputeRecursive(Real *xr, Real *xi, Integer logn) const;
srfft.h: void BitReversePermute(Real *x, Integer logn) const;
srfft.h: void Compute(Real *x, bool forward);
srfft.h: void Compute(Real *x, bool forward, std::vector *temp_buffer) const;
tp-matrix.h: void Cholesky(const SpMatrix& orig);
tp-matrix.h: void Invert();
tp-matrix.h: void InvertDouble() {
tp-matrix.h: void Swap(TpMatrix *other);
tp-matrix.h: void CopyFromMat(const MatrixBase &M,
tp-matrix.h: void CopyFromMat(const CuTpMatrix &other);
tp-matrix.h: void CopyFromTp(const TpMatrix &other) {
tp-matrix.h: template void CopyFromTp(const TpMatrix &other) {
tp-matrix.h: void AddTp(const Real alpha, const TpMatrix &M) {
tp-matrix.h: void Resize(MatrixIndexT nRows, MatrixResizeType resize_type = kSetZero) {
```
kaldi cudamatrix
```c
cudamatrix/cu-allocator.h: // shouldn't be too critical. The reason it exists is to avoid calling the
cudamatrix/cu-allocator.h: void Check() {
cudamatrix/cu-allocator.h: void* Malloc(size_t size);
cudamatrix/cu-allocator.h: void* MallocPitch(size_t row_bytes, size_t num_rows, size_t *pitch);
cudamatrix/cu-allocator.h: void Free(void *ptr);
cudamatrix/cu-allocator.h: inline void* MallocLocking(size_t size) {
cudamatrix/cu-allocator.h: inline void* MallocPitchLocking(size_t row_bytes, size_t num_rows, size_t *pitch) {
cudamatrix/cu-allocator.h: void FreeLocking(void *ptr) {
cudamatrix/cu-allocator.h: void PrintMemoryUsage() const;
cudamatrix/cu-allocator.h: void FreeSomeCachedMemory(size_t bytes_to_free);
cudamatrix/cu-allocator.h: inline void* MallocPitchInternal(size_t row_bytes, size_t num_rows, size_t *pitch);
cudamatrix/cu-allocator.h: void *pointer; // the CUDA memory location that we own
cudamatrix/cu-allocator.h: CachedMemoryElement(void *pointer, size_t t, size_t pitch):
cudamatrix/cu-allocator.h: void Insert(const MemoryRequest &request,
cudamatrix/cu-allocator.h: size_t operator() (const void *arg) const noexcept {
cudamatrix/cu-allocator.h: unordered_map used_map_;
cudamatrix/cu-array-inl.h:void CuArray::Resize(MatrixIndexT dim, MatrixResizeType resize_type) {
cudamatrix/cu-array-inl.h:void CuArray::Destroy() {
cudamatrix/cu-array-inl.h:void CuArrayBase::CopyFromVec(const std::vector &src) {
cudamatrix/cu-array-inl.h:void CuArray::CopyFromVec(const std::vector &src) {
cudamatrix/cu-array-inl.h:void CuArray::CopyFromArray(const CuArrayBase &src) {
cudamatrix/cu-array-inl.h:void CuArrayBase::CopyFromArray(const CuArrayBase &src) {
cudamatrix/cu-array-inl.h:void CuArrayBase::CopyToVec(std::vector *dst) const {
cudamatrix/cu-array-inl.h:void CuArrayBase::CopyToHost(T *dst) const {
cudamatrix/cu-array-inl.h:void CuArrayBase::SetZero() {
cudamatrix/cu-array-inl.h: memset(static_cast(this->data_), 0, this->dim_ * sizeof(T));
cudamatrix/cu-array-inl.h:void CuArrayBase::Set(const T &value) {
cudamatrix/cu-array-inl.h:void CuArrayBase::Set(const int32 &value);
cudamatrix/cu-array-inl.h:void CuArrayBase::Sequence(const T base) {
cudamatrix/cu-array-inl.h:void CuArrayBase::Sequence(const int32 base);
cudamatrix/cu-array-inl.h:void CuArrayBase::Add(const T &value) {
cudamatrix/cu-array-inl.h:void CuArrayBase::Add(const int32 &value);
cudamatrix/cu-array-inl.h:void CuArray::Read(std::istream& in, bool binary) {
cudamatrix/cu-array-inl.h:void CuArray::Write(std::ostream& out, bool binary) const {
cudamatrix/cu-array-inl.h:void CuArray::Swap(CuArray *other) {
cudamatrix/cu-array.h: void SetZero();
cudamatrix/cu-array.h: void CopyFromArray(const CuArrayBase &src);
cudamatrix/cu-array.h: void CopyFromVec(const std::vector &src);
cudamatrix/cu-array.h: void CopyToVec(std::vector *dst) const;
cudamatrix/cu-array.h: void CopyToHost(T *dst) const;
cudamatrix/cu-array.h: void Set(const T &value);
cudamatrix/cu-array.h: void Sequence(const T base);
cudamatrix/cu-array.h: void Add(const T &value);
cudamatrix/cu-array.h: void Resize(MatrixIndexT dim, MatrixResizeType resize_type = kSetZero);
cudamatrix/cu-array.h: void Destroy();
cudamatrix/cu-array.h: void CopyFromVec(const std::vector &src);
cudamatrix/cu-array.h: void CopyFromArray(const CuArrayBase &src);
cudamatrix/cu-array.h: void Swap(CuArray *other);
cudamatrix/cu-array.h: void Read(std::istream &is, bool binary);
cudamatrix/cu-array.h: void Write(std::ostream &is, bool binary) const;
cudamatrix/cu-block-matrix.h: void Write(std::ostream &os, bool binary) const;
cudamatrix/cu-block-matrix.h: void Read(std::istream &is, bool binary);
cudamatrix/cu-block-matrix.h: void AddMatMat(BaseFloat alpha,
cudamatrix/cu-block-matrix.h: void CopyFromMat(const CuMatrix &M);
cudamatrix/cu-block-matrix.h: void NormalizeColumns();
cudamatrix/cu-block-matrix.h: void Swap(CuBlockMatrix *other);
cudamatrix/cu-block-matrix.h: void FreeCudaData();
cudamatrix/cu-block-matrix.h: void SetCudaData();
cudamatrix/cu-block-matrix.h: void Destroy();
cudamatrix/cu-common.h:void GetBlockSizesForSimpleMatrixOperation(int32 num_rows,
cudamatrix/cu-compressed-matrix.h: virtual void CopyFromMat(const CuMatrixBase &mat) = 0;
cudamatrix/cu-compressed-matrix.h: virtual void CopyToMat(CuMatrixBase *mat) const = 0;
cudamatrix/cu-compressed-matrix.h: /// allows the compression code to avoid the bounds check.
cudamatrix/cu-compressed-matrix.h: virtual void CopyFromMat(const CuMatrixBase &mat);
cudamatrix/cu-compressed-matrix.h: virtual void CopyToMat(CuMatrixBase *mat) const;
cudamatrix/cu-compressed-matrix.h: void Destroy();
cudamatrix/cu-device.h: // previous allocations to avoid the very large overhead that CUDA's
cudamatrix/cu-device.h: inline void* Malloc(size_t size) {
cudamatrix/cu-device.h: inline void* MallocPitch(size_t row_bytes, size_t num_rows, size_t *pitch) {
cudamatrix/cu-device.h: inline void Free(void *ptr) {
cudamatrix/cu-device.h: void SelectGpuId(std::string use_gpu);
cudamatrix/cu-device.h: void AccuProfile(const char *function_name, const CuTimer &timer);
cudamatrix/cu-device.h: void PrintProfile();
cudamatrix/cu-device.h: void PrintMemoryUsage() const;
cudamatrix/cu-device.h: inline void AllowMultithreading() { multi_threaded_ = true; }
cudamatrix/cu-device.h: void ResetProfile() {
cudamatrix/cu-device.h: void DeviceGetName(char* name, int32 len, int32 dev);
cudamatrix/cu-device.h: void CheckGpuHealth();
cudamatrix/cu-device.h: void FinalizeActiveGpu();
cudamatrix/cu-device.h:// sets the time if the verbose level is >= 1. This helps avoid
cudamatrix/cu-kernels-ansi.h:void cudaD_add_col_sum_mat(int Gr, int Bl, double* result, const double* mat,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_col_sum_mat(int Gr, int Bl, float* result, const float* mat,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_cols(dim3 Gr, dim3 Bl, double* dst, const double* src,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_cols(dim3 Gr, dim3 Bl, float* dst, const float* src,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_diag_mat_mat_MN(dim3 Gr, dim3 Bl, const double alpha,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_diag_mat_mat_MN(dim3 Gr, dim3 Bl, const float alpha,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_diag_mat_mat_MNT(int Gr, int Bl, const double alpha,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_diag_mat_mat_MNT(int Gr, int Bl, const float alpha,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_diag_mat_mat_MTN(dim3 Gr, dim3 Bl, const double alpha,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_diag_mat_mat_MTN(dim3 Gr, dim3 Bl, const float alpha,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_diag_packed(int Gr, int Bl, double* mat, double value, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaF_add_diag_packed(int Gr, int Bl, float* mat, float value, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaD_add_diag_vec_mat(dim3 Gr, dim3 Bl, double alpha, double *mat,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_diag_vec_mat(dim3 Gr, dim3 Bl, float alpha, float *mat,
cudamatrix/cu-kernels-ansi.h:void cudaD_add(dim3 Gr, dim3 Bl, double *mat, double value, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaF_add(dim3 Gr, dim3 Bl, float *mat, float value, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaD_add_mat_blockmat(dim3 Gr, dim3 Bl, double *data, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_mat_blockmat(dim3 Gr, dim3 Bl, float *data, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_mat_blocks(dim3 Gr, dim3 Bl, double alpha, const double *src,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_mat_blocks(dim3 Gr, dim3 Bl, float alpha, const float *src,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_mat_repeated(dim3 Gr, dim3 Bl, double alpha, const double *src,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_mat_repeated(dim3 Gr, dim3 Bl, float alpha, const float *src,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_mat_diag_vec(dim3 Gr, dim3 Bl, double alpha, double *mat,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_mat_diag_vec(dim3 Gr, dim3 Bl, float alpha, float *mat,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_mat(dim3 Gr, dim3 Bl, double alpha, const double *src,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_mat(dim3 Gr, dim3 Bl, float alpha, const float *src, float *dst,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_mat_mat_elements(dim3 Gr, dim3 Bl, double *data,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_mat_mat_elements(dim3 Gr, dim3 Bl, float *data,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_row_ranges(dim3 Gr, dim3 Bl, double *data, MatrixDim dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_row_ranges(dim3 Gr, dim3 Bl, float *data, MatrixDim dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_rows(dim3 Gr, dim3 Bl, double alpha, double* dst,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_rows(dim3 Gr, dim3 Bl, float alpha, float* dst, const float* src,
cudamatrix/cu-kernels-ansi.h:void cudaD_mul_rows(dim3 Gr, dim3 Bl, double* dst,
cudamatrix/cu-kernels-ansi.h:void cudaF_mul_rows(dim3 Gr, dim3 Bl, float* dst, const float* src,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_rows_direct(dim3 Gr, dim3 Bl, double alpha, double* dst,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_rows_direct(dim3 Gr, dim3 Bl, float alpha, float* dst,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_smat(dim3 Gr, dim3 Bl, double* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_smat(dim3 Gr, dim3 Bl, float* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_smat_trans(dim3 Gr, dim3 Bl, double* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_smat_trans(dim3 Gr, dim3 Bl, float* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_to_rows_direct(dim3 Gr, dim3 Bl, double alpha,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_to_rows_direct(dim3 Gr, dim3 Bl, float alpha, float* const * dst,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_to_rows(dim3 Gr, dim3 Bl, double alpha,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_to_rows(dim3 Gr, dim3 Bl, float alpha,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_vec2(dim3 Gr, dim3 Bl, double *mat, const double *vec,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_vec2(dim3 Gr, dim3 Bl, float* mat, const float* vec,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_vec_to_cols(dim3 Gr, dim3 Bl, double alpha, const double *col,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_vec_to_cols(dim3 Gr, dim3 Bl, float alpha, const float *col,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_vec_to_rows(dim3 Gr, dim3 Bl, double alpha, const double *row,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_vec_to_rows(dim3 Gr, dim3 Bl, float alpha, const float *row,
cudamatrix/cu-kernels-ansi.h:void cudaD_add_vec_vec(int Gr, int Bl, double alpha, double* v, const double* x,
cudamatrix/cu-kernels-ansi.h:void cudaF_add_vec_vec(int Gr, int Bl, float alpha, float* v, const float* x,
cudamatrix/cu-kernels-ansi.h:void cudaD_apply_ceiling(dim3 Gr, dim3 Bl, double* mat, double ceiling_val,
cudamatrix/cu-kernels-ansi.h:void cudaF_apply_ceiling(dim3 Gr, dim3 Bl, float* mat, float ceiling_val,
cudamatrix/cu-kernels-ansi.h:void cudaD_apply_exp(dim3 Gr, dim3 Bl, double* mat, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaF_apply_exp(dim3 Gr, dim3 Bl, float* mat, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaD_apply_exp_limited(dim3 Gr, dim3 Bl, double* mat, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaF_apply_exp_limited(dim3 Gr, dim3 Bl, float* mat, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaD_apply_exp_special(dim3 Gr, dim3 Bl, double* out, MatrixDim out_dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_apply_exp_special(dim3 Gr, dim3 Bl, float* out, MatrixDim out_dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_apply_floor(dim3 Gr, dim3 Bl, double* mat, double floor_val,
cudamatrix/cu-kernels-ansi.h:void cudaF_apply_floor(dim3 Gr, dim3 Bl, float* mat, float floor_val,
cudamatrix/cu-kernels-ansi.h:void cudaD_apply_heaviside(dim3 Gr, dim3 Bl, double* mat, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaF_apply_heaviside(dim3 Gr, dim3 Bl, float* mat, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaD_apply_log(dim3 Gr, dim3 Bl, double *mat, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaF_apply_log(dim3 Gr, dim3 Bl, float *mat, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaD_apply_pow_abs(dim3 Gr, dim3 Bl, double* mat, double power,
cudamatrix/cu-kernels-ansi.h:void cudaF_apply_pow_abs(dim3 Gr, dim3 Bl, float* mat, float power,
cudamatrix/cu-kernels-ansi.h:void cudaD_apply_pow(dim3 Gr, dim3 Bl, double* mat, double power, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaF_apply_pow(dim3 Gr, dim3 Bl, float* mat, float power, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaD_block_add_mat_mat(dim3 Gr, dim3 Bl, CuBlockMatrixData *B_cu_data,
cudamatrix/cu-kernels-ansi.h:void cudaF_block_add_mat_mat(dim3 Gr, dim3 Bl, CuBlockMatrixData *B_cu_data,
cudamatrix/cu-kernels-ansi.h:void cudaD_calc_group_max_deriv(dim3 Gr, dim3 Bl, double *y, const double *x1,
cudamatrix/cu-kernels-ansi.h:void cudaF_calc_group_max_deriv(dim3 Gr, dim3 Bl, float *y, const float *x1,
cudamatrix/cu-kernels-ansi.h:void cudaD_comp_obj_deriv(dim3 Gr, dim3 Bl, MatrixElement* x, int s,
cudamatrix/cu-kernels-ansi.h:void cudaF_comp_obj_deriv(dim3 Gr, dim3 Bl, MatrixElement* x, int s,
cudamatrix/cu-kernels-ansi.h:void cudaD_copy_col_from_mat_df(int Gr, int Bl, double* v, int col,
cudamatrix/cu-kernels-ansi.h:void cudaF_copy_col_from_mat_df(int Gr, int Bl, double* v, int col,
cudamatrix/cu-kernels-ansi.h:void cudaD_copy_col_from_mat_fd(int Gr, int Bl, float* v, int col,
cudamatrix/cu-kernels-ansi.h:void cudaF_copy_col_from_mat_fd(int Gr, int Bl, float* v, int col,
cudamatrix/cu-kernels-ansi.h:void cudaD_copy_cols(dim3 Gr, dim3 Bl, double* dst, const double* src,
cudamatrix/cu-kernels-ansi.h:void cudaF_copy_cols(dim3 Gr, dim3 Bl, float* dst, const float* src,
cudamatrix/cu-kernels-ansi.h:void cudaD_copy_cols_from_vec(dim3 Gr, dim3 Bl, double *mat_out,
cudamatrix/cu-kernels-ansi.h:void cudaF_copy_cols_from_vec(dim3 Gr, dim3 Bl, float *mat_out, MatrixDim d_out,
cudamatrix/cu-kernels-ansi.h:void cudaD_copy(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels-ansi.h:void cudaF_copy(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_mat_dd(dim3 Gr, dim3 Bl, double *mat_out,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_mat_dd_trans(dim3 Gr, dim3 Bl, double *mat_out,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_mat_df(dim3 Gr, dim3 Bl, double* mat_out,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_mat_df_trans(dim3 Gr, dim3 Bl, double* mat_out,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_mat_fd(dim3 Gr, dim3 Bl, float *mat_out,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_mat_fd_trans(dim3 Gr, dim3 Bl, float *mat_out,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_mat_ff(dim3 Gr, dim3 Bl, float* mat_out,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_mat_ff_trans(dim3 Gr, dim3 Bl, float* mat_out,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_smat_dd(dim3 Gr, dim3 Bl, double* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_smat_dd_trans(dim3 Gr, dim3 Bl, double* mat,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_smat_df(dim3 Gr, dim3 Bl, double* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_smat_df_trans(dim3 Gr, dim3 Bl, double* mat,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_smat_fd(dim3 Gr, dim3 Bl, float* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_smat_fd_trans(dim3 Gr, dim3 Bl, float* mat,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_smat_ff(dim3 Gr, dim3 Bl, float* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels-ansi.h:void cuda_copy_from_smat_ff_trans(dim3 Gr, dim3 Bl, float* mat,
cudamatrix/cu-kernels-ansi.h:void cudaD_copy_from_sp(dim3 Gr, dim3 Bl, const double* x, double* y,
cudamatrix/cu-kernels-ansi.h:void cudaF_copy_from_sp(dim3 Gr, dim3 Bl, const float* x, float* y,
cudamatrix/cu-kernels-ansi.h:void cudaD_copy_from_tp(dim3 Gr, dim3 Bl, double* A, const double* B,
cudamatrix/cu-kernels-ansi.h:void cudaDF_copy_from_tp(dim3 Gr, dim3 Bl, double* A, const float* B,
cudamatrix/cu-kernels-ansi.h:void cudaFD_copy_from_tp(dim3 Gr, dim3 Bl, float* A, const double* B,
cudamatrix/cu-kernels-ansi.h:void cudaF_copy_from_tp(dim3 Gr, dim3 Bl, float* A, const float* B,
cudamatrix/cu-kernels-ansi.h:void cudaD_copy_from_tp_trans(dim3 Gr, dim3 Bl, double* A, const double* B,
cudamatrix/cu-kernels-ansi.h:void cudaDF_copy_from_tp_trans(dim3 Gr, dim3 Bl, double* A, const float* B,
cudamatrix/cu-kernels-ansi.h:void cudaFD_copy_from_tp_trans(dim3 Gr, dim3 Bl, float* A, const double* B,
cudamatrix/cu-kernels-ansi.h:void cudaF_copy_from_tp_trans(dim3 Gr, dim3 Bl, float* A, const float* B,
cudamatrix/cu-kernels-ansi.h:void cublas_copy_kaldi_df(int Gr, int Bl, int n, const double* x, int incx,
cudamatrix/cu-kernels-ansi.h:void cublas_copy_kaldi_fd(int Gr, int Bl, int n, const float* x, int incx,
cudamatrix/cu-kernels-ansi.h:void cudaD_copy_low_upp(dim3 Gr, dim3 Bl, double* A, MatrixDim dimA);
cudamatrix/cu-kernels-ansi.h:void cudaF_copy_low_upp(dim3 Gr, dim3 Bl, float* A, MatrixDim dimA);
cudamatrix/cu-kernels-ansi.h:void cudaD_copy_rows(dim3 Gr, dim3 Bl, double* dst, const double* src,
cudamatrix/cu-kernels-ansi.h:void cudaF_copy_rows(dim3 Gr, dim3 Bl, float* dst, const float* src,
cudamatrix/cu-kernels-ansi.h:void cudaD_copy_rows_direct(dim3 Gr, dim3 Bl, double* dst,
cudamatrix/cu-kernels-ansi.h:void cudaF_copy_rows_direct(dim3 Gr, dim3 Bl, float* dst,
cudamatrix/cu-kernels-ansi.h:void cudaD_copy_rows_from_vec(dim3 Gr, dim3 Bl, double *mat_out,
cudamatrix/cu-kernels-ansi.h:void cudaF_copy_rows_from_vec(dim3 Gr, dim3 Bl, float *mat_out, MatrixDim d_out,
cudamatrix/cu-kernels-ansi.h:void cudaD_copy_to_rows_direct(dim3 Gr, dim3 Bl, double* const * dst,
cudamatrix/cu-kernels-ansi.h:void cudaF_copy_to_rows_direct(dim3 Gr, dim3 Bl, float* const * dst,
cudamatrix/cu-kernels-ansi.h:void cudaD_copy_upp_low(dim3 Gr, dim3 Bl, double* A, MatrixDim dimB);
cudamatrix/cu-kernels-ansi.h:void cudaF_copy_upp_low(dim3 Gr, dim3 Bl, float* A, MatrixDim dimA);
cudamatrix/cu-kernels-ansi.h:void cudaD_diff_group_pnorm(dim3 Gr, dim3 Bl, double *id, const double *iv,
cudamatrix/cu-kernels-ansi.h:void cudaF_diff_group_pnorm(dim3 Gr, dim3 Bl, float *id, const float *iv,
cudamatrix/cu-kernels-ansi.h:void cudaD_diff_log_softmax(dim3 Gr, dim3 Bl, const MatrixDim in_deriv_dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_diff_log_softmax(dim3 Gr, dim3 Bl, const MatrixDim in_deriv_dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_diff_lstm_nonlinearity(dim3 Gr, dim3 Bl, const int cell_dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_diff_lstm_nonlinearity(dim3 Gr, dim3 Bl, const int cell_dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_diff_normalize_per_row(size_t Gr, size_t Bl, double *id,
cudamatrix/cu-kernels-ansi.h:void cudaF_diff_normalize_per_row(size_t Gr, size_t Bl, float *id,
cudamatrix/cu-kernels-ansi.h:void cudaD_diff_parametric_relu(dim3 Gr, dim3 Bl, double *eout, const double *e,
cudamatrix/cu-kernels-ansi.h:void cudaF_diff_parametric_relu(dim3 Gr, dim3 Bl, float *eout, const float *e,
cudamatrix/cu-kernels-ansi.h:void cudaD_diff_sigmoid(dim3 Gr, dim3 Bl, double *eout, const double *e,
cudamatrix/cu-kernels-ansi.h:void cudaF_diff_sigmoid(dim3 Gr, dim3 Bl, float *eout, const float *e,
cudamatrix/cu-kernels-ansi.h:void cudaD_diff_softmax(dim3 Gr, dim3 Bl, double* x, const MatrixDim dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_diff_softmax(dim3 Gr, dim3 Bl, float* x, const MatrixDim dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_diff_tanh(dim3 Gr, dim3 Bl, double *eout, const double *e,
cudamatrix/cu-kernels-ansi.h:void cudaF_diff_tanh(dim3 Gr, dim3 Bl, float *eout, const float *e,
cudamatrix/cu-kernels-ansi.h:void cudaD_ensure_nonzero(dim3 Gr, dim3 Bl, const double *x, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaF_ensure_nonzero(dim3 Gr, dim3 Bl, const float *x, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaD_diff_xent(dim3 Gr, dim3 Bl, const int32_cuda *vec_tgt,
cudamatrix/cu-kernels-ansi.h:void cudaF_diff_xent(dim3 Gr, dim3 Bl, const int32_cuda *vec_tgt,
cudamatrix/cu-kernels-ansi.h:void cudaD_div_elements(dim3 Gr, dim3 Bl, double *mat, const double *A,
cudamatrix/cu-kernels-ansi.h:void cudaF_div_elements(dim3 Gr, dim3 Bl, float *mat, const float *A,
cudamatrix/cu-kernels-ansi.h:void cudaD_div_rows_vec(dim3 Gr, dim3 Bl, double *mat, const double *vec_div,
cudamatrix/cu-kernels-ansi.h:void cudaF_div_rows_vec(dim3 Gr, dim3 Bl, float *mat, const float *vec_div,
cudamatrix/cu-kernels-ansi.h:void cudaD_equal_element_mask(dim3 Gr, dim3 Bl, const double *mat1,
cudamatrix/cu-kernels-ansi.h:void cudaF_equal_element_mask(dim3 Gr, dim3 Bl, const float *mat1,
cudamatrix/cu-kernels-ansi.h:void cudaD_find_row_max_id(dim3 Gr, dim3 Bl, const double *mat, double *vec_val,
cudamatrix/cu-kernels-ansi.h:void cudaF_find_row_max_id(dim3 Gr, dim3 Bl, const float *mat, float *vec_val,
cudamatrix/cu-kernels-ansi.h:void cudaD_group_max(dim3 Gr, dim3 Bl, double *y, const double *x, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaF_group_max(dim3 Gr, dim3 Bl, float *y, const float *x, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaD_group_pnorm(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels-ansi.h:void cudaF_group_pnorm(dim3 Gr, dim3 Bl, float *y, const float *x, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaD_group_spec_pnorm(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels-ansi.h:void cudaF_group_spec_pnorm(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels-ansi.h:void cudaD_heaviside(dim3 Gr, dim3 Bl, double *y, const double *x, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaF_heaviside(dim3 Gr, dim3 Bl, float *y, const float *x, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cuda_int32_add(dim3 Gr, dim3 Bl, int32_cuda *mat, int32_cuda value,
cudamatrix/cu-kernels-ansi.h:void cuda_int32_set_const(dim3 Gr, dim3 Bl, int32_cuda *mat, int32_cuda value,
cudamatrix/cu-kernels-ansi.h:void cuda_int32_sequence(dim3 Gr, dim3 Bl, int32_cuda* data, int length,
cudamatrix/cu-kernels-ansi.h:void cudaD_invert_elements(dim3 Gr, dim3 Bl, double *data, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaF_invert_elements(dim3 Gr, dim3 Bl, float *data, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaD_log_softmax_reduce(size_t Gr, size_t Bl, double *y, const double *x,
cudamatrix/cu-kernels-ansi.h:void cudaF_log_softmax_reduce(size_t Gr, size_t Bl, float *y, const float *x,
cudamatrix/cu-kernels-ansi.h:void cudaD_lstm_nonlinearity(dim3 Gr, dim3 Bl, const double* in,
cudamatrix/cu-kernels-ansi.h:void cudaF_lstm_nonlinearity(dim3 Gr, dim3 Bl, const float* in,
cudamatrix/cu-kernels-ansi.h:void cudaD_matrix_add_elements(dim3 Gr, dim3 Bl, double *data, MatrixDim dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_matrix_add_elements(dim3 Gr, dim3 Bl, float *data, MatrixDim dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_matrix_add_indexed_values(dim3 Gr, dim3 Bl, MatrixDim dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_matrix_add_indexed_values(dim3 Gr, dim3 Bl, MatrixDim dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_matrix_add_to_elements(dim3 Gr, dim3 Bl, double alpha,
cudamatrix/cu-kernels-ansi.h:void cudaF_matrix_add_to_elements(dim3 Gr, dim3 Bl, float alpha,
cudamatrix/cu-kernels-ansi.h:void cudaD_matrix_lookup(dim3 Gr, dim3 Bl, const double *data, MatrixDim dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_matrix_lookup(dim3 Gr, dim3 Bl, const float *data, MatrixDim dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_vector_copy_elements(dim3 Gr, dim3 Bl, double *data, int dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_vector_copy_elements(dim3 Gr, dim3 Bl, float *data, int dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_max(dim3 Gr, dim3 Bl, double *mat, const double *A, MatrixDim dst_d,
cudamatrix/cu-kernels-ansi.h:void cudaF_max(dim3 Gr, dim3 Bl, float *mat, const float *A, MatrixDim dst_d,
cudamatrix/cu-kernels-ansi.h:void cudaD_max_mat_cols(int Gr, int Bl, double* result, const double* mat,
cudamatrix/cu-kernels-ansi.h:void cudaF_max_mat_cols(int Gr, int Bl, float* result, const float* mat,
cudamatrix/cu-kernels-ansi.h:void cudaD_min(dim3 Gr, dim3 Bl, double *mat, const double *other,
cudamatrix/cu-kernels-ansi.h:void cudaF_min(dim3 Gr, dim3 Bl, float *mat, const float *other,
cudamatrix/cu-kernels-ansi.h:void cudaD_min_mat_cols(int Gr, int Bl, double* result, const double* mat,
cudamatrix/cu-kernels-ansi.h:void cudaF_min_mat_cols(int Gr, int Bl, float* result, const float* mat,
cudamatrix/cu-kernels-ansi.h:void cudaD_mul_cols_vec(dim3 Gr, dim3 Bl, double *mat, const double *scale,
cudamatrix/cu-kernels-ansi.h:void cudaF_mul_cols_vec(dim3 Gr, dim3 Bl, float *mat, const float *scale,
cudamatrix/cu-kernels-ansi.h:void cudaD_mul_elements(dim3 Gr, dim3 Bl, double *mat, const double *A,
cudamatrix/cu-kernels-ansi.h:void cudaF_mul_elements(dim3 Gr, dim3 Bl, float *mat, const float *A,
cudamatrix/cu-kernels-ansi.h:void cudaD_mul_rows_group_mat(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels-ansi.h:void cudaF_mul_rows_group_mat(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels-ansi.h:void cudaD_mul_rows_vec(dim3 Gr, dim3 Bl, double *mat, const double *scale,
cudamatrix/cu-kernels-ansi.h:void cudaF_mul_rows_vec(dim3 Gr, dim3 Bl, float *mat, const float *scale,
cudamatrix/cu-kernels-ansi.h:void cudaD_normalize_per_row(size_t Gr, size_t Bl, double *y, int y_stride,
cudamatrix/cu-kernels-ansi.h:void cudaF_normalize_per_row(size_t Gr, size_t Bl, float *y, int y_stride,
cudamatrix/cu-kernels-ansi.h:void cudaD_one(int Gr, int Bl, double* x, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaF_one(int Gr, int Bl, float* x, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaD_parametric_relu(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels-ansi.h:void cudaF_parametric_relu(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels-ansi.h:void cudaD_randomize(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels-ansi.h:void cudaF_randomize(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels-ansi.h:void cudaD_regularize_l1(dim3 Gr, dim3 Bl, double *wei, double *grad, double l1,
cudamatrix/cu-kernels-ansi.h:void cudaF_regularize_l1(dim3 Gr, dim3 Bl, float *wei, float *grad, float l1,
cudamatrix/cu-kernels-ansi.h:void cudaD_replace_value(int Gr, int Bl, double *v, int dim, double orig,
cudamatrix/cu-kernels-ansi.h:void cudaF_replace_value(int Gr, int Bl, float *v, int dim, float orig,
cudamatrix/cu-kernels-ansi.h:void cudaD_scale_diag_packed(int Gr, int Bl, double* mat, double value,
cudamatrix/cu-kernels-ansi.h:void cudaF_scale_diag_packed(int Gr, int Bl, float* mat, float value, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaD_scale(dim3 Gr, dim3 Bl, double *mat, double value, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaF_scale(dim3 Gr, dim3 Bl, float *mat, float value, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaD_select_rows(dim3 Gr, dim3 Bl, const int* out_row_ptr,
cudamatrix/cu-kernels-ansi.h:void cudaF_select_rows(dim3 Gr, dim3 Bl, const int* out_row_ptr,
cudamatrix/cu-kernels-ansi.h:void cudaD_set_bias_params(int Gr, int Bl, double* v, const double* a,
cudamatrix/cu-kernels-ansi.h:void cudaF_set_bias_params(int Gr, int Bl, float* v, const float* a,
cudamatrix/cu-kernels-ansi.h:void cudaD_set_const(dim3 Gr, dim3 Bl, double *mat, double value, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaF_set_const(dim3 Gr, dim3 Bl, float *mat, float value, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaD_set_diag(int Gr, int Bl, double* mat, double value, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaF_set_diag(int Gr, int Bl, float* mat, float value, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaD_set_diag_packed(int Gr, int Bl, double* mat, double value, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaF_set_diag_packed(int Gr, int Bl, float* mat, float value, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaD_set_mat_mat_div_mat(dim3 Gr, dim3 Bl, const double *A,
cudamatrix/cu-kernels-ansi.h:void cudaF_set_mat_mat_div_mat(dim3 Gr, dim3 Bl, const float *A, const float *B,
cudamatrix/cu-kernels-ansi.h:void cudaD_set_zero_above_diag(dim3 Gr, dim3 Bl, double* mat, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaF_set_zero_above_diag(dim3 Gr, dim3 Bl, float* mat, MatrixDim d);
cudamatrix/cu-kernels-ansi.h:void cudaD_sigmoid(dim3 Gr, dim3 Bl, double *y, const double *x, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaF_sigmoid(dim3 Gr, dim3 Bl, float *y, const float *x, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaD_soft_hinge(dim3 Gr, dim3 Bl, double *y, const double *x, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaF_soft_hinge(dim3 Gr, dim3 Bl, float *y, const float *x, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaD_softmax_reduce(size_t Gr, size_t Bl, double *y, const double *x,
cudamatrix/cu-kernels-ansi.h:void cudaF_softmax_reduce(size_t Gr, size_t Bl, float *y, const float *x,
cudamatrix/cu-kernels-ansi.h:void cudaD_splice(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels-ansi.h:void cudaF_splice(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels-ansi.h:void cudaD_sum_column_ranges(dim3 Gr, dim3 Bl, double *data, MatrixDim dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_sum_column_ranges(dim3 Gr, dim3 Bl, float *data, MatrixDim dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_sum_mat_cols(int Gr, int Bl, double* result, const double* mat,
cudamatrix/cu-kernels-ansi.h:void cudaF_sum_mat_cols(int Gr, int Bl, float* result, const float* mat,
cudamatrix/cu-kernels-ansi.h:void cudaD_sy_add_tr2(dim3 Gr, dim3 Bl, double alpha, double beta,
cudamatrix/cu-kernels-ansi.h:void cudaF_sy_add_tr2(dim3 Gr, dim3 Bl, float alpha, float beta, const float* T,
cudamatrix/cu-kernels-ansi.h:void cudaD_take_lower(dim3 Gr, dim3 Bl, const double* x, double* y,
cudamatrix/cu-kernels-ansi.h:void cudaF_take_lower(dim3 Gr, dim3 Bl, const float* x, float* y,
cudamatrix/cu-kernels-ansi.h:void cudaD_take_mean(dim3 Gr, dim3 Bl, const double* x, double* y,
cudamatrix/cu-kernels-ansi.h:void cudaF_take_mean(dim3 Gr, dim3 Bl, const float* x, float* y,
cudamatrix/cu-kernels-ansi.h:void cudaD_take_upper(dim3 Gr, dim3 Bl, const double* x, double* y,
cudamatrix/cu-kernels-ansi.h:void cudaF_take_upper(dim3 Gr, dim3 Bl, const float* x, float* y,
cudamatrix/cu-kernels-ansi.h:void cudaD_tanh(dim3 Gr, dim3 Bl, double *y, const double *x, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaF_tanh(dim3 Gr, dim3 Bl, float *y, const float *x, MatrixDim d,
cudamatrix/cu-kernels-ansi.h:void cudaD_trace(int Gr, int Bl, double* mat, double* value, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaF_trace(int Gr, int Bl, float* mat, float* value, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaD_trace_mat_mat(dim3 Gr, dim3 Bl, const double* A, const double* B,
cudamatrix/cu-kernels-ansi.h:void cudaF_trace_mat_mat(dim3 Gr, dim3 Bl, const float* A, const float* B,
cudamatrix/cu-kernels-ansi.h:void cudaD_trace_mat_mat_trans(dim3 Gr, dim3 Bl, const double* A,
cudamatrix/cu-kernels-ansi.h:void cudaF_trace_mat_mat_trans(dim3 Gr, dim3 Bl, const float* A, const float* B,
cudamatrix/cu-kernels-ansi.h:void cudaD_trace_mat_smat(dim3 Gr, dim3 Bl, const double* mat,
cudamatrix/cu-kernels-ansi.h:void cudaF_trace_mat_smat(dim3 Gr, dim3 Bl, const float* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_trace_mat_smat_trans(dim3 Gr, dim3 Bl, const double* mat,
cudamatrix/cu-kernels-ansi.h:void cudaF_trace_mat_smat_trans(dim3 Gr, dim3 Bl, const float* mat,
cudamatrix/cu-kernels-ansi.h:void cudaD_vec_apply_ceiling(int Gr, int Bl, double* v, double ceiling_val,
cudamatrix/cu-kernels-ansi.h:void cudaF_vec_apply_ceiling(int Gr, int Bl, float* v, float ceiling_val,
cudamatrix/cu-kernels-ansi.h:void cudaD_vec_apply_exp(int Gr, int Bl, double* v, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaF_vec_apply_exp(int Gr, int Bl, float* v, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaD_vec_apply_floor(int Gr, int Bl, double* v, double floor_val,
cudamatrix/cu-kernels-ansi.h:void cudaF_vec_apply_floor(int Gr, int Bl, float* v, float floor_val,
cudamatrix/cu-kernels-ansi.h:void cudaD_vec_apply_log(int Gr, int Bl, double* v, double* flag, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaF_vec_apply_log(int Gr, int Bl, float* v, float* flag, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaD_vec_copy_diag_from_packed(int Gr, int Bl, double *dst,
cudamatrix/cu-kernels-ansi.h:void cudaF_vec_copy_diag_from_packed(int Gr, int Bl, float *dst,
cudamatrix/cu-kernels-ansi.h:void cudaD_vec_max(int Gr, int Bl, const double* v, double* value, int dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_vec_max(int Gr, int Bl, const float* v, float* value, int dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_vec_min(int Gr, int Bl, const double* v, double* value, int dim,
cudamatrix/cu-kernels-ansi.h:void cudaF_vec_min(int Gr, int Bl, const float* v, float* value, int dim,
cudamatrix/cu-kernels-ansi.h:void cudaD_vec_mul_elements(int Gr, int Bl, double* v, const double* a,
cudamatrix/cu-kernels-ansi.h:void cudaF_vec_mul_elements(int Gr, int Bl, float* v, const float* a, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaD_vec_soft_max(int Gr, int Bl, double* v, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaF_vec_soft_max(int Gr, int Bl, float* v, int dim);
cudamatrix/cu-kernels-ansi.h:void cudaD_vec_sum(int Gr, int Bl, double* v, double* value, int dim, int inc);
cudamatrix/cu-kernels-ansi.h:void cudaF_vec_sum(int Gr, int Bl, float* v, float* value, int dim, int inc);
cudamatrix/cu-kernels-ansi.h:void cuda_compress_int16(dim3 Gr, dim3 Bl, const BaseFloat *src,
cudamatrix/cu-kernels-ansi.h:void cuda_compress_uint16(dim3 Gr, dim3 Bl, const BaseFloat *src,
cudamatrix/cu-kernels-ansi.h:void cuda_compress_uint8(dim3 Gr, dim3 Bl, const BaseFloat *src,
cudamatrix/cu-kernels-ansi.h:void cuda_compress_int8(dim3 Gr, dim3 Bl, const BaseFloat *src,
cudamatrix/cu-kernels-ansi.h:void cuda_compress_uint8_sign(dim3 Gr, dim3 Bl, const BaseFloat *src,
cudamatrix/cu-kernels-ansi.h:void cuda_uncompress_int16(dim3 Gr, dim3 Bl, BaseFloat *dest,
cudamatrix/cu-kernels-ansi.h:void cuda_uncompress_uint16(dim3 Gr, dim3 Bl, BaseFloat *dest,
cudamatrix/cu-kernels-ansi.h:void cuda_uncompress_int8(dim3 Gr, dim3 Bl, BaseFloat *dest,
cudamatrix/cu-kernels-ansi.h:void cuda_uncompress_uint8(dim3 Gr, dim3 Bl, BaseFloat *dest,
cudamatrix/cu-kernels.h:inline void cuda_add_col_sum_mat(int Gr, int Bl, double* result,
cudamatrix/cu-kernels.h:inline void cuda_add_col_sum_mat(int Gr, int Bl, float* result,
cudamatrix/cu-kernels.h:inline void cuda_add_cols(dim3 Gr, dim3 Bl, double* dst, const double* src,
cudamatrix/cu-kernels.h:inline void cuda_add_cols(dim3 Gr, dim3 Bl, float* dst, const float* src,
cudamatrix/cu-kernels.h:inline void cuda_add_diag_mat_mat_MN(dim3 Gr, dim3 Bl, const double alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_diag_mat_mat_MN(dim3 Gr, dim3 Bl, const float alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_diag_mat_mat_MNT(int Gr, int Bl, const double alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_diag_mat_mat_MNT(int Gr, int Bl, const float alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_diag_mat_mat_MTN(dim3 Gr, dim3 Bl, const double alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_diag_mat_mat_MTN(dim3 Gr, dim3 Bl, const float alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_diag_packed(int Gr, int Bl, double* mat, double value,
cudamatrix/cu-kernels.h:inline void cuda_add_diag_packed(int Gr, int Bl, float* mat, float value,
cudamatrix/cu-kernels.h:inline void cuda_add_diag_vec_mat(dim3 Gr, dim3 Bl, double alpha, double *mat,
cudamatrix/cu-kernels.h:inline void cuda_add_diag_vec_mat(dim3 Gr, dim3 Bl, float alpha, float *mat,
cudamatrix/cu-kernels.h:inline void cuda_add(dim3 Gr, dim3 Bl, double *mat, double value, MatrixDim d) {
cudamatrix/cu-kernels.h:inline void cuda_add(dim3 Gr, dim3 Bl, float *mat, float value, MatrixDim d) {
cudamatrix/cu-kernels.h:inline void cuda_add_mat_blockmat(dim3 Gr, dim3 Bl, double *data, MatrixDim d,
cudamatrix/cu-kernels.h:inline void cuda_add_mat_blockmat(dim3 Gr, dim3 Bl, float *data, MatrixDim d,
cudamatrix/cu-kernels.h:inline void cuda_add_mat_blocks(dim3 Gr, dim3 Bl, double alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_mat_blocks(dim3 Gr, dim3 Bl, float alpha, const float *src,
cudamatrix/cu-kernels.h:inline void cuda_add_mat_repeated(dim3 Gr, dim3 Bl, double alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_mat_repeated(dim3 Gr, dim3 Bl, float alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_mat_diag_vec(dim3 Gr, dim3 Bl, double alpha, double *mat,
cudamatrix/cu-kernels.h:inline void cuda_add_mat_diag_vec(dim3 Gr, dim3 Bl, float alpha, float *mat,
cudamatrix/cu-kernels.h:inline void cuda_add_mat(dim3 Gr, dim3 Bl, double alpha, const double *src,
cudamatrix/cu-kernels.h:inline void cuda_add_mat(dim3 Gr, dim3 Bl, float alpha, const float *src,
cudamatrix/cu-kernels.h:inline void cuda_add_mat_mat_elements(dim3 Gr, dim3 Bl, double *data,
cudamatrix/cu-kernels.h:inline void cuda_add_mat_mat_elements(dim3 Gr, dim3 Bl, float *data,
cudamatrix/cu-kernels.h:inline void cuda_add_row_ranges(dim3 Gr, dim3 Bl, double *data, MatrixDim dim,
cudamatrix/cu-kernels.h:inline void cuda_add_row_ranges(dim3 Gr, dim3 Bl, float *data, MatrixDim dim,
cudamatrix/cu-kernels.h:inline void cuda_add_rows(dim3 Gr, dim3 Bl, double alpha, double* dst,
cudamatrix/cu-kernels.h:inline void cuda_add_rows(dim3 Gr, dim3 Bl, float alpha, float* dst,
cudamatrix/cu-kernels.h:inline void cuda_add_rows(dim3 Gr, dim3 Bl, double alpha, double* dst,
cudamatrix/cu-kernels.h:inline void cuda_add_rows(dim3 Gr, dim3 Bl, float alpha, float* dst,
cudamatrix/cu-kernels.h:inline void cuda_mul_rows(dim3 Gr, dim3 Bl, double* dst,
cudamatrix/cu-kernels.h:inline void cuda_mul_rows(dim3 Gr, dim3 Bl, float* dst,
cudamatrix/cu-kernels.h:inline void cuda_add_smat(dim3 Gr, dim3 Bl, double* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels.h:inline void cuda_add_smat(dim3 Gr, dim3 Bl, float* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels.h:inline void cuda_add_smat_trans(dim3 Gr, dim3 Bl, double* mat,
cudamatrix/cu-kernels.h:inline void cuda_add_smat_trans(dim3 Gr, dim3 Bl, float* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels.h:inline void cuda_add_to_rows(dim3 Gr, dim3 Bl, double alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_to_rows(dim3 Gr, dim3 Bl, float alpha, float* const * dst,
cudamatrix/cu-kernels.h:inline void cuda_add_to_rows(dim3 Gr, dim3 Bl, double alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_to_rows(dim3 Gr, dim3 Bl, float alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_vec2(dim3 Gr, dim3 Bl, double *mat, const double *vec,
cudamatrix/cu-kernels.h:inline void cuda_add_vec2(dim3 Gr, dim3 Bl, float *mat, const float *vec,
cudamatrix/cu-kernels.h:inline void cuda_add_vec_to_cols(dim3 Gr, dim3 Bl, double alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_vec_to_cols(dim3 Gr, dim3 Bl, float alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_vec_to_rows(dim3 Gr, dim3 Bl, double alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_vec_to_rows(dim3 Gr, dim3 Bl, float alpha,
cudamatrix/cu-kernels.h:inline void cuda_add_vec_vec(int Gr, int Bl, double alpha, double* v,
cudamatrix/cu-kernels.h:inline void cuda_add_vec_vec(int Gr, int Bl, float alpha, float* v,
cudamatrix/cu-kernels.h:inline void cuda_apply_ceiling(dim3 Gr, dim3 Bl, double* mat,
cudamatrix/cu-kernels.h:inline void cuda_apply_ceiling(dim3 Gr, dim3 Bl, float* mat, float ceiling_val,
cudamatrix/cu-kernels.h:inline void cuda_apply_exp(dim3 Gr, dim3 Bl, double* mat, MatrixDim d) {
cudamatrix/cu-kernels.h:inline void cuda_apply_exp(dim3 Gr, dim3 Bl, float* mat, MatrixDim d) {
cudamatrix/cu-kernels.h:inline void cuda_apply_exp_limited(dim3 Gr, dim3 Bl, double* mat, MatrixDim d,
cudamatrix/cu-kernels.h:inline void cuda_apply_exp_limited(dim3 Gr, dim3 Bl, float* mat, MatrixDim d,
cudamatrix/cu-kernels.h:inline void cuda_apply_exp_special(dim3 Gr, dim3 Bl, double* out,
cudamatrix/cu-kernels.h:inline void cuda_apply_exp_special(dim3 Gr, dim3 Bl, float* out,
cudamatrix/cu-kernels.h:inline void cuda_apply_floor(dim3 Gr, dim3 Bl, double* mat, double floor_val,
cudamatrix/cu-kernels.h:inline void cuda_apply_floor(dim3 Gr, dim3 Bl, float* mat, float floor_val,
cudamatrix/cu-kernels.h:inline void cuda_apply_heaviside(dim3 Gr, dim3 Bl, double* mat, MatrixDim dim) {
cudamatrix/cu-kernels.h:inline void cuda_apply_heaviside(dim3 Gr, dim3 Bl, float* mat, MatrixDim dim) {
cudamatrix/cu-kernels.h:inline void cuda_apply_log(dim3 Gr, dim3 Bl, double *mat, MatrixDim d) {
cudamatrix/cu-kernels.h:inline void cuda_apply_log(dim3 Gr, dim3 Bl, float *mat, MatrixDim d) {
cudamatrix/cu-kernels.h:inline void cuda_apply_pow_abs(dim3 Gr, dim3 Bl, double* mat, double power,
cudamatrix/cu-kernels.h:inline void cuda_apply_pow_abs(dim3 Gr, dim3 Bl, float* mat, float power,
cudamatrix/cu-kernels.h:inline void cuda_apply_pow(dim3 Gr, dim3 Bl, double* mat, double power,
cudamatrix/cu-kernels.h:inline void cuda_apply_pow(dim3 Gr, dim3 Bl, float* mat, float power,
cudamatrix/cu-kernels.h:inline void cuda_block_add_mat_mat(dim3 Gr, dim3 Bl,
cudamatrix/cu-kernels.h:inline void cuda_block_add_mat_mat(dim3 Gr, dim3 Bl,
cudamatrix/cu-kernels.h:inline void cuda_calc_group_max_deriv(dim3 Gr, dim3 Bl, double *y,
cudamatrix/cu-kernels.h:inline void cuda_calc_group_max_deriv(dim3 Gr, dim3 Bl, float *y,
cudamatrix/cu-kernels.h:inline void cuda_comp_obj_deriv(dim3 Gr, dim3 Bl, MatrixElement* x,
cudamatrix/cu-kernels.h:inline void cuda_comp_obj_deriv(dim3 Gr, dim3 Bl, MatrixElement* x,
cudamatrix/cu-kernels.h:inline void cuda_copy_col_from_mat_df(int Gr, int Bl, double* v, int col,
cudamatrix/cu-kernels.h:inline void cuda_copy_col_from_mat_df(int Gr, int Bl, double* v, int col,
cudamatrix/cu-kernels.h:inline void cuda_copy_col_from_mat_fd(int Gr, int Bl, float* v, int col,
cudamatrix/cu-kernels.h:inline void cuda_copy_col_from_mat_fd(int Gr, int Bl, float* v, int col,
cudamatrix/cu-kernels.h:inline void cuda_copy_cols(dim3 Gr, dim3 Bl, double* dst, const double* src,
cudamatrix/cu-kernels.h:inline void cuda_copy_cols(dim3 Gr, dim3 Bl, float* dst, const float* src,
cudamatrix/cu-kernels.h:inline void cuda_copy_cols_from_vec(dim3 Gr, dim3 Bl, double *mat_out,
cudamatrix/cu-kernels.h:inline void cuda_copy_cols_from_vec(dim3 Gr, dim3 Bl, float *mat_out,
cudamatrix/cu-kernels.h:inline void cuda_copy(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels.h:inline void cuda_copy(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_mat(dim3 Gr, dim3 Bl, double* mat_out,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_mat(dim3 Gr, dim3 Bl, double* mat_out,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_mat(dim3 Gr, dim3 Bl, float* mat_out,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_mat(dim3 Gr, dim3 Bl, float* mat_out,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_mat_trans(dim3 Gr, dim3 Bl, double* mat_out,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_mat_trans(dim3 Gr, dim3 Bl, double* mat_out,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_mat_trans(dim3 Gr, dim3 Bl, float* mat_out,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_mat_trans(dim3 Gr, dim3 Bl, float* mat_out,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_smat(dim3 Gr, dim3 Bl, double* mat,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_smat(dim3 Gr, dim3 Bl, double* mat,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_smat(dim3 Gr, dim3 Bl, float* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_smat(dim3 Gr, dim3 Bl, float* mat, MatrixDim mat_dim,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_smat_trans(dim3 Gr, dim3 Bl, double* mat,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_smat_trans(dim3 Gr, dim3 Bl, double* mat,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_smat_trans(dim3 Gr, dim3 Bl, float* mat,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_smat_trans(dim3 Gr, dim3 Bl, float* mat,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_sp(dim3 Gr, dim3 Bl, const double* x, double* y,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_sp(dim3 Gr, dim3 Bl, const float* x, float* y,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_tp(dim3 Gr, dim3 Bl, double* A, const double* B,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_tp(dim3 Gr, dim3 Bl, double* A, const float* B,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_tp(dim3 Gr, dim3 Bl, float* A, const double* B,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_tp(dim3 Gr, dim3 Bl, float* A, const float* B,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_tp_trans(dim3 Gr, dim3 Bl, double* A,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_tp_trans(dim3 Gr, dim3 Bl, double* A, const float* B,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_tp_trans(dim3 Gr, dim3 Bl, float* A, const double* B,
cudamatrix/cu-kernels.h:inline void cuda_copy_from_tp_trans(dim3 Gr, dim3 Bl, float* A, const float* B,
cudamatrix/cu-kernels.h:inline void cuda_copy_low_upp(dim3 Gr, dim3 Bl, double* A, MatrixDim dimA) {
cudamatrix/cu-kernels.h:inline void cuda_copy_low_upp(dim3 Gr, dim3 Bl, float* A, MatrixDim dimA) {
cudamatrix/cu-kernels.h:inline void cuda_copy_rows(dim3 Gr, dim3 Bl, double* dst,
cudamatrix/cu-kernels.h:inline void cuda_copy_rows(dim3 Gr, dim3 Bl, double* dst, const double* src,
cudamatrix/cu-kernels.h:inline void cuda_copy_rows(dim3 Gr, dim3 Bl, float* dst,
cudamatrix/cu-kernels.h:inline void cuda_copy_rows(dim3 Gr, dim3 Bl, float* dst, const float* src,
cudamatrix/cu-kernels.h:inline void cuda_copy_rows_from_vec(dim3 Gr, dim3 Bl, double *mat_out,
cudamatrix/cu-kernels.h:inline void cuda_copy_rows_from_vec(dim3 Gr, dim3 Bl, float *mat_out,
cudamatrix/cu-kernels.h:inline void cuda_copy_to_rows(dim3 Gr, dim3 Bl, double* const * dst,
cudamatrix/cu-kernels.h:inline void cuda_copy_to_rows(dim3 Gr, dim3 Bl, float* const * dst,
cudamatrix/cu-kernels.h:inline void cuda_copy_upp_low(dim3 Gr, dim3 Bl, double* A, MatrixDim dimA) {
cudamatrix/cu-kernels.h:inline void cuda_copy_upp_low(dim3 Gr, dim3 Bl, float* A, MatrixDim dimA) {
cudamatrix/cu-kernels.h:inline void cuda_diff_group_pnorm(dim3 Gr, dim3 Bl, double *id,
cudamatrix/cu-kernels.h:inline void cuda_diff_group_pnorm(dim3 Gr, dim3 Bl, float *id, const float *iv,
cudamatrix/cu-kernels.h:inline void cuda_diff_log_softmax(dim3 Gr, dim3 Bl,
cudamatrix/cu-kernels.h:inline void cuda_diff_log_softmax(dim3 Gr, dim3 Bl,
cudamatrix/cu-kernels.h:inline void cuda_diff_lstm_nonlinearity(dim3 Gr, dim3 Bl, const int cell_dim,
cudamatrix/cu-kernels.h:inline void cuda_diff_lstm_nonlinearity(dim3 Gr, dim3 Bl, const int cell_dim,
cudamatrix/cu-kernels.h:inline void cuda_diff_normalize_per_row(size_t Gr, size_t Bl, double *id,
cudamatrix/cu-kernels.h:inline void cuda_diff_normalize_per_row(size_t Gr, size_t Bl, float *id,
cudamatrix/cu-kernels.h:inline void cuda_diff_parametric_relu(dim3 Gr, dim3 Bl, double *eout,
cudamatrix/cu-kernels.h:inline void cuda_diff_parametric_relu(dim3 Gr, dim3 Bl, float *eout,
cudamatrix/cu-kernels.h:inline void cuda_diff_sigmoid(dim3 Gr, dim3 Bl, double *eout, const double *e,
cudamatrix/cu-kernels.h:inline void cuda_diff_sigmoid(dim3 Gr, dim3 Bl, float *eout, const float *e,
cudamatrix/cu-kernels.h:inline void cuda_diff_softmax(dim3 Gr, dim3 Bl, double* x, const MatrixDim dim,
cudamatrix/cu-kernels.h:inline void cuda_diff_softmax(dim3 Gr, dim3 Bl, float* x, const MatrixDim dim,
cudamatrix/cu-kernels.h:inline void cuda_diff_tanh(dim3 Gr, dim3 Bl, double *eout, const double *e,
cudamatrix/cu-kernels.h:inline void cuda_diff_tanh(dim3 Gr, dim3 Bl, float *eout, const float *e,
cudamatrix/cu-kernels.h:inline void cuda_ensure_nonzero(dim3 Gr, dim3 Bl, const double *x, MatrixDim d,
cudamatrix/cu-kernels.h:inline void cuda_ensure_nonzero(dim3 Gr, dim3 Bl, const float *x, MatrixDim d,
cudamatrix/cu-kernels.h:inline void cuda_diff_xent(dim3 Gr, dim3 Bl, const int32_cuda *vec_tgt,
cudamatrix/cu-kernels.h:inline void cuda_diff_xent(dim3 Gr, dim3 Bl, const int32_cuda *vec_tgt,
cudamatrix/cu-kernels.h:inline void cuda_div_elements(dim3 Gr, dim3 Bl, double *mat, const double *A,
cudamatrix/cu-kernels.h:inline void cuda_div_elements(dim3 Gr, dim3 Bl, float *mat, const float *A,
cudamatrix/cu-kernels.h:inline void cuda_div_rows_vec(dim3 Gr, dim3 Bl, double *mat,
cudamatrix/cu-kernels.h:inline void cuda_div_rows_vec(dim3 Gr, dim3 Bl, float *mat,
cudamatrix/cu-kernels.h:inline void cuda_equal_element_mask(dim3 Gr, dim3 Bl, const double *mat1,
cudamatrix/cu-kernels.h:inline void cuda_equal_element_mask(dim3 Gr, dim3 Bl, const float *mat1,
cudamatrix/cu-kernels.h:inline void cuda_find_row_max_id(dim3 Gr, dim3 Bl, const double *mat,
cudamatrix/cu-kernels.h:inline void cuda_find_row_max_id(dim3 Gr, dim3 Bl, const float *mat,
cudamatrix/cu-kernels.h:inline void cuda_group_max(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels.h:inline void cuda_group_max(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels.h:inline void cuda_group_pnorm(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels.h:inline void cuda_group_pnorm(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels.h:inline void cuda_group_spec_pnorm(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels.h:inline void cuda_group_spec_pnorm(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels.h:inline void cuda_heaviside(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels.h:inline void cuda_heaviside(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels.h:inline void cuda_invert_elements(dim3 Gr, dim3 Bl, double *data, MatrixDim d) {
cudamatrix/cu-kernels.h:inline void cuda_invert_elements(dim3 Gr, dim3 Bl, float *data, MatrixDim d) {
cudamatrix/cu-kernels.h:inline void cuda_log_softmax_reduce(size_t Gr, size_t Bl, double *y,
cudamatrix/cu-kernels.h:inline void cuda_log_softmax_reduce(size_t Gr, size_t Bl, float *y,
cudamatrix/cu-kernels.h:inline void cuda_lstm_nonlinearity(dim3 Gr, dim3 Bl, const double* in,
cudamatrix/cu-kernels.h:inline void cuda_lstm_nonlinearity(dim3 Gr, dim3 Bl, const float* in,
cudamatrix/cu-kernels.h:inline void cuda_matrix_add_elements(dim3 Gr, dim3 Bl, double *data,
cudamatrix/cu-kernels.h:inline void cuda_matrix_add_elements(dim3 Gr, dim3 Bl, float *data,
cudamatrix/cu-kernels.h:inline void cuda_matrix_add_indexed_values(dim3 Gr, dim3 Bl, MatrixDim dim,
cudamatrix/cu-kernels.h:inline void cuda_matrix_add_indexed_values(dim3 Gr, dim3 Bl, MatrixDim dim,
cudamatrix/cu-kernels.h:inline void cuda_matrix_add_to_elements(dim3 Gr, dim3 Bl, double alpha,
cudamatrix/cu-kernels.h:inline void cuda_matrix_add_to_elements(dim3 Gr, dim3 Bl, float alpha,
cudamatrix/cu-kernels.h:inline void cuda_matrix_lookup(dim3 Gr, dim3 Bl, const double *data,
cudamatrix/cu-kernels.h:inline void cuda_matrix_lookup(dim3 Gr, dim3 Bl, const float *data,
cudamatrix/cu-kernels.h:inline void cuda_vector_copy_elements(dim3 Gr, dim3 Bl, double *data, int dim,
cudamatrix/cu-kernels.h:inline void cuda_vector_copy_elements(dim3 Gr, dim3 Bl, float *data, int dim,
cudamatrix/cu-kernels.h:inline void cuda_max(dim3 Gr, dim3 Bl, double *mat, const double *A,
cudamatrix/cu-kernels.h:inline void cuda_max(dim3 Gr, dim3 Bl, float *mat, const float *A,
cudamatrix/cu-kernels.h:inline void cuda_max_mat_cols(int Gr, int Bl, double* result, const double* mat,
cudamatrix/cu-kernels.h:inline void cuda_max_mat_cols(int Gr, int Bl, float* result, const float* mat,
cudamatrix/cu-kernels.h:inline void cuda_min(dim3 Gr, dim3 Bl, double *mat, const double *other,
cudamatrix/cu-kernels.h:inline void cuda_min(dim3 Gr, dim3 Bl, float *mat, const float *other,
cudamatrix/cu-kernels.h:inline void cuda_min_mat_cols(int Gr, int Bl, double* result, const double* mat,
cudamatrix/cu-kernels.h:inline void cuda_min_mat_cols(int Gr, int Bl, float* result, const float* mat,
cudamatrix/cu-kernels.h:inline void cuda_mul_cols_vec(dim3 Gr, dim3 Bl, double *mat,
cudamatrix/cu-kernels.h:inline void cuda_mul_cols_vec(dim3 Gr, dim3 Bl, float *mat, const float *scale,
cudamatrix/cu-kernels.h:inline void cuda_mul_elements(dim3 Gr, dim3 Bl, double *mat, const double *A,
cudamatrix/cu-kernels.h:inline void cuda_mul_elements(dim3 Gr, dim3 Bl, float *mat, const float *A,
cudamatrix/cu-kernels.h:inline void cuda_mul_rows_group_mat(dim3 Gr, dim3 Bl, double *y,
cudamatrix/cu-kernels.h:inline void cuda_mul_rows_group_mat(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels.h:inline void cuda_mul_rows_vec(dim3 Gr, dim3 Bl, double *mat,
cudamatrix/cu-kernels.h:inline void cuda_mul_rows_vec(dim3 Gr, dim3 Bl, float *mat, const float *scale,
cudamatrix/cu-kernels.h:inline void cuda_normalize_per_row(size_t Gr, size_t Bl, double *y,
cudamatrix/cu-kernels.h:inline void cuda_normalize_per_row(size_t Gr, size_t Bl, float *y, int y_stride,
cudamatrix/cu-kernels.h:inline void cuda_one(int Gr, int Bl, double* x, int dim) {
cudamatrix/cu-kernels.h:inline void cuda_one(int Gr, int Bl, float* x, int dim) {
cudamatrix/cu-kernels.h:inline void cuda_parametric_relu(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels.h:inline void cuda_parametric_relu(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels.h:inline void cuda_randomize(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels.h:inline void cuda_randomize(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels.h:inline void cuda_regularize_l1(dim3 Gr, dim3 Bl, double *wei, double *grad,
cudamatrix/cu-kernels.h:inline void cuda_regularize_l1(dim3 Gr, dim3 Bl, float *wei, float *grad,
cudamatrix/cu-kernels.h:inline void cuda_replace_value(int Gr, int Bl, double *v, int dim, double orig,
cudamatrix/cu-kernels.h:inline void cuda_replace_value(int Gr, int Bl, float *v, int dim, float orig,
cudamatrix/cu-kernels.h:inline void cuda_scale_diag_packed(int Gr, int Bl, double* mat, double value,
cudamatrix/cu-kernels.h:inline void cuda_scale_diag_packed(int Gr, int Bl, float* mat, float value,
cudamatrix/cu-kernels.h:inline void cuda_scale(dim3 Gr, dim3 Bl, double *mat, double value,
cudamatrix/cu-kernels.h:inline void cuda_scale(dim3 Gr, dim3 Bl, float *mat, float value, MatrixDim d) {
cudamatrix/cu-kernels.h:inline void cuda_select_rows(dim3 Gr, dim3 Bl, const int* out_row_ptr,
cudamatrix/cu-kernels.h:inline void cuda_select_rows(dim3 Gr, dim3 Bl, const int* out_row_ptr,
cudamatrix/cu-kernels.h:inline void cuda_set_bias_params(int Gr, int Bl, double* v, const double* a,
cudamatrix/cu-kernels.h:inline void cuda_set_bias_params(int Gr, int Bl, float* v, const float* a,
cudamatrix/cu-kernels.h:inline void cuda_set_const(dim3 Gr, dim3 Bl, double *mat, double value,
cudamatrix/cu-kernels.h:inline void cuda_set_const(dim3 Gr, dim3 Bl, float *mat, float value,
cudamatrix/cu-kernels.h:inline void cuda_set_diag(int Gr, int Bl, double* mat, double value,
cudamatrix/cu-kernels.h:inline void cuda_set_diag(int Gr, int Bl, float* mat, float value,
cudamatrix/cu-kernels.h:inline void cuda_set_diag_packed(int Gr, int Bl, double* mat, double value,
cudamatrix/cu-kernels.h:inline void cuda_set_diag_packed(int Gr, int Bl, float* mat, float value,
cudamatrix/cu-kernels.h:inline void cuda_set_mat_mat_div_mat(dim3 Gr, dim3 Bl, const double *A,
cudamatrix/cu-kernels.h:inline void cuda_set_mat_mat_div_mat(dim3 Gr, dim3 Bl, const float *A,
cudamatrix/cu-kernels.h:inline void cuda_set_zero_above_diag(dim3 Gr, dim3 Bl, double* mat,
cudamatrix/cu-kernels.h:inline void cuda_set_zero_above_diag(dim3 Gr, dim3 Bl, float* mat,
cudamatrix/cu-kernels.h:inline void cuda_sequence(dim3 Gr, dim3 Bl, int32_cuda* data, int length,
cudamatrix/cu-kernels.h:inline void cuda_sigmoid(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels.h:inline void cuda_sigmoid(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels.h:inline void cuda_soft_hinge(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels.h:inline void cuda_soft_hinge(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels.h:inline void cuda_softmax_reduce(size_t Gr, size_t Bl, double *y,
cudamatrix/cu-kernels.h:inline void cuda_softmax_reduce(size_t Gr, size_t Bl, float *y, const float *x,
cudamatrix/cu-kernels.h:inline void cuda_splice(dim3 Gr, dim3 Bl, double *y, const double *x,
cudamatrix/cu-kernels.h:inline void cuda_splice(dim3 Gr, dim3 Bl, float *y, const float *x,
cudamatrix/cu-kernels.h:inline void cuda_sum_column_ranges(dim3 Gr, dim3 Bl, double *data,
cudamatrix/cu-kernels.h:inline void cuda_sum_column_ranges(dim3 Gr, dim3 Bl, float *data, MatrixDim dim,
cudamatrix/cu-kernels.h:inline void cuda_sum_mat_cols(int Gr, int Bl, double* result, const double* mat,
cudamatrix/cu-kernels.h:inline void cuda_sum_mat_cols(int Gr, int Bl, float* result, const float* mat,
cudamatrix/cu-kernels.h:inline void cuda_sy_add_tr2(dim3 Gr, dim3 Bl, double alpha, double beta,
cudamatrix/cu-kernels.h:inline void cuda_sy_add_tr2(dim3 Gr, dim3 Bl, float alpha, float beta,
cudamatrix/cu-kernels.h:inline void cuda_take_lower(dim3 Gr, dim3 Bl, const double* x, double* y,
cudamatrix/cu-kernels.h:inline void cuda_take_lower(dim3 Gr, dim3 Bl, const float* x, float* y,
cudamatrix/cu-kernels.h:inline void cuda_take_mean(dim3 Gr, dim3 Bl, const double* x, double* y,
cudamatrix/cu-kernels.h:inline void cuda_take_mean(dim3 Gr, dim3 Bl, const float* x, float* y,
cudamatrix/cu-kernels.h:inline void cuda_take_upper(dim3 Gr, dim3 Bl, const double* x, double* y,
cudamatrix/cu-kernels.h:inline void cuda_take_upper(dim3 Gr, dim3 Bl, const float* x, float* y,
cudamatrix/cu-kernels.h:inline void cuda_tanh(dim3 Gr, dim3 Bl, double *y, const double *x, MatrixDim d,
cudamatrix/cu-kernels.h:inline void cuda_tanh(dim3 Gr, dim3 Bl, float *y, const float *x, MatrixDim d,
cudamatrix/cu-kernels.h:inline void cuda_trace(int Gr, int Bl, double* mat, double* value, int dim) {
cudamatrix/cu-kernels.h:inline void cuda_trace(int Gr, int Bl, float* mat, float* value, int dim) {
cudamatrix/cu-kernels.h:inline void cuda_trace_mat_mat(dim3 Gr, dim3 Bl, const double* A,
cudamatrix/cu-kernels.h:inline void cuda_trace_mat_mat(dim3 Gr, dim3 Bl, const float* A, const float* B,
cudamatrix/cu-kernels.h:inline void cuda_trace_mat_mat_trans(dim3 Gr, dim3 Bl, const double* A,
cudamatrix/cu-kernels.h:inline void cuda_trace_mat_mat_trans(dim3 Gr, dim3 Bl, const float* A,
cudamatrix/cu-kernels.h:inline void cuda_trace_mat_smat(dim3 Gr, dim3 Bl, const double* mat,
cudamatrix/cu-kernels.h:inline void cuda_trace_mat_smat(dim3 Gr, dim3 Bl, const float* mat,
cudamatrix/cu-kernels.h:inline void cuda_trace_mat_smat_trans(dim3 Gr, dim3 Bl, const double* mat,
cudamatrix/cu-kernels.h:inline void cuda_trace_mat_smat_trans(dim3 Gr, dim3 Bl, const float* mat,
cudamatrix/cu-kernels.h:inline void cuda_vec_apply_ceiling(int Gr, int Bl, double* v, double floor_val,
cudamatrix/cu-kernels.h:inline void cuda_vec_apply_ceiling(int Gr, int Bl, float* v, float floor_val,
cudamatrix/cu-kernels.h:inline void cuda_vec_apply_exp(int Gr, int Bl, double* v, int dim) {
cudamatrix/cu-kernels.h:inline void cuda_vec_apply_exp(int Gr, int Bl, float* v, int dim) {
cudamatrix/cu-kernels.h:inline void cuda_vec_apply_floor(int Gr, int Bl, double* v, double floor_val,
cudamatrix/cu-kernels.h:inline void cuda_vec_apply_floor(int Gr, int Bl, float* v, float floor_val,
cudamatrix/cu-kernels.h:inline void cuda_vec_apply_log(int Gr, int Bl, double* v, double* flag,
cudamatrix/cu-kernels.h:inline void cuda_vec_apply_log(int Gr, int Bl, float* v, float* flag, int dim) {
cudamatrix/cu-kernels.h:inline void cuda_vec_copy_diag_from_packed(int Gr, int Bl, double *dst,
cudamatrix/cu-kernels.h:inline void cuda_vec_copy_diag_from_packed(int Gr, int Bl, float *dst,
cudamatrix/cu-kernels.h:inline void cuda_vec_max(int Gr, int Bl, const double* v, double* value,
cudamatrix/cu-kernels.h:inline void cuda_vec_max(int Gr, int Bl, const float* v, float* value, int dim,
cudamatrix/cu-kernels.h:inline void cuda_vec_min(int Gr, int Bl, const double* v, double* value,
cudamatrix/cu-kernels.h:inline void cuda_vec_min(int Gr, int Bl, const float* v, float* value, int dim,
cudamatrix/cu-kernels.h:inline void cuda_vec_mul_elements(int Gr, int Bl, double* v, const double* a,
cudamatrix/cu-kernels.h:inline void cuda_vec_mul_elements(int Gr, int Bl, float* v, const float* a,
cudamatrix/cu-kernels.h:inline void cuda_vec_soft_max(int Gr, int Bl, double* v, int dim) {
cudamatrix/cu-kernels.h:inline void cuda_vec_soft_max(int Gr, int Bl, float* v, int dim) {
cudamatrix/cu-kernels.h:inline void cuda_vec_sum(int Gr, int Bl, double* v, double* value, int dim,
cudamatrix/cu-kernels.h:inline void cuda_vec_sum(int Gr, int Bl, float* v, float* value, int dim,
cudamatrix/cu-kernels.h:inline void cuda_mat_compress_sign(dim3 Gr, dim3 Bl, const BaseFloat *src,
cudamatrix/cu-kernels.h:// to avoid compilation errors.
cudamatrix/cu-kernels.h:inline void cuda_mat_compress_sign(dim3 Gr, dim3 Bl, const BaseFloat *src,
cudamatrix/cu-kernels.h:inline void cuda_mat_compress(dim3 Gr, dim3 Bl, const BaseFloat *src,
cudamatrix/cu-kernels.h:inline void cuda_mat_compress(dim3 Gr, dim3 Bl, const BaseFloat *src,
cudamatrix/cu-kernels.h:inline void cuda_mat_compress(dim3 Gr, dim3 Bl, const BaseFloat *src,
cudamatrix/cu-kernels.h:inline void cuda_mat_compress(dim3 Gr, dim3 Bl, const BaseFloat *src,
cudamatrix/cu-kernels.h:inline void cuda_mat_uncompress(dim3 Gr, dim3 Bl, BaseFloat *dest,
cudamatrix/cu-kernels.h:inline void cuda_mat_uncompress(dim3 Gr, dim3 Bl, BaseFloat *dest,
cudamatrix/cu-kernels.h:inline void cuda_mat_uncompress(dim3 Gr, dim3 Bl, BaseFloat *dest,
cudamatrix/cu-kernels.h:inline void cuda_mat_uncompress(dim3 Gr, dim3 Bl, BaseFloat *dest,
cudamatrix/cu-math.h:void RegularizeL1(CuMatrixBase *weight, CuMatrixBase *gradient,
cudamatrix/cu-math.h:void Randomize(const CuMatrixBase &src,
cudamatrix/cu-math.h:/// is replaced by src(src.NumRows()-1, j) or src(0, j) respectively, to avoid
cudamatrix/cu-math.h:void Splice(const CuMatrixBase &src,
cudamatrix/cu-math.h:void Copy(const CuMatrixBase &src,
cudamatrix/cu-math.h:void EnsureNonzero(const CuMatrixBase &src,
cudamatrix/cu-math.h:void EnsureNonzero(const CuVectorBase &src,
cudamatrix/cu-math.h:void ComputeLstmNonlinearity(const CuMatrixBase &input,
cudamatrix/cu-math.h:void CpuComputeLstmNonlinearity(const MatrixBase &input,
cudamatrix/cu-math.h:void BackpropLstmNonlinearity(const CuMatrixBase &input,
cudamatrix/cu-math.h:void CpuBackpropLstmNonlinearity(const MatrixBase &input,
cudamatrix/cu-math.h:/// there is also flooring involved, to avoid division-by-zero
cudamatrix/cu-math.h:void NormalizePerRow(const CuMatrixBase& in, const Real target_rms,
cudamatrix/cu-math.h:void DiffNormalizePerRow(const CuMatrixBase &in_value,
cudamatrix/cu-matrix.h:void AddMatMatBatched(const Real alpha, std::vector* > &C,
cudamatrix/cu-matrix.h: void CopyCols(const CuMatrixBase &src,
cudamatrix/cu-matrix.h: void AddCols(const CuMatrixBase &src,
cudamatrix/cu-matrix.h: void CopyRows(const CuMatrixBase &src,
cudamatrix/cu-matrix.h: void CopyRows(const CuArrayBase &src);
cudamatrix/cu-matrix.h: void CopyToRows(const CuArrayBase &dst) const;
cudamatrix/cu-matrix.h: void AddRows(Real alpha,
cudamatrix/cu-matrix.h: void MulRows(const CuMatrixBase &src,
cudamatrix/cu-matrix.h: void AddRows(Real alpha,
cudamatrix/cu-matrix.h: void AddToRows(Real alpha,
cudamatrix/cu-matrix.h: void AddToRows(Real alpha, const CuArrayBase &dst) const;
cudamatrix/cu-matrix.h: void SumColumnRanges(const CuMatrixBase &src,
cudamatrix/cu-matrix.h: void AddRowRanges(const CuMatrixBase &src,
cudamatrix/cu-matrix.h: friend void AddMatMatBatched(const Real alpha,
cudamatrix/cu-matrix.h: void AddToDiag(Real value);
cudamatrix/cu-matrix.h: void CopyFromMat(const MatrixBase &src,
cudamatrix/cu-matrix.h: void CopyFromGeneralMat(const GeneralMatrix &src,
cudamatrix/cu-matrix.h: void CopyFromMat(const MatrixBase &src,
cudamatrix/cu-matrix.h: void CopyFromSp(const CuSpMatrix &M);
cudamatrix/cu-matrix.h: void CopyFromTp(const CuTpMatrix &M,
cudamatrix/cu-matrix.h: void CopyFromMat(const CuMatrixBase &M,
cudamatrix/cu-matrix.h: void CopyToMat(MatrixBase *dst,
cudamatrix/cu-matrix.h: void CopyRowsFromVec(const CuVectorBase &v);
cudamatrix/cu-matrix.h: void CopyRowsFromVec(const VectorBase &v);
cudamatrix/cu-matrix.h: void CopyColsFromVec(const CuVectorBase &v);
cudamatrix/cu-matrix.h: void CopyColFromVec(const CuVectorBase &v, const MatrixIndexT col);
cudamatrix/cu-matrix.h: void Sigmoid(const CuMatrixBase &src);
cudamatrix/cu-matrix.h: void Heaviside(const CuMatrixBase &src);
cudamatrix/cu-matrix.h: void SoftHinge(const CuMatrixBase &src);
cudamatrix/cu-matrix.h: void GroupPnorm(const CuMatrixBase &src, Real pow);
cudamatrix/cu-matrix.h: void DiffGroupPnorm(const CuMatrixBase &in_value,
cudamatrix/cu-matrix.h: void GroupMax(const CuMatrixBase &src);
cudamatrix/cu-matrix.h: void GroupMaxDeriv(const CuMatrixBase &input, cudamatrix/cu-matrix.h: void ParametricRelu(const CuMatrixBase &src, cudamatrix/cu-matrix.h: void DiffParametricRelu(const CuMatrixBase &value, cudamatrix/cu-matrix.h: void Tanh(const CuMatrixBase &src); cudamatrix/cu-matrix.h: void DiffSigmoid(const CuMatrixBase &value, cudamatrix/cu-matrix.h: void DiffTanh(const CuMatrixBase &value, cudamatrix/cu-matrix.h: void DiffSoftmaxPerRow(const CuMatrixBase &value, cudamatrix/cu-matrix.h: void DiffLogSoftmaxPerRow(const CuMatrixBase &out_value, cudamatrix/cu-matrix.h: void DiffXent(const CuArrayBase &tgt, cudamatrix/cu-matrix.h: void Cholesky(CuMatrixBase *inv_cholesky = NULL); cudamatrix/cu-matrix.h: void SymInvertPosDef(); cudamatrix/cu-matrix.h: void ApplyPow(Real power); cudamatrix/cu-matrix.h: void ApplyPowAbs(Real power, bool include_sign=false); cudamatrix/cu-matrix.h: void ApplyHeaviside(); cudamatrix/cu-matrix.h: void ApplyFloor(Real floor_val); cudamatrix/cu-matrix.h: void ApplyCeiling(Real ceiling_val); cudamatrix/cu-matrix.h: void ApplyExp(); cudamatrix/cu-matrix.h: void ApplyExpLimited(Real lower_limit, Real upper_limit); cudamatrix/cu-matrix.h: void ApplyExpSpecial(); cudamatrix/cu-matrix.h: /// with attention to avoiding overflow or underflow. cudamatrix/cu-matrix.h: void ApplySoftMaxPerRow(const CuMatrixBase &src); cudamatrix/cu-matrix.h: /// with attention to avoiding overflow or underflow. 
cudamatrix/cu-matrix.h: void ApplyLogSoftMaxPerRow(const CuMatrixBase &src); cudamatrix/cu-matrix.h: void FindRowMaxId(CuArray *id) const; cudamatrix/cu-matrix.h: void SetZero(); cudamatrix/cu-matrix.h: void Set(Real value); cudamatrix/cu-matrix.h: void Add(Real value); cudamatrix/cu-matrix.h: void SetZeroAboveDiag(); cudamatrix/cu-matrix.h: void Scale(Real value); cudamatrix/cu-matrix.h: void ApplyLog(); cudamatrix/cu-matrix.h: void MulElements(const CuMatrixBase &A); cudamatrix/cu-matrix.h: void DivElements(const CuMatrixBase &A); cudamatrix/cu-matrix.h: void Max(const CuMatrixBase &A); cudamatrix/cu-matrix.h: void Min(const CuMatrixBase &A); cudamatrix/cu-matrix.h: void MulColsVec(const CuVectorBase &scale); cudamatrix/cu-matrix.h: void MulRowsVec(const CuVectorBase &scale); cudamatrix/cu-matrix.h: void MulRowsGroupMat(const CuMatrixBase &src); cudamatrix/cu-matrix.h: void DivRowsVec(const CuVectorBase &div); cudamatrix/cu-matrix.h: void InvertElements(); cudamatrix/cu-matrix.h: void AddMat(Real alpha, const CuMatrixBase &A, cudamatrix/cu-matrix.h: void AddSmat(Real alpha, const CuSparseMatrix &A, cudamatrix/cu-matrix.h: void AddSmatMat(Real alpha, const CuSparseMatrix &A, cudamatrix/cu-matrix.h: void AddMatSmat(Real alpha, const CuMatrixBase &A, cudamatrix/cu-matrix.h: void AddToElements(Real alpha, const CuArrayBase &elements); cudamatrix/cu-matrix.h: void AddMatBlocks(Real alpha, const CuMatrixBase &A, cudamatrix/cu-matrix.h: void AddVecToCols(Real alpha, const CuVectorBase &col, Real beta = 1.0); cudamatrix/cu-matrix.h: void AddVecToRows(Real alpha, const CuVectorBase &row, Real beta = 1.0); cudamatrix/cu-matrix.h: void AddMatMat(Real alpha, const CuMatrixBase &A, MatrixTransposeType transA, cudamatrix/cu-matrix.h: void AddVecVec(Real alpha, const CuVectorBase &x, const CuVectorBase &y); cudamatrix/cu-matrix.h: void SetMatMatDivMat(const CuMatrixBase &A, const CuMatrixBase &B, const CuMatrixBase &C); cudamatrix/cu-matrix.h: void SymAddMat2(const Real alpha, 
const CuMatrixBase &M, cudamatrix/cu-matrix.h: void AddMatBlock(Real alpha, const CuMatrixBase &A, MatrixTransposeType transA, cudamatrix/cu-matrix.h: void AddDiagVecMat(const Real alpha, const CuVectorBase &v, cudamatrix/cu-matrix.h: void AddMatDiagVec(const Real alpha, cudamatrix/cu-matrix.h: void AddMatMatElements(const Real alpha, cudamatrix/cu-matrix.h: void AddMatSp(const Real alpha, cudamatrix/cu-matrix.h: void AddSpMat(const Real alpha, cudamatrix/cu-matrix.h: void AddTpMat(const Real alpha, cudamatrix/cu-matrix.h: void AddMatTp(const Real alpha, cudamatrix/cu-matrix.h: void CopyFromBlock(const CuBlockMatrix &B, cudamatrix/cu-matrix.h: void CopyLowerToUpper(); cudamatrix/cu-matrix.h: void CopyUpperToLower(); cudamatrix/cu-matrix.h: void SetRandn(); cudamatrix/cu-matrix.h: void SetRandUniform(); cudamatrix/cu-matrix.h: void Write(std::ostream &os, bool binary) const; cudamatrix/cu-matrix.h: void AddElements(Real alpha, const std::vector >& input); cudamatrix/cu-matrix.h: void AddElements(Real alpha, const CuArrayBase &indexes, cudamatrix/cu-matrix.h: void Lookup(const std::vector &indexes, cudamatrix/cu-matrix.h: void Lookup(const CuArrayBase &indexes, cudamatrix/cu-matrix.h: void EqualElementMask(const CuMatrixBase &mat, CuMatrix *mask) const; cudamatrix/cu-matrix.h: void Transpose(); cudamatrix/cu-matrix.h: void Resize(MatrixIndexT rows, MatrixIndexT cols, cudamatrix/cu-matrix.h: void Swap(Matrix *mat); cudamatrix/cu-matrix.h: void Swap(CuMatrix *mat); cudamatrix/cu-matrix.h: void Swap(CuMatrix *mat); cudamatrix/cu-matrix.h: void Read(std::istream &is, bool binary); cudamatrix/cu-matrix.h: void CompObjfAndDeriv(const std::vector > &elements, cudamatrix/cu-matrix.h: void Destroy(); cudamatrix/cu-matrix.h:inline void AssertEqual(const CuMatrixBase &A, cudamatrix/cu-matrix.h:void MatrixBase::CopyFromMat(const CuMatrixBase &cu, cudamatrix/cu-matrixdim.h: void *matrix_data; // data for M_i. 
This is a pointer to either float* or cudamatrix/cu-matrixdim.h: // avoid extra coding to support the two cases, we cudamatrix/cu-matrixdim.h: // decided to make this a void* pointer. cudamatrix/cu-packed-matrix.h: void SetZero(); /// < Set to zero cudamatrix/cu-packed-matrix.h: void SetUnit(); /// < Set to unit matrix. cudamatrix/cu-packed-matrix.h: void SetRandn(); /// < Set to random values of a normal distribution cudamatrix/cu-packed-matrix.h: void SetDiag(Real alpha); /// < Set the diagonal value to alpha cudamatrix/cu-packed-matrix.h: void AddToDiag(Real r); ///< Add this quantity to the diagonal of the matrix. cudamatrix/cu-packed-matrix.h: void Scale(Real alpha); cudamatrix/cu-packed-matrix.h: void ScaleDiag(Real alpha); cudamatrix/cu-packed-matrix.h: void Resize(MatrixIndexT nRows, MatrixResizeType resize_type = kSetZero); cudamatrix/cu-packed-matrix.h: void CopyFromPacked(const CuPackedMatrix &src); cudamatrix/cu-packed-matrix.h: void CopyFromPacked(const PackedMatrix &src); cudamatrix/cu-packed-matrix.h: void CopyToPacked(PackedMatrix *dst) const; cudamatrix/cu-packed-matrix.h: void Read(std::istream &in, bool binary); cudamatrix/cu-packed-matrix.h: void Write(std::ostream &out, bool binary) const; cudamatrix/cu-packed-matrix.h: void Destroy(); cudamatrix/cu-packed-matrix.h: void Swap(CuPackedMatrix *other); cudamatrix/cu-packed-matrix.h: void Swap(PackedMatrix *other); cudamatrix/cu-packed-matrix.h: void AddPacked(const Real alpha, const CuPackedMatrix &M); cudamatrix/cu-rand.h: void SeedGpu() { cudamatrix/cu-rand.h: void RandUniform(CuMatrixBase *tgt); cudamatrix/cu-rand.h: void RandUniform(CuMatrix *tgt); cudamatrix/cu-rand.h: void RandUniform(CuVectorBase *tgt); cudamatrix/cu-rand.h: void RandGaussian(CuMatrixBase *tgt); cudamatrix/cu-rand.h: void RandGaussian(CuMatrix *tgt); cudamatrix/cu-rand.h: void RandGaussian(CuVectorBase *tgt); cudamatrix/cu-rand.h: void BinarizeProbs(const CuMatrix &probs, CuMatrix *states); cudamatrix/cu-rand.h: void 
AddGaussNoise(CuMatrix *tgt, Real gscale = 1.0); cudamatrix/cu-sp-matrix.h: inline void Resize(MatrixIndexT nRows, MatrixResizeType resize_type = kSetZero) { cudamatrix/cu-sp-matrix.h: void CopyFromSp(const CuSpMatrix &other) { cudamatrix/cu-sp-matrix.h: void CopyFromSp(const SpMatrix &other) { cudamatrix/cu-sp-matrix.h: void CopyFromMat(const CuMatrixBase &orig, cudamatrix/cu-sp-matrix.h: void CopyToSp(SpMatrix *dst) const { cudamatrix/cu-sp-matrix.h: void Invert(); cudamatrix/cu-sp-matrix.h: void AddVec2(const Real alpha, const CuVectorBase &v); cudamatrix/cu-sp-matrix.h: void AddMat2(const Real alpha, const CuMatrixBase &M, cudamatrix/cu-sp-matrix.h: void AddSp(const Real alpha, const CuSpMatrix &Ma) { cudamatrix/cu-sp-matrix.h:inline void AssertEqual(const CuSpMatrix &A, cudamatrix/cu-sparse-matrix.h: void CopyToMat(CuMatrixBase *dest, MatrixTransposeType trans = cudamatrix/cu-sparse-matrix.h: void CopyFromSmat(const SparseMatrix &smat); cudamatrix/cu-sparse-matrix.h: void CopyFromSmat(const CuSparseMatrix &smat, cudamatrix/cu-sparse-matrix.h: void SelectRows(const CuArray &row_indexes, cudamatrix/cu-sparse-matrix.h: void CopyToSmat(SparseMatrix *smat) const; cudamatrix/cu-sparse-matrix.h: void CopyElementsToVec(CuVectorBase *vec) const; cudamatrix/cu-sparse-matrix.h: void Swap(SparseMatrix *smat); cudamatrix/cu-sparse-matrix.h: void Swap(CuSparseMatrix *smat); cudamatrix/cu-sparse-matrix.h: void SetRandn(BaseFloat zero_prob); cudamatrix/cu-sparse-matrix.h: void Write(std::ostream &os, bool binary) const; cudamatrix/cu-sparse-matrix.h: void Read(std::istream &is, bool binary); cudamatrix/cu-sparse-matrix.h: void Resize(const MatrixIndexT num_rows, const MatrixIndexT num_cols, cudamatrix/cu-sparse-matrix.h: void Destroy(); cudamatrix/cu-tp-matrix.h: void CopyFromMat(const CuMatrixBase &M, cudamatrix/cu-tp-matrix.h: void CopyFromTp(const CuTpMatrix &other) { cudamatrix/cu-tp-matrix.h: void CopyFromTp(const TpMatrix &other) { cudamatrix/cu-tp-matrix.h: void 
Cholesky(const CuSpMatrix& Orig); cudamatrix/cu-tp-matrix.h: void Invert(); cudamatrix/cu-vector.h: friend void cu::Splice(const CuMatrixBase &src, cudamatrix/cu-vector.h: void CopyFromVec(const CuVectorBase &src); cudamatrix/cu-vector.h: void CopyFromVec(const CuVectorBase &M); cudamatrix/cu-vector.h: void CopyFromVec(const VectorBase &src); cudamatrix/cu-vector.h: void CopyToVec(VectorBase *dst) const; cudamatrix/cu-vector.h: void CopyRowsFromMat(const CuMatrixBase &M); cudamatrix/cu-vector.h: void CopyRowsFromMat(const MatrixBase &M); cudamatrix/cu-vector.h: void SetZero(); cudamatrix/cu-vector.h: void Set(Real value); cudamatrix/cu-vector.h: void Add(Real value); cudamatrix/cu-vector.h: void Scale(Real value); cudamatrix/cu-vector.h: void AddVec(Real alpha, const CuVectorBase &vec, Real beta = 1.0); cudamatrix/cu-vector.h: void AddVec(Real alpha, const CuVectorBase &vec, Real beta = 1.0); cudamatrix/cu-vector.h: void AddRowSumMat(Real alpha, const CuMatrixBase &mat, Real beta = 1.0); cudamatrix/cu-vector.h: void AddColSumMat(Real alpha, const CuMatrixBase &mat, Real beta = 1.0); cudamatrix/cu-vector.h: void AddTpVec(const Real alpha, const CuTpMatrix&M, cudamatrix/cu-vector.h: void MulTp(const CuTpMatrix &M, const MatrixTransposeType trans); cudamatrix/cu-vector.h: void InvertElements(); cudamatrix/cu-vector.h: void CopyElements(const CuMatrixBase &mat, cudamatrix/cu-vector.h: void ApplySoftMax(); cudamatrix/cu-vector.h: void ApplyExp(); cudamatrix/cu-vector.h: void ApplyLog(); cudamatrix/cu-vector.h: void ApplyFloor(Real floor_val, MatrixIndexT *floored_count = NULL); cudamatrix/cu-vector.h: void ApplyCeiling(Real ceiling_val, MatrixIndexT *ceiled_count = NULL); cudamatrix/cu-vector.h: void ApplyPow(Real power); cudamatrix/cu-vector.h: void SetRandn(); cudamatrix/cu-vector.h: void SetRandUniform(); cudamatrix/cu-vector.h: void CopyColFromMat(const CuMatrixBase &mat, MatrixIndexT col); cudamatrix/cu-vector.h: void CopyColFromMat(const CuMatrixBase &mat, 
MatrixIndexT col); cudamatrix/cu-vector.h: void AddMatVec(const Real alpha, const CuMatrixBase &M, cudamatrix/cu-vector.h: void AddVecVec(Real alpha, const CuVectorBase &v, cudamatrix/cu-vector.h: void AddSpVec(const Real alpha, const CuSpMatrix &S, cudamatrix/cu-vector.h: void AddDiagMat2(Real alpha, const CuMatrixBase &M, cudamatrix/cu-vector.h: void AddDiagMatMat(Real alpha, const CuMatrixBase &M, MatrixTransposeType transM, cudamatrix/cu-vector.h: void CopyDiagFromPacked(const CuPackedMatrix &M); cudamatrix/cu-vector.h: void CopyDiagFromMat(const CuMatrix &M); cudamatrix/cu-vector.h: void ReplaceValue(Real orig, Real changed); cudamatrix/cu-vector.h: void MulElements(const CuVectorBase &v); cudamatrix/cu-vector.h: void DivElements(const CuVectorBase &v); cudamatrix/cu-vector.h: void Resize(MatrixIndexT dim, MatrixResizeType t = kSetZero); cudamatrix/cu-vector.h: void Swap(CuVector *vec); cudamatrix/cu-vector.h: void Swap(Vector *vec); cudamatrix/cu-vector.h: void Read(std::istream &is, bool binary); cudamatrix/cu-vector.h: void Write(std::ostream &is, bool binary) const; cudamatrix/cu-vector.h: void Destroy(); cudamatrix/cu-vector.h:inline void AssertEqual(const CuVectorBase &a, cudamatrix/cu-vector.h:void CuVectorBase::CopyFromVec(const CuVectorBase &v) { cudamatrix/cu-vector.h:void VectorBase::CopyFromVec(const CuVectorBase &cu) { cudamatrix/cu-vector.h:void CuVectorBase::CopyFromVec(const CuVectorBase &src); cudamatrix/cu-vector.h:void CuVectorBase::CopyFromVec(const CuVectorBase &src); ```
kaldi util

```c
util/basic-filebuf.h: void swap(basic_filebuf& rhs);
util/basic-filebuf.h: void imbue(const std::locale& loc) override;
util/basic-filebuf.h: void _M_write_mode();
util/basic-filebuf.h:void
util/basic-filebuf.h:void
util/basic-filebuf.h: reinterpret_cast(const_cast(_M_extbufnext)),
util/basic-filebuf.h:void
util/basic-filebuf.h:void
util/const-integer-set-inl.h:void ConstIntegerSet::InitInternal() {
util/const-integer-set-inl.h:void ConstIntegerSet::Write(std::ostream &os, bool binary) const {
util/const-integer-set-inl.h:void ConstIntegerSet::Read(std::istream &is, bool binary) {
util/const-integer-set.h: void Init(const std::vector &input) {
util/const-integer-set.h: void Init(const std::set &input) {
util/const-integer-set.h: void Write(std::ostream &os, bool binary) const;
util/const-integer-set.h: void Read(std::istream &is, bool binary);
util/const-integer-set.h: void InitInternal();
util/hash-list-inl.h:template void HashList::SetSize(size_t size) {
util/hash-list-inl.h:inline void HashList::Delete(Elem *e) {
util/hash-list-inl.h:void HashList::Insert(I key, T val) {
util/hash-list-inl.h:void HashList::InsertMore(I key, T val) {
util/hash-list.h: to avoid repeated new's/deletes.
util/hash-list.h: inline void Delete(Elem *e);
util/hash-list.h: inline void Insert(I key, T val);
util/hash-list.h: inline void InsertMore(I key, T val);
util/hash-list.h: void SetSize(size_t sz);
util/kaldi-holder-inl.h: void Clear() {
util/kaldi-holder-inl.h: void Swap(KaldiObjectHolder *other) {
util/kaldi-holder-inl.h: void Clear() { }
util/kaldi-holder-inl.h: void Swap(BasicHolder *other) {
util/kaldi-holder-inl.h: void Clear() { t_.clear(); }
util/kaldi-holder-inl.h: void Swap(BasicVectorHolder *other) {
util/kaldi-holder-inl.h: void Clear() { t_.clear(); }
util/kaldi-holder-inl.h: void Swap(BasicVectorVectorHolder *other) {
util/kaldi-holder-inl.h: void Clear() { t_.clear(); }
util/kaldi-holder-inl.h: void Swap(BasicPairVectorHolder *other) {
util/kaldi-holder-inl.h: void Clear() { t_.clear(); }
util/kaldi-holder-inl.h: void Swap(TokenHolder *other) {
util/kaldi-holder-inl.h: void Clear() { t_.clear(); }
util/kaldi-holder-inl.h: void Swap(TokenVectorHolder *other) {
util/kaldi-holder-inl.h: void Clear() { t_.first.Resize(0, 0); }
util/kaldi-holder-inl.h: void Swap(HtkMatrixHolder *other) {
util/kaldi-holder-inl.h: void Clear() { feats_.Resize(0, 0); }
util/kaldi-holder-inl.h: void Swap(SphinxMatrixHolder *other) {
util/kaldi-holder.h: void Clear() { }
util/kaldi-holder.h: void Swap(GenericHolder *other) { std::swap(t_, other->t_); }
util/kaldi-io.h: /// was already open and could not be closed (to avoid this, call Close()
util/kaldi-io.h: // want to avoid exceptions being thrown. There are times when calling
util/kaldi-io.h: // boolean argument, to avoid confusion with Kaldi's text/binary distinction,
util/kaldi-io.h:template void ReadKaldiObject(const std::string &filename,
util/kaldi-io.h:template <> void ReadKaldiObject(const std::string &filename,
util/kaldi-io.h:template <> void ReadKaldiObject(const std::string &filename,
util/kaldi-io.h:template inline void WriteKaldiObject(const C &c,
util/kaldi-semaphore.h: void Wait(); ///< decrease the counter
util/kaldi-semaphore.h: void Signal(); ///< increase the counter
util/kaldi-table-inl.h: virtual void FreeCurrent() = 0;
util/kaldi-table-inl.h: virtual void Next() = 0;
util/kaldi-table-inl.h: virtual void SwapHolder(Holder *other_holder) = 0;
util/kaldi-table-inl.h: void FreeCurrent() {
util/kaldi-table-inl.h: void SwapHolder(Holder *other_holder) {
util/kaldi-table-inl.h: // suppressing compiler warnings by casting to void. It will cause the
util/kaldi-table-inl.h: (void) Value();
util/kaldi-table-inl.h: // holder_, but it won't matter. We avoid calling Clear() on them, as this
util/kaldi-table-inl.h: void Next() {
util/kaldi-table-inl.h: void SetErrorState() {
util/kaldi-table-inl.h: void NextScpLine() {
util/kaldi-table-inl.h: virtual void Next() {
util/kaldi-table-inl.h: virtual void FreeCurrent() {
util/kaldi-table-inl.h: void SwapHolder(Holder *other_holder) {
util/kaldi-table-inl.h: // suppressing compiler warnings by casting to void.
util/kaldi-table-inl.h: (void) Value();
util/kaldi-table-inl.h: void RunInBackground() {
util/kaldi-table-inl.h: static void run(SequentialTableReaderBackgroundImpl *object) {
util/kaldi-table-inl.h: void SwapHolder(Holder *other_holder) {
util/kaldi-table-inl.h: virtual void FreeCurrent() {
util/kaldi-table-inl.h: virtual void Next() {
util/kaldi-table-inl.h:void SequentialTableReader::FreeCurrent() {
util/kaldi-table-inl.h:void SequentialTableReader::Next() {
util/kaldi-table-inl.h: virtual void Flush() = 0;
util/kaldi-table-inl.h: virtual void Flush() {
util/kaldi-table-inl.h: virtual void Flush() { }
util/kaldi-table-inl.h: void MakeFilename(typename std::ostream::pos_type streampos,
util/kaldi-table-inl.h: virtual void Flush() {
util/kaldi-table-inl.h:void TableWriter::Write(const std::string &key,
util/kaldi-table-inl.h:void TableWriter::Flush() {
util/kaldi-table-inl.h:// avoids a lot of fseek operations that might be expensive.
util/kaldi-table-inl.h: void ReadNextObject() {
util/kaldi-table-inl.h: void HandlePendingDelete() {
util/kaldi-table-inl.h: void HandlePendingDelete() {
util/kaldi-table-inl.h:void SequentialTableReader::CheckImpl() const {
util/kaldi-table-inl.h:void RandomAccessTableReader::CheckImpl() const {
util/kaldi-table-inl.h:void TableWriter::CheckImpl() const {
util/kaldi-table.h: void CheckImpl() const; // Checks that impl_ is non-NULL; prints an error
util/kaldi-table.h: void FreeCurrent();
util/kaldi-table.h: // object in situations where this would avoid making a redundant copy.
util/kaldi-table.h: void Next();
util/kaldi-table.h: void CheckImpl() const; // Checks that impl_ is non-NULL; prints an error
util/kaldi-table.h: inline void Write(const std::string &key, const T &value) const;
util/kaldi-table.h: void Flush();
util/kaldi-table.h: void CheckImpl() const; // Checks that impl_ is non-NULL; prints an error
util/kaldi-thread.h: virtual void operator() () = 0;
util/kaldi-thread.h: void operator() () {
util/kaldi-thread.h:template void RunMultiThreaded(const C &c_in) {
util/kaldi-thread.h: void Register(OptionsItf *opts) {
util/kaldi-thread.h: void Run(C *c) {
util/kaldi-thread.h: void Wait() { // You call this at the end if it's more convenient
util/kaldi-thread.h: static void RunTask(RunTaskArgsList *args) {
util/parse-options.h: registered with a prefix to avoid conflicts. The object thus created will
util/parse-options.h: void Register(const std::string &name,
util/parse-options.h: void Register(const std::string &name,
util/parse-options.h: void Register(const std::string &name,
util/parse-options.h: void Register(const std::string &name,
util/parse-options.h: void Register(const std::string &name,
util/parse-options.h: void Register(const std::string &name,
util/parse-options.h: void DisableOption(const std::string &name);
util/parse-options.h: void RegisterStandard(const std::string &name,
util/parse-options.h: void PrintUsage(bool print_command_line = false);
util/parse-options.h: void PrintConfig(std::ostream &os);
util/parse-options.h: void ReadConfigFile(const std::string &filename);
util/parse-options.h: void RegisterTmpl(const std::string &name, T *ptr, const std::string &doc);
util/parse-options.h: void RegisterSpecific(const std::string &name, const std::string &idx,
util/parse-options.h: void RegisterSpecific(const std::string &name, const std::string &idx,
util/parse-options.h: void RegisterSpecific(const std::string &name, const std::string &idx,
util/parse-options.h: void RegisterSpecific(const std::string &name, const std::string &idx,
util/parse-options.h: void RegisterSpecific(const std::string &name, const std::string &idx,
util/parse-options.h: void RegisterSpecific(const std::string &name, const std::string &idx,
util/parse-options.h: void RegisterCommon(const std::string &name,
util/parse-options.h: void SplitLongArg(std::string in, std::string *key, std::string *value,
util/parse-options.h: void NormalizeArgName(std::string *str);
util/parse-options.h:/// "void Register(OptionsItf *opts)" which it can call to register the
util/parse-options.h:template void ReadConfigFromFile(const std::string config_filename,
util/parse-options.h:template void ReadConfigsFromFile(const std::string
util/simple-options.h: void Register(const std::string &name, bool *ptr, const std::string &doc);
util/simple-options.h: void Register(const std::string &name, int32 *ptr, const std::string &doc);
util/simple-options.h: void Register(const std::string &name, uint32 *ptr, const std::string &doc);
util/simple-options.h: void Register(const std::string &name, float *ptr, const std::string &doc);
util/simple-options.h: void Register(const std::string &name, double *ptr, const std::string &doc);
util/simple-options.h: void Register(const std::string &name, std::string *ptr,
util/stl-utils.h:inline void SortAndUniq(std::vector *vec) {
util/stl-utils.h:inline void Uniq(std::vector *vec) { // must be already sorted.
util/stl-utils.h:void CopySetToVector(const std::set &s, std::vector *v) {
util/stl-utils.h:void CopySetToVector(const unordered_set &s, std::vector *v) {
util/stl-utils.h:void CopyMapToVector(const std::map &m,
util/stl-utils.h:void CopyMapKeysToVector(const std::map &m, std::vector *v) {
util/stl-utils.h:void CopyMapValuesToVector(const std::map &m, std::vector *v) {
util/stl-utils.h:void CopyMapKeysToSet(const std::map &m, std::set *s) {
util/stl-utils.h:void CopyMapValuesToSet(const std::map &m, std::set *s) {
util/stl-utils.h:void CopyVectorToSet(const std::vector &v, std::set *s) {
util/stl-utils.h:void DeletePointers(std::vector *v) {
util/stl-utils.h:void CopyVectorToVector(const std::vector &vec_in, std::vector *vec_out) {
util/stl-utils.h:inline void ReverseVector(std::vector *vec) {
util/stl-utils.h:inline void MergePairVectorSumming(std::vector > *vec) {
util/stl-utils.h: // initial input (avoids unnecessary copying).
util/text-utils.h:void SplitStringToVector(const std::string &full, const char *delim,
util/text-utils.h:void JoinVectorToString(const std::vector &vec_in,
util/text-utils.h:void Trim(std::string *str);
util/text-utils.h:void SplitStringOnFirstSpace(const std::string &line,
```
from darknet/src/blas.c, gemm.c, matrix.c, utils.c

```c
void reorg_cpu(float *x, int w, int h, int c, int batch, int stride, int forward, float *out)
void flatten(float *x, int size, int layers, int batch, int forward)
void weighted_sum_cpu(float *a, float *b, float *s, int n, float *c)
void weighted_delta_cpu(float *a, float *b, float *s, float *da, float *db, float *ds, int n, float *dc)
void shortcut_cpu(int batch, int w1, int h1, int c1, float *add, int w2, int h2, int c2, float s1, float s2, float *out)
void mean_cpu(float *x, int batch, int filters, int spatial, float *mean)
void variance_cpu(float *x, float *mean, int batch, int filters, int spatial, float *variance)
void l2normalize_cpu(float *x, float *dx, int batch, int filters, int spatial)
void normalize_cpu(float *x, float *mean, float *variance, int batch, int filters, int spatial)
void const_cpu(int N, float ALPHA, float *X, int INCX)
void mul_cpu(int N, float *X, int INCX, float *Y, int INCY)
void pow_cpu(int N, float ALPHA, float *X, int INCX, float *Y, int INCY)
void axpy_cpu(int N, float ALPHA, float *X, int INCX, float *Y, int INCY)
void scal_cpu(int N, float ALPHA, float *X, int INCX)
void fill_cpu(int N, float ALPHA, float *X, int INCX)
void deinter_cpu(int NX, float *X, int NY, float *Y, int B, float *OUT)
void inter_cpu(int NX, float *X, int NY, float *Y, int B, float *OUT)
void copy_cpu(int N, float *X, int INCX, float *Y, int INCY)
void mult_add_into_cpu(int N, float *X, float *Y, float *Z)
void smooth_l1_cpu(int n, float *pred, float *truth, float *delta, float *error)
void l1_cpu(int n, float *pred, float *truth, float *delta, float *error)
void softmax_x_ent_cpu(int n, float *pred, float *truth, float *delta, float *error)
void logistic_x_ent_cpu(int n, float *pred, float *truth, float *delta, float *error)
void l2_cpu(int n, float *pred, float *truth, float *delta, float *error)
void softmax(float *input, int n, float temp, int stride, float *output)
void softmax_cpu(float *input, int n, int batch, int batch_offset, int groups, int group_offset, int stride, float temp, float *output)
void upsample_cpu(float *in, int w, int h, int c, int batch, int stride, int forward, float scale, float *out)
void gemm_bin(int M, int N, int K, float ALPHA,
void time_random_matrix(int TA, int TB, int m, int k, int n)
void gemm(int TA, int TB, int M, int N, int K, float ALPHA,
void gemm_nn(int M, int N, int K, float ALPHA,
void gemm_nt(int M, int N, int K, float ALPHA,
void gemm_tn(int M, int N, int K, float ALPHA,
void gemm_tt(int M, int N, int K, float ALPHA,
void gemm_cpu(int TA, int TB, int M, int N, int K, float ALPHA,
void gemm_gpu(int TA, int TB, int M, int N, int K, float ALPHA,
void time_gpu_random_matrix(int TA, int TB, int m, int k, int n)
void time_gpu(int TA, int TB, int m, int k, int n)
void test_gpu_accuracy(int TA, int TB, int m, int k, int n)
void cuda_set_device(int n)
void check_error(cudaError_t status)
cudaError_t status = cudaMalloc((void **)&x_gpu, size);
void cuda_random(float *x_gpu, size_t n)
cudaError_t status = cudaMalloc((void **)&x_gpu, size);
void cuda_free(float *x_gpu)
void cuda_push_array(float *x_gpu, float *x, size_t n)
void cuda_pull_array(float *x_gpu, float *x, size_t n)
void cuda_set_device(int n){}
void free_matrix(matrix m)
void scale_matrix(matrix m, float scale)
void matrix_add_matrix(matrix from, matrix to)
void matrix_to_csv(matrix m)
void print_matrix(matrix m)
void sorta_shuffle(void *arr, size_t n, size_t size, size_t sections)
void shuffle(void *arr, size_t n, size_t size)
void *swp = calloc(1, size);
void del_arg(int argc, char **argv, int index)
void pm(int M, int N, float *A)
void find_replace(char *str, char *orig, char *rep, char *output)
void top_k(float *a, int n, int k, int *index)
void error(const char *s)
void malloc_error()
void file_error(char *s)
void strip(char *s)
void strip_char(char *s, char bad)
void free_ptrs(void **ptrs, int n)
void write_int(int fd, int n)
void read_all(int fd, char *buffer, size_t bytes)
void write_all(int fd, char *buffer, size_t bytes)
void mean_arrays(float **a, int n, int els, float *avg)
void print_statistics(float *a, int n)
void normalize_array(float *a, int n)
void translate_array(float *a, int n, float s)
void scale_array(float *a, int n, float s)
```
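The darknet functions above return `void` and hand results back through pointer arguments (for example, `mean_cpu` writes into `float *mean`). The sketch below shows `trace` from the function list written in that style. It is only an illustration: the row-major `MAT` layout with `rows`/`cols`/`data` fields and `DTYPE = double` are assumptions, not the project's actual definitions. Summing over `min(rows, cols)` for non-square input is likewise an illustrative choice.

```c
#include <stddef.h>

typedef double DTYPE; /* assumption: real element type */

/* Hypothetical row-major matrix; not the project's actual MAT struct. */
typedef struct {
  size_t rows, cols;
  DTYPE *data;
} MAT;

/* Sum of the main diagonal of A, written to *out; nothing is returned.
   For a non-square A this sums the first min(rows, cols) diagonal entries. */
void trace(const MAT *A, DTYPE *out) {
  size_t n = A->rows < A->cols ? A->rows : A->cols;
  DTYPE s = 0;
  for (size_t i = 0; i < n; ++i)
    s += A->data[i * A->cols + i];
  *out = s;
}
```

The caller owns the output storage, which is the same contract the darknet `*_cpu` functions follow.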

google fast approx fftw

Kandelion commented 6 years ago

BLAS reference, CUBLAS reference

These are the reference PDF files. Here are the web page links: BLAS link, CUBLAS link

kooBH commented 6 years ago

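Some additional questions:

  1. Exactly how far should exception handling go? (e.g., wrong matrix dimensions, mismatched sizes, unmet operation preconditions such as a conjugate transpose on a MAT)
  2. For complex alpha and beta in BLAS operations, should the arguments be DTYPE, DTYPE or CTYPE?
  3. How should batch operations be implemented: as separate functions, or by having each function take ndim and decide?
  4. Besides wav, which file types do we need to read?
  5. Should 3-D arrays be used as a batch of 2-D arrays, or as genuine 3-D arrays?
  6. The number of files is growing, so I'd like to split them into header and source folders. This may cause merge conflicts, so we should coordinate the timing.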
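Question 1 asks how far exception handling should go. The schedule below also asks whether ASSERT should only warn or also abort. One way to keep both options open is a compile-time switch. In this minimal sketch, `IIP_ASSERT`, `IIP_ASSERT_ABORT`, `check_same_shape`, and the `MAT` layout are all hypothetical names, not the project's current API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical policy switch: define IIP_ASSERT_ABORT to stop on failure;
   otherwise a failed check only prints a warning and execution continues. */
#ifdef IIP_ASSERT_ABORT
#define IIP_ON_FAIL() abort()
#else
#define IIP_ON_FAIL() ((void)0)
#endif

/* Evaluates to 1 if cond holds; otherwise reports file/line and the failed
   condition to stderr, applies the policy above, and evaluates to 0. */
#define IIP_ASSERT(cond, msg)                                              \
  ((cond) ? 1                                                              \
          : (fprintf(stderr, "[iip] %s:%d: %s (%s)\n", __FILE__, __LINE__, \
                     (msg), #cond),                                        \
             IIP_ON_FAIL(), 0))

/* Hypothetical row-major matrix; not the project's actual MAT struct. */
typedef struct {
  size_t rows, cols;
  double *data;
} MAT;

/* Example check before an element-wise op: shapes must match exactly. */
int check_same_shape(const MAT *A, const MAT *B) {
  return IIP_ASSERT(A->rows == B->rows && A->cols == B->cols,
                    "matrix size mismatch");
}
```

In warn-only mode the caller can still use the 0 result to skip the operation. Building with `-DIIP_ASSERT_ABORT` turns the same checks into hard stops.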
gogyzzz commented 6 years ago

@Kandelion @kooBH I'll open a separate issue for each of these requests. The content here may differ from the latest requests.

Kandelion commented 6 years ago

For functions like dot, BLAS also returns a scalar value rather than a matrix. If we implement it differently anyway, what exactly should the operation do?

gogyzzz commented 6 years ago

@Kandelion I think it should produce a 1 x 1 matrix. Strictly speaking, mathematically a scalar would be the correct result, though...

Kandelion commented 6 years ago

@gogyzzz Hmm.. so you mean the return value should be a MAT* of size 1 by 1?

gogyzzz commented 6 years ago

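Hmm, hmmm, hmmmm... now that I think about it,

dot for vectors should return a scalar, while matmul for matrices should return a 1x1 matrix. So they do need to stay separate after all. Leave it as it is.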
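The split agreed on above can be sketched as follows. The sketch assumes `DTYPE` is `double` and uses a hypothetical row-major `MAT` with `rows`/`cols`/`data` fields, not the project's actual struct. The vector `dot` returns a scalar, as BLAS does. `matmul` always writes a matrix, so a 1 x n by n x 1 product lands in a 1 x 1 output.

```c
#include <stddef.h>

typedef double DTYPE; /* assumption: real element type */

/* Hypothetical row-major matrix; not the project's actual MAT struct. */
typedef struct {
  size_t rows, cols;
  DTYPE *data;
} MAT;

/* Vector dot product: returns a scalar, like BLAS ddot. */
DTYPE dot(const DTYPE *x, const DTYPE *y, size_t n) {
  DTYPE s = 0;
  for (size_t i = 0; i < n; ++i)
    s += x[i] * y[i];
  return s;
}

/* Matrix product C = A * B written into C (no alpha/beta, no shape checks).
   Assumes A->cols == B->rows and C is A->rows x B->cols. */
void matmul(const MAT *A, const MAT *B, MAT *C) {
  for (size_t i = 0; i < A->rows; ++i)
    for (size_t j = 0; j < B->cols; ++j) {
      DTYPE s = 0;
      for (size_t k = 0; k < A->cols; ++k)
        s += A->data[i * A->cols + k] * B->data[k * B->cols + j];
      C->data[i * C->cols + j] = s;
    }
}
```

Both calls compute the same number for a row vector times a column vector. The only difference is whether the result comes back as a `DTYPE` or is stored in a 1 x 1 `MAT`.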


-- Haeyong Kwon Sogang University | Electronic engineering (P): 010-8422-0243 blog: https://gogyzzz.blogspot.kr/

Kandelion commented 6 years ago

@gogyzzz The table above lists a sum() function, but surprisingly BLAS doesn't seem to have one... Should we implement it separately, independent of BLAS?
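For reference, the closest Level 1 BLAS routine is `asum` (e.g. `dasum`), which sums absolute values rather than signed values. A plain sum therefore does need its own implementation. The native-C sketch below assumes `DTYPE` is `double` and works on a flat array; both function names are illustrative.

```c
#include <stddef.h>

typedef double DTYPE; /* assumption: real element type */

/* Plain sum of n elements: the operation BLAS does not provide. */
DTYPE sum(const DTYPE *x, size_t n) {
  DTYPE s = 0;
  for (size_t i = 0; i < n; ++i)
    s += x[i];
  return s;
}

/* What BLAS ?asum computes instead: the sum of absolute values. */
DTYPE asum(const DTYPE *x, size_t n) {
  DTYPE s = 0;
  for (size_t i = 0; i < n; ++i)
    s += x[i] < 0 ? -x[i] : x[i];
  return s;
}
```

For input containing negative values the two differ: `{1, -2, 3}` gives 2 for `sum` and 6 for `asum`.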

+ sdot, cdot, and udot differ in their inputs and outputs:

| | sdot | cdot | udot |
|---|---|---|---|
| Input | MAT, MAT | MAT, CMAT | CMAT, CMAT |
| Output | DTYPE | CTYPE | CTYPE |
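The three variants in the table can be sketched as follows, using flat arrays instead of MAT/CMAT for brevity. `DTYPE = double` and a `{re, im}` `CTYPE` struct are assumptions. `udot` is written without conjugation, like BLAS `zdotu`. Whether the project's udot should instead conjugate its first argument (as `zdotc` does) is an open assumption.

```c
#include <stddef.h>

typedef double DTYPE;                   /* assumption: real element type */
typedef struct { DTYPE re, im; } CTYPE; /* assumption: complex element type */

/* sdot: real . real -> DTYPE */
DTYPE sdot(const DTYPE *x, const DTYPE *y, size_t n) {
  DTYPE s = 0;
  for (size_t i = 0; i < n; ++i)
    s += x[i] * y[i];
  return s;
}

/* cdot: real . complex -> CTYPE (real x scales both parts of y) */
CTYPE cdot(const DTYPE *x, const CTYPE *y, size_t n) {
  CTYPE s = {0, 0};
  for (size_t i = 0; i < n; ++i) {
    s.re += x[i] * y[i].re;
    s.im += x[i] * y[i].im;
  }
  return s;
}

/* udot: complex . complex -> CTYPE, unconjugated: sum of x[i] * y[i] */
CTYPE udot(const CTYPE *x, const CTYPE *y, size_t n) {
  CTYPE s = {0, 0};
  for (size_t i = 0; i < n; ++i) {
    s.re += x[i].re * y[i].re - x[i].im * y[i].im;
    s.im += x[i].re * y[i].im + x[i].im * y[i].re;
  }
  return s;
}
```

A conjugated variant would only flip the sign of `x[i].im` in both accumulations.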

Kandelion commented 6 years ago

Everyone, making good use of Wolfram Alpha could really help... King-fram God-pha...

gogyzzz commented 6 years ago

@Kandelion @kooBH I've updated the issue body.

The functions I have in mind closely resemble those in the kaldi library.

kaldi is the most widely used library for speech recognition. It has a large number of routines for handling speech data, enough that it alone can serve as the reference for the basics.

Also, although kaldi is a C++ library, it performs matrix operations by calling functions rather than through operator overloading.

So I added the basic matrix functions used in kaldi to the issue body.

For now I simply grepped for void to pull out the function names, so some functions don't show all of their arguments.

Unnecessary functions still need to be removed, but for now I've included everything as-is for reference.

For more details, see matrix, base, cudamatrix, and util under https://github.com/kaldi-asr/kaldi/tree/master/src. Personally, I'd like everyone to look through them at least once.
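The function-call style described above can be illustrated with ele_prod. The function list defines it as C = alpha * A .* B + beta * C, and kaldi's `AddMatMatElements` (in the cudamatrix list) appears to follow the same alpha/beta pattern. This native-C sketch assumes a hypothetical row-major `MAT` with `rows`/`cols`/`data` fields and `DTYPE = double`. The argument order is an illustrative choice, not a decided naming rule.

```c
#include <stddef.h>

typedef double DTYPE; /* assumption: real element type */

/* Hypothetical row-major matrix; not the project's actual MAT struct. */
typedef struct {
  size_t rows, cols;
  DTYPE *data;
} MAT;

/* C = alpha * A .* B + beta * C, element-wise; A, B, C must share a shape.
   BLAS-style convention: when beta == 0, C is not read, so it may hold
   uninitialized values. */
void ele_prod(DTYPE alpha, const MAT *A, const MAT *B, DTYPE beta, MAT *C) {
  size_t n = C->rows * C->cols;
  for (size_t i = 0; i < n; ++i) {
    DTYPE prev = (beta == 0) ? 0 : beta * C->data[i];
    C->data[i] = alpha * A->data[i] * B->data[i] + prev;
  }
}
```

One call covers overwrite (beta = 0), accumulate (beta = 1), and scaled updates, which is how a function-call API replaces an overloaded `C = A .* B` expression.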

kooBH commented 6 years ago

// from haeyong, @Kandelion for your reference

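These are only estimated durations and don't carry much meaning. Subrange operations are on hold. (I'm not sure we'll ever hit a case that truly needs them. After asking Bonhyuk to try submatmul, I realized that extending subranges to the other functions would cause a combinatorial explosion... We'll revisit this if we run into it while writing actual signal-processing algorithms.)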

180xxx / further refinement
180xxx / backend performance testing
180xxx / mkl, openblas, cublas backends (ideally cublas would land before 08/31, but for now...)

180831 / Phase 1 deadline

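180xxx / exception handling (does ASSERT only warn or also abort; which cases warn, which cases abort, etc.)
180xxx / functional testing (compare against MATLAB values; define the criterion for values matching)
2 days / decide function signatures and naming rules -> unify names -> documentation, update README.md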

  • It would be best if no function returns anything <- because darknet does it that way...

7 days / implement features (evd, svd, inverse, diagonal, trace, determinant)
now / implement every function in unoptimized native C (implementation only)
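The functional-testing item calls for comparing results against MATLAB and defining when values count as matching. One common criterion is a mixed absolute/relative tolerance, |ours - ref| <= atol + rtol * |ref|. The sketch below assumes the MATLAB reference values have already been loaded into a `double` array (that export step is not shown). `values_match` and the tolerances used in the check are placeholders, not agreed project settings.

```c
#include <stddef.h>

/* |v| without pulling in <math.h>. */
static double abs_d(double v) { return v < 0 ? -v : v; }

/* Returns 1 if every ours[i] agrees with the MATLAB reference ref[i] under
   |ours - ref| <= atol + rtol * |ref|. Otherwise returns 0 and stores the
   first failing index in *first_bad. A NaN in either array is a mismatch,
   because every comparison with NaN is false. */
int values_match(const double *ours, const double *ref, size_t n,
                 double atol, double rtol, size_t *first_bad) {
  for (size_t i = 0; i < n; ++i) {
    double diff = abs_d(ours[i] - ref[i]);
    if (!(diff <= atol + rtol * abs_d(ref[i]))) {
      *first_bad = i;
      return 0;
    }
  }
  return 1;
}
```

The absolute term keeps comparisons near zero from failing on tiny round-off, while the relative term scales with the magnitude of the reference value.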

gogyzzz commented 6 years ago

I added the following: google fast approx fftw

kooBH commented 6 years ago

Closing this, since it is now tracked in the project.