baidu-research / warp-ctc

Fast parallel CTC.
Apache License 2.0

Compile failure due to undeclared gpuStream_t #170

Closed matwhite closed 3 years ago

matwhite commented 3 years ago

The prior PR seems to have missed a declaration. When compiling, I see the following error:

In file included from /tmp/luarocks_warp-ctc-scm-2-3241/warp-ctc/torch_binding/binding.cpp:18:0:
/tmp/luarocks_warp-ctc-scm-1-3241/warp-ctc/include/detail/reduce.h:4:85: error: 'gpuStream_t' has not been declared
 ctcStatus_t reduce_negate(const T* input, T* output, int rows, int cols, bool axis, gpuStream_t stream);
                                                                                     ^
/tmp/luarocks_warp-ctc-scm-1-3241/warp-ctc/include/detail/reduce.h:6:82: error: 'gpuStream_t' has not been declared
 ctcStatus_t reduce_exp(const T* input, T* output, int rows, int cols, bool axis, gpuStream_t stream);
                                                                                  ^
/tmp/luarocks_warp-ctc-scm-1-3241/warp-ctc/include/detail/reduce.h:8:82: error: 'gpuStream_t' has not been declared
 ctcStatus_t reduce_max(const T* input, T* output, int rows, int cols, bool axis, gpuStream_t stream);
                                                                                  ^
make[2]: *** [CMakeFiles/warp_ctc.dir/torch_binding/binding.cpp.o] Error 1
make[1]: *** [CMakeFiles/warp_ctc.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [all] Error 2

After reverting to cd828e5b6c3b953b82af73f7f44cddc393a20efa, I am able to build it successfully.

windstamp commented 3 years ago

Sorry about that. Would you try adding the line #include "type_defs.h" to warp-ctc/include/detail/reduce.h and running make again?

#pragma once

template <typename T>
ctcStatus_t reduce_negate(const T* input, T* output, int rows, int cols, bool axis, gpuStream_t stream);
template <typename T>
ctcStatus_t reduce_exp(const T* input, T* output, int rows, int cols, bool axis, gpuStream_t stream);
template <typename T>
ctcStatus_t reduce_max(const T* input, T* output, int rows, int cols, bool axis, gpuStream_t stream);

changes to

#pragma once

#include "type_defs.h"

template <typename T>
ctcStatus_t reduce_negate(const T* input, T* output, int rows, int cols, bool axis, gpuStream_t stream);
template <typename T>
ctcStatus_t reduce_exp(const T* input, T* output, int rows, int cols, bool axis, gpuStream_t stream);
template <typename T>
ctcStatus_t reduce_max(const T* input, T* output, int rows, int cols, bool axis, gpuStream_t stream);
matwhite commented 3 years ago

I tried that, but it seems to have no effect. In an attempt to get it working, I added #define gpuStream_t cudaStream_t to reduce.h and that worked, but that's not a real solution since it's a kludge, and it doesn't handle HIP.

windstamp commented 3 years ago

Thanks for trying that.

So it seems that using gpuStream_t = cudaStream_t; does not work, while #define gpuStream_t cudaStream_t does?

Could you try adding using gpuStream_t = cudaStream_t; (not #define gpuStream_t cudaStream_t) to reduce.h and confirm that it fails as expected?

Or try changing type_defs.h from

#pragma once

#if (defined(__HIPCC__) || defined(__CUDACC__))

#ifdef __HIPCC__
#include <hip/hip_runtime.h>
#else
#include <cuda_runtime.h>
#endif

#ifdef __HIPCC__
#define gpuSuccess hipSuccess
using gpuStream_t = hipStream_t;
using gpuError_t = hipError_t;
using gpuEvent_t = hipEvent_t;
#else
#define gpuSuccess cudaSuccess
using gpuStream_t = cudaStream_t;
using gpuError_t = cudaError_t;
using gpuEvent_t = cudaEvent_t;
#endif

#endif

to

#pragma once

#if (defined(__HIPCC__) || defined(__CUDACC__))

#ifdef __HIPCC__
#include <hip/hip_runtime.h>
#else
#include <cuda_runtime.h>
#endif

#ifdef __HIPCC__
#define gpuSuccess hipSuccess
using gpuStream_t = hipStream_t;
using gpuError_t = hipError_t;
using gpuEvent_t = hipEvent_t;
#else
#define gpuSuccess cudaSuccess
#define gpuStream_t cudaStream_t
// using gpuStream_t = cudaStream_t;
using gpuError_t = cudaError_t;
using gpuEvent_t = cudaEvent_t;
#endif

#endif

and confirm that it succeeds as expected?

And by the way, are you using CUDA for this build?

Sorry to ask you to test this for me; I cannot install Torch successfully right now, so I cannot build with it myself.

matwhite commented 3 years ago

The suggested changes did not work for me. I have run out of time to look at this issue, so I will close it out. Thank you for helping me take a look.