bytedance / lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation

What is the requirement of lightseq3.0 #402

Open · Fizzmy opened this issue 2 years ago

Fizzmy commented 2 years ago

I tried to build from source following docs/inference/build.md and realized that this document is out of date (only CUDA 11 supports the header `#include <cooperative_groups/reduce.h>`). So I tried to build lightseq3.0 with CUDA 11, but I ran into another error: `no instance of overloaded function "atomicAdd" matches the argument list; argument types are: (__half *, float)`

Is this error related to CUDA_ARCH? How can I run this code on a less powerful GPU?

hexisyztem commented 2 years ago

What is your GPU model, and which CUDA version are you using? If you compile on your own device, you can try modifying https://github.com/bytedance/lightseq/blob/master/CMakeLists.txt#L88 to `set(CMAKE_CUDA_ARCHITECTURES 70 75 80 86)`.
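For anyone answering those questions on their own machine, here is a quick way to check the GPU model, compute capability, and CUDA toolkit version (assuming `nvidia-smi` and `nvcc` are on the PATH; the `compute_cap` query field is only available in newer driver releases):

```shell
# Show the GPU model and driver version
nvidia-smi

# Newer drivers can print the compute capability directly,
# e.g. "Tesla P100-PCIE-16GB, 6.0" (CUDA_ARCH 60)
nvidia-smi --query-gpu=name,compute_cap --format=csv

# Show the installed CUDA toolkit (nvcc) version
nvcc --version
```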

hexisyztem commented 2 years ago

Because CMAKE_CUDA_ARCHITECTURES 60/61 (Pascal) do not support `atomicAdd(__half*, float)`.

Fizzmy commented 2 years ago

> What is your GPU model, and which CUDA version are you using? If you compile on your own device, you can try modifying https://github.com/bytedance/lightseq/blob/master/CMakeLists.txt#L88 to `set(CMAKE_CUDA_ARCHITECTURES 70 75 80 86)`.

My GPU is a Tesla P100 and my CUDA version is 11.1. I found that my CUDA_ARCH is 60. How can I compile it without FP16?

hexisyztem commented 2 years ago

> > What is your GPU model, and which CUDA version are you using? If you compile on your own device, you can try modifying https://github.com/bytedance/lightseq/blob/master/CMakeLists.txt#L88 to `set(CMAKE_CUDA_ARCHITECTURES 70 75 80 86)`.
>
> My GPU is a Tesla P100 and my CUDA version is 11.1. I found that my CUDA_ARCH is 60. How can I compile it without FP16?

https://github.com/bytedance/lightseq/blob/master/CMakeLists.txt#L7

hexisyztem commented 2 years ago

> > > What is your GPU model, and which CUDA version are you using? If you compile on your own device, you can try modifying https://github.com/bytedance/lightseq/blob/master/CMakeLists.txt#L88 to `set(CMAKE_CUDA_ARCHITECTURES 70 75 80 86)`.
> >
> > My GPU is a Tesla P100 and my CUDA version is 11.1. I found that my CUDA_ARCH is 60. How can I compile it without FP16?
>
> https://github.com/bytedance/lightseq/blob/master/CMakeLists.txt#L7

Pass `-DFP16_MODE=OFF` to cmake.

Fizzmy commented 2 years ago

Thank you, it compiles successfully now. What I did:

1. Modified https://github.com/bytedance/lightseq/blob/master/CMakeLists.txt#L88 to `set(CMAKE_CUDA_ARCHITECTURES 60 61 70 75)`.
2. Modified `/usr/local/cuda/include/thrust/system/cuda/config.h`, changing `#ifndef THRUST_IGNORE_CUB_VERSION_CHECK` to `#ifdef THRUST_IGNORE_CUB_VERSION_CHECK`.
3. Compiled the code with `cmake -DCMAKE_BUILD_TYPE=Release -DFP16_MODE=OFF -DDYNAMIC_API=ON .. && make -j`.
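The steps above can be condensed into a rough build script. This is only a sketch of what worked in this thread: the `sed` pattern and the `build` directory are assumptions, and the Thrust header change from step 2 is deliberately left as a manual edit since it touches a system-wide CUDA install.

```shell
# From the lightseq repository root; assumes CUDA 11.x is installed.

# Step 1: target Pascal (sm_60/61) as well (assumed sed pattern --
# verify against the actual line in CMakeLists.txt before running).
sed -i 's/set(CMAKE_CUDA_ARCHITECTURES .*/set(CMAKE_CUDA_ARCHITECTURES 60 61 70 75)/' CMakeLists.txt

# Step 2 (manual): in /usr/local/cuda/include/thrust/system/cuda/config.h,
# change "#ifndef THRUST_IGNORE_CUB_VERSION_CHECK"
# to     "#ifdef THRUST_IGNORE_CUB_VERSION_CHECK".

# Step 3: FP16 must be off on sm_60/61 GPUs such as the P100.
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DFP16_MODE=OFF -DDYNAMIC_API=ON ..
make -j
```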

Fizzmy commented 2 years ago

If I want to run the Python test code (such as test_ls_layers_new.py), what should I do after building all targets?

hexisyztem commented 2 years ago

If you want to run the test code, you can run `python3 test/xxx.py` directly.

Fizzmy commented 2 years ago

> If you want to run the test code, you can run `python3 test/xxx.py` directly.

Then I ran into the first error 😂