flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Installation Failed: flashlight-{cuda/cpu}[asr] #950

Closed rainbowrun closed 3 years ago

rainbowrun commented 3 years ago

I am installing on a brand-new Google Compute Engine instance with Ubuntu 20.04 and a Tesla V100. All my previous installation steps succeeded (MKL, CUDA 11.2, cuDNN 8.1, NCCL 2.8.4, and vcpkg). I have also successfully finished this step:

$ ./vcpkg install flashlight-cuda

But when I ran the next step:

$ ./vcpkg install flashlight-cuda[asr]

'vcpkg' always tells me that I need to pass the '--recursive' flag to have [core,lib,asr] all built. When I did that, the build failed with the following error:

[186/193] : && /usr/bin/c++ -fPIC -g -rdynamic CMakeFiles/fl_asr_tutorial_inference_ctc.dir/flashlight/app/asr/tutorial/InferenceCTC.cpp.o -o bin/asr/fl_asr_tutorial_inference_ctc -L/usr/local/cuda-11.2/targets/x86_64-linux/lib/stubs -L/usr/local/cuda-11.2/targets/x86_64-linux/lib -Wl,-rpath,/home/xiaopanzhang/vcpkg/packages/arrayfire_x64-linux/lib:/home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib:/home/xiaopanzhang/vcpkg/installed/x64-linux/lib libflashlight-app-asr.a -ldl /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libglog.a /home/xiaopanzhang/vcpkg/installed/x64-linux/lib/libgflags.a libflashlight.a libfl-libraries.a /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_lp64.so /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_gnu_thread.so /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_core.so -fopenmp /usr/lib/x86_64-linux-gnu/libpthread.so -lm /usr/lib/x86_64-linux-gnu/libdl.so /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libfftw3.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libkenlm.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libkenlm_util.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/liblzmad.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libbz2d.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libz.a /home/xiaopanzhang/vcpkg/packages/arrayfire_x64-linux/lib/libafcuda.so.3.7.3 -pthread /home/xiaopanzhang/vcpkg/installed/x64-linux/lib/libcudnn.so /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libmpi.so /home/xiaopanzhang/vcpkg/installed/x64-linux/lib/libnccl.so /usr/local/cuda-11.2/lib64/libcudart_static.a -ldl -lpthread /usr/lib/x86_64-linux-gnu/librt.so /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libsndfile.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libvorbisenc.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libvorbis.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libFLAC.a 
/home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libogg.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libopus.a -lm -lcudadevrt -lcudart_static -lrt -lpthread -ldl && : FAILED: bin/asr/fl_asr_tutorial_inference_ctc : && /usr/bin/c++ -fPIC -g -rdynamic CMakeFiles/fl_asr_tutorial_inference_ctc.dir/flashlight/app/asr/tutorial/InferenceCTC.cpp.o -o bin/asr/fl_asr_tutorial_inference_ctc -L/usr/local/cuda-11.2/targets/x86_64-linux/lib/stubs -L/usr/local/cuda-11.2/targets/x86_64-linux/lib -Wl,-rpath,/home/xiaopanzhang/vcpkg/packages/arrayfire_x64-linux/lib:/home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib:/home/xiaopanzhang/vcpkg/installed/x64-linux/lib libflashlight-app-asr.a -ldl /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libglog.a /home/xiaopanzhang/vcpkg/installed/x64-linux/lib/libgflags.a libflashlight.a libfl-libraries.a /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_lp64.so /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_gnu_thread.so /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_core.so -fopenmp /usr/lib/x86_64-linux-gnu/libpthread.so -lm /usr/lib/x86_64-linux-gnu/libdl.so /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libfftw3.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libkenlm.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libkenlm_util.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/liblzmad.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libbz2d.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libz.a /home/xiaopanzhang/vcpkg/packages/arrayfire_x64-linux/lib/libafcuda.so.3.7.3 -pthread /home/xiaopanzhang/vcpkg/installed/x64-linux/lib/libcudnn.so /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libmpi.so /home/xiaopanzhang/vcpkg/installed/x64-linux/lib/libnccl.so /usr/local/cuda-11.2/lib64/libcudart_static.a -ldl -lpthread /usr/lib/x86_64-linux-gnu/librt.so /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libsndfile.a 
/home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libvorbisenc.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libvorbis.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libFLAC.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libogg.a /home/xiaopanzhang/vcpkg/installed/x64-linux/debug/lib/libopus.a -lm -lcudadevrt -lcudart_static -lrt -lpthread -ldl && :
/usr/bin/ld: warning: libcuda.so.1, needed by /home/xiaopanzhang/vcpkg/packages/arrayfire_x64-linux/lib/libafcuda.so.3.7.3, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libcudart.so.10.1, needed by /home/xiaopanzhang/vcpkg/installed/x64-linux/lib/libnccl.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: /home/xiaopanzhang/vcpkg/installed/x64-linux/lib/libnccl.so: undefined reference to `cudaGetDeviceCount@libcudart.so.10.1'
/usr/bin/ld: /home/xiaopanzhang/vcpkg/installed/x64-linux/lib/libnccl.so: undefined reference to `__cudaRegisterFunction@libcudart.so.10.1'
/usr/bin/ld: /home/xiaopanzhang/vcpkg/installed/x64-linux/lib/libnccl.so: undefined reference to `cudaIpcCloseMemHandle@libcudart.so.10.1'

As far as I can see, the link failed because the linker is looking for dynamic libraries whose version differs from the one I installed. However, I do have LD_LIBRARY_PATH and PATH pointing to the CUDA installation location (/usr/local/cuda-11.2/... here). I also checked the package list of 'vcpkg', and it says:

xiaopanzhang@wav2letter:~/vcpkg$ ./vcpkg list | grep cuda
arrayfire[cuda]:x64-linux ArrayFire CUDA backend
cuda:x64-linux 10.1#5 A parallel computing platform and programming model

xiaopanzhang@wav2letter:~/vcpkg$ ./vcpkg list | grep nccl
nccl:x64-linux 2.4.6 Optimized primitives for collective multi-GPU co..

It looks like vcpkg also has its own old versions of CUDA, cuDNN, and NCCL. Could anybody tell me what might be going wrong here?
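To make the mismatch easier to see, the "not found" warnings can be pulled out of the long linker output. The following is a minimal sketch, not part of the original report: `missing_libs` is a hypothetical helper name, and it assumes the warnings keep the `warning: X, needed by Y, not found` wording that ld printed above.

```shell
# Print "<missing library> needed by <dependent library>" for every
# "not found" warning in ld output read from stdin.
missing_libs() {
    grep -o 'warning: [^,]*, needed by [^,]*, not found' |
        sed 's/^warning: \(.*\), needed by \(.*\), not found$/\1 needed by \2/' |
        LC_ALL=C sort -u
}

# Two warnings copied from the log above.
printf '%s\n' \
    '/usr/bin/ld: warning: libcuda.so.1, needed by /home/xiaopanzhang/vcpkg/packages/arrayfire_x64-linux/lib/libafcuda.so.3.7.3, not found (try using -rpath or -rpath-link)' \
    '/usr/bin/ld: warning: libcudart.so.10.1, needed by /home/xiaopanzhang/vcpkg/installed/x64-linux/lib/libnccl.so, not found (try using -rpath or -rpath-link)' |
    missing_libs
```

In this log, the summary points at the vcpkg-installed libnccl.so asking for a CUDA 10.1 runtime, while the machine has CUDA 11.2.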

tlikhomanenko commented 3 years ago

Could you recreate this issue directly in the flashlight GitHub repository?

Also cc @jacobkahn

rainbowrun commented 3 years ago

I have not tried that yet, but today I picked a new Google Compute Engine instance and re-ran the step (i.e., installing with vcpkg), and it failed with the same error.

I noticed that the CMake file for CUDA in vcpkg (https://github.com/microsoft/vcpkg/blob/master/ports/cuda/vcpkg_find_cuda.cmake) has hard-coded CUDA versions, as follows:

...

set(CUDA_PATHS 
        ENV CUDA_PATH
        ENV CUDA_HOME
        ENV CUDA_BIN_PATH
        ENV CUDA_PATH_V11_0
        ENV CUDA_PATH_V10_2
        ENV CUDA_PATH_V10_1)

Notice that 11_2 is not there.

I am not sure whether external CUDA versions (installed through apt) other than those listed above were ever tested with vcpkg.
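One thing that can be checked quickly is whether any of the environment variables in that list is actually set on the machine. The sketch below is only an illustration: `first_cuda_hint` is a hypothetical helper, the variable names are copied from the snippet above, and whether the vcpkg port consults them in exactly this order (or whether setting one is enough for CUDA 11.2) is an assumption that was not verified in this thread.

```shell
# Print the first variable from the port's hint list that is set and
# non-empty, as NAME=VALUE. Returns 1 if none of them is set.
first_cuda_hint() {
    for name in CUDA_PATH CUDA_HOME CUDA_BIN_PATH \
                CUDA_PATH_V11_0 CUDA_PATH_V10_2 CUDA_PATH_V10_1; do
        # Indirect lookup of the variable called $name (POSIX sh compatible).
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
            return 0
        fi
    done
    return 1
}

first_cuda_hint || echo "no CUDA hint variable is set"
```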

Anyway, after the second failure, I switched to the wav2letter Dockerfile, and so far it is going well.

tlikhomanenko commented 3 years ago

Yep, we didn't test 11.1 and 11.2, as we don't have them on our machines for now. We only tested the build from source for CUDA 11.1 in the CI, where all tests pass. Could you use a lower version of CUDA then?

rainbowrun commented 3 years ago

Your message is confusing to me. Have you tested 11.1 or not? What is a CI?

Which version of CUDA do you suggest I downgrade to?

tlikhomanenko commented 3 years ago

We didn't test 11.1 and 11.2 locally, as we don't have a machine with them.

We have tested 11.1 in CI (continuous integration), where the build and tests run on every commit: https://app.circleci.com/pipelines/github/facebookresearch/flashlight. The CI config is here, where you can see CUDA 11.1: https://github.com/facebookresearch/flashlight/blob/master/.circleci/config.yml#L11

Could you try 11.0, for which the vcpkg port was uploaded?

MHumza3656 commented 3 years ago

@tlikhomanenko it seems that the file has been updated again and is now strictly restricted to CUDA 10.1? Does this mean I need CUDA 10.1 to build flashlight[asr]?

    set(CUDA_REQUIRED_VERSION "10.1.0")
    set(CUDA_PATHS
            ENV CUDA_PATH
            ENV CUDA_HOME
            ENV CUDA_BIN_PATH
            ENV CUDA_TOOLKIT_ROOT_DIR)

Note: I also tried CUDA 11.0. The following command runs successfully:

./vcpkg install flashlight-cuda[asr]

The failure comes later, when I download the flashlight module and build an application following the steps in "Integrating flashlight into your own project". I get this error:

/usr/bin/ld: warning: libcudart.so.10.1, needed by /home/xiaopanzhang/vcpkg/installed/x64-linux/lib/libnccl.so, not found (try using -rpath or -rpath-link)

It also gives undefined reference errors for multiple files.

jacobkahn commented 3 years ago

@MHumza3656, this was a bug in vcpkg which has been fixed for some CUDA libs but not others. There are a few fixes in the works to address this for NCCL.

You can get around this by downloading and installing cuDNN and NCCL yourself, rather than relying on the vcpkg downloads, which fetch versions of these libraries that expect CUDA 10.1 and will break if you don't have it. Make sure you (1) uninstall the vcpkg cudnn and nccl packages before doing this, and (2) download and install cuDNN and NCCL versions that are compatible with your CUDA version. After you've done that, try things again.

As for a fix in vcpkg: a fix for cuDNN has landed, but not yet for NCCL. I will try to get to this as soon as I can, maybe in the next few days.
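Before and after applying the workaround above, it can help to confirm which CUDA runtime a given libnccl.so was actually linked against. The sketch below is an illustration under assumptions: `needed_sonames` is a hypothetical helper that filters the NEEDED entries printed by binutils' `readelf -d`, and the commented commands use placeholder paths that must be adapted to the machine.

```shell
# Print the sonames listed as NEEDED in `readelf -d` output (read from
# stdin) that start with the given prefix (default: libcudart).
needed_sonames() {
    prefix=${1:-libcudart}
    sed -n 's/.*(NEEDED).*Shared library: \[\(.*\)]$/\1/p' | grep "^$prefix"
}

# Intended usage (not executed here; the path is a placeholder):
#   readelf -d /path/to/libnccl.so | needed_sonames libcudart
#
# Removing the vcpkg-provided packages named in the workaround
# (vcpkg may ask for --recurse if other installed ports depend on them):
#   ./vcpkg remove cudnn nccl
```

If the printed soname refers to a CUDA version other than the installed toolkit, that copy of NCCL is likely the one producing the libcudart.so.10.1 errors shown earlier in the thread.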

MHumza3656 commented 3 years ago

Right, I understand. I'm trying it again after installing CUDA 11.0 along with cuDNN (8.1.1) and NCCL (2.8.4) versions that support it. Thank you! Please notify me here once it is fixed.

MHumza3656 commented 3 years ago

It's now giving me this weird arrayfire bug:

-- Downloading https://github.com/arrayfire/forge/archive/1a0f0cb6371a8c8053ab5eb7cbe3039c95132389.tar.gz -> arrayfire-forge-1a0f0cb6371a8c8053ab5eb7cbe3039c95132389.tar.gz...
-- Extracting source /home/humza/vcpkg/downloads/arrayfire-forge-1a0f0cb6371a8c8053ab5eb7cbe3039c95132389.tar.gz
-- Using source at /home/humza/vcpkg/buildtrees/arrayfire/src/9c95132389-511398ace8.clean
-- Configuring x64-linux-dbg
-- Configuring x64-linux-rel
-- Building x64-linux-dbg
CMake Error at scripts/cmake/vcpkg_execute_build_process.cmake:146 (message):
    Command failed: /home/humza/vcpkg/downloads/tools/cmake-3.19.2-linux/cmake-3.19.2-Linux-x86_64/bin/cmake --build . --config Debug --target install -- -v -j9
    Working Directory: /home/humza/vcpkg/buildtrees/arrayfire/x64-linux-dbg
    See logs for more information:
      /home/humza/vcpkg/buildtrees/arrayfire/install-x64-linux-dbg-out.log

Call Stack (most recent call first):
  scripts/cmake/vcpkg_build_cmake.cmake:105 (vcpkg_execute_build_process)
  scripts/cmake/vcpkg_install_cmake.cmake:45 (vcpkg_build_cmake)
  ports/arrayfire/portfile.cmake:76 (vcpkg_install_cmake)
  scripts/ports.cmake:142 (include)

Error: Building package arrayfire:x64-linux failed with: BUILD_FAILED
Please ensure you're using the latest portfiles with `./vcpkg update`, then
submit an issue at https://github.com/Microsoft/vcpkg/issues including:
  Package: arrayfire:x64-linux
  Vcpkg version: 2021-01-13-unknownhash

Additionally, attach any relevant sections from the log files above.

On one machine it gives me this error; on another, the build has been using 8 CPU cores in parallel for the last 30 minutes. Both machines had the same installation steps performed for CUDA 11.0, cuDNN 8.1.1, and NCCL 2.8.4, but on the first machine, installing CUDA (following NVIDIA's commands) also installed 11.3, while on the second it didn't.
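A quick way to compare the two machines is to list which CUDA toolkit directories each one has. This is a minimal sketch, assuming the toolkits live in `cuda-<version>` directories under /usr/local (as with /usr/local/cuda-11.2 earlier in this thread); `list_cuda_toolkits` is a hypothetical helper name.

```shell
# Print the name of every cuda-* directory under the given root
# (default: /usr/local), one per line.
list_cuda_toolkits() {
    root=${1:-/usr/local}
    for dir in "$root"/cuda-*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$dir" ] && basename "$dir"
    done
    return 0
}

list_cuda_toolkits
```

Seeing both cuda-11.0 and cuda-11.3 on one machine but only cuda-11.0 on the other would match the difference described above.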

tlikhomanenko commented 3 years ago

If you have any problems installing other packages/dependencies in vcpkg, please report the bug directly on that package's GitHub, as the error message says.

MHumza3656 commented 3 years ago

Oh right. Well, whatever it was, I encountered it on one machine but not the other.