mratsim closed this issue 4 years ago
Hi, thanks for the question. What version of DALI are you trying to build? We test every master commit, so you should be able to build it. Are you running a clean build?
It should have been master from yesterday, so commit 801c888. According to the git blame https://github.com/mratsim/Arch-Data-Science/blame/a723bdc99835f109c146b26586a4ca166ef9ab25/training/dali/PKGBUILD#L5 that would be an update from 535182b8. Unfortunately, I can only retry and confirm on Friday at the earliest, as I'm working abroad this week, away from my workstation.
This is my configure output for the latest e079bcff commit:
-- DALI version: 0.15.0dev
-- DALI_extra version: 1b224243c057d413cf3e1c75694d4d1acf73d0dc
-- Build configuration: Release
/opt/cuda
nvJPEG found in /opt/cuda/include
nvJPEG is using new API
-- Found OpenCV: /usr/include/opencv4 (found suitable version "4.1.2", minimum required is "3.0")
OpenCV libraries: opencv_core;opencv_imgproc;opencv_imgcodecs
-- LLVM FileCheck Found: /usr/bin/FileCheck
-- git Version: v1.4.0-505be96a
-- Version: 1.4.0
-- Performing Test HAVE_STD_REGEX -- success
-- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
-- Performing Test HAVE_POSIX_REGEX -- success
-- Performing Test HAVE_STEADY_CLOCK -- success
Using libjpeg-turbo at /usr/lib/libjpeg.so
-- Found TIFF: /usr/lib/libtiff.so (found version "4.0.10")
Using libtiff at /usr/lib/libtiff.so
-- pybind11 v2.2.4
-- Building WITHOUT LMDB support
-- Enabling TensorFlow TFRecord file format support
-- CUDA supported archs: 35;50;52;60;61;70;75
-- CUDA targeted archs: 35;50;52;60;61;70;75
-- Generated gencode flags: -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75
-- Exclude libs 'libcudart_static.a:libnvjpeg_static.a:libnppicom_static.a:libnppicc_static.a:libnppig_static.a:libnppc_static.a:libculibos.a:libopencv_core.a:libopencv_imgproc.a:libopencv_highgui.a:libopencv_imgcodecs.a:liblibwebp.a:libittnotify.a:libpng.a:liblibtiff.a:liblibjasper.a:libIlmImf.a:liblibjpeg-turbo.a:libprotobuf.a:libsupc++.a:libstdc++.a:libstdc++_nonshared.a'
-- Adding dependencies to dali: '/opt/cuda/lib64/libcudart_static.a;-lpthread;dl;/usr/lib/librt.so;/opt/cuda/lib/libnvjpeg_static.a;/opt/cuda/lib64/libnppicom_static.a;/opt/cuda/lib64/libnppicc_static.a;/opt/cuda/lib64/libnppig_static.a;/opt/cuda/lib64/libnppc_static.a;/opt/cuda/lib64/libculibos.a;opencv_core;opencv_imgproc;opencv_imgcodecs;/usr/lib/libjpeg.so;/usr/lib/libtiff.so;avformat;avformat;avcodec;avfilter;avutil;/usr/lib/libprotobuf.so'
-- Adding dependencies to dali_test.bin: 'dali'
-- Adding dependencies to dali_benchmark.bin: 'dali'
-- Adding dependencies to backend_impl: 'dali'
-- Configuring done
-- Generating done
-- Build files have been written to: /pkg/makepkg/buildpkg/dali-git/src/DALI/build
Scanning dependencies of target dynlink_cuda
Scanning dependencies of target gtest
Scanning dependencies of target CAFFE2_PROTO
Scanning dependencies of target benchmark
[ 0%] Building NVCC (Device) object dali/kernels/CMakeFiles/dali_kernels.dir/imgproc/resample/dali_kernels_generated_resampling_filters.cu.o
Scanning dependencies of target TF_PROTO
[ 0%] Building NVCC (Device) object dali/kernels/CMakeFiles/dali_kernels.dir/imgproc/resample/dali_kernels_generated_resampling_batch.cu.o
Scanning dependencies of target DALI_PROTO
Scanning dependencies of target CAFFE_PROTO
[ 0%] Building NVCC (Device) object dali/kernels/CMakeFiles/dali_kernels.dir/common/dali_kernels_generated_scatter_gather.cu.o
[ 0%] Building CXX object third_party/benchmark/src/CMakeFiles/benchmark.dir/benchmark_register.cc.o
[ 0%] Building CXX object third_party/benchmark/src/CMakeFiles/benchmark.dir/commandlineflags.cc.o
[ 1%] Building CXX object third_party/benchmark/src/CMakeFiles/benchmark.dir/colorprint.cc.o
...
For now I've worked around it by pinning to 0.13 instead of building the v0.15 alpha; see the changes (ignore the version, it's updated afterwards): https://github.com/mratsim/Arch-Data-Science/commit/bbc51b057551b99056e987d6ec423ef31f91f49e.
A bisect should be able to pinpoint the regression rapidly as there is only a month of difference between 0.13 and current master.
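To illustrate why a bisect should be quick here, below is a minimal Python sketch of the binary search that `git bisect` automates (via `git bisect start`, `git bisect good`, `git bisect bad`, or `git bisect run`). Everything in it is an assumption made for illustration: the 200-commit history, the commit names, the position of the regression, and the `is_bad` callback, which stands in for "does the build fail at this commit". It is not taken from the DALI repository.

```python
def first_bad_commit(commits, is_bad):
    """Binary-search an ordered commit list for the first failing commit.

    Assumes commits are ordered oldest to newest, commits[0] is known
    good, commits[-1] is known bad, and that every commit after the
    regression is also bad. Returns (first bad commit, builds tried).
    """
    lo, hi = 0, len(commits) - 1  # lo: newest known good, hi: oldest known bad
    builds = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        builds += 1  # one build-and-check per step
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], builds


# Hypothetical history: 200 commits, regression introduced at index 137.
commits = [f"c{i:03d}" for i in range(200)]
culprit, builds = first_bad_commit(commits, lambda c: int(c[1:]) >= 137)
print(culprit, builds)
```

With a history of this size the search needs at most 8 builds, which is why a month of commits is cheap to bisect even when each build is slow.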
We have found the problem. It seems that the GCC error message is a bit misleading. The fix is in https://github.com/NVIDIA/DALI/pull/1320. Please try it.
Tested, I confirm the build works with the following script: https://github.com/mratsim/Arch-Data-Science/blob/8210d2a186b3364b32f355a2d9eca54f61f31e20/training/dali-git/PKGBUILD
Thank you!
I have an issue when building DALI from source:
dali/pipeline/operators/fused/crop_mirror_normalize.h:215:44: error: ‘class dali::OperatorBase’ has no member named ‘GetArgument’
The build script is here: https://github.com/mratsim/Arch-Data-Science/blob/a723bdc99835f109c146b26586a4ca166ef9ab25/training/dali/PKGBUILD#L24-L31
It used to work in the past (see my other issues in this repo).