NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Building from source: ‘class dali::OperatorBase’ has no member named ‘GetArgument’ #1257

Closed by mratsim 4 years ago

mratsim commented 4 years ago

I get a compile error when building DALI from source:

dali/pipeline/operators/fused/crop_mirror_normalize.h:215:44: error: ‘class dali::OperatorBase’ has no member named ‘GetArgument’

[ 42%] Building CXX object dali/CMakeFiles/dali_operators.dir/pipeline/operators/displacement/new_warp_affine.cc.o
[ 43%] Building CXX object dali/CMakeFiles/dali_operators.dir/pipeline/operators/support/random/coin_flip.cc.o
In file included from /pkg/makepkg/buildpkg/dali-git/src/DALI/third_party/boost/preprocessor/include/boost/preprocessor/punctuation/remove_parens.hpp:20,
                 from /pkg/makepkg/buildpkg/dali-git/src/DALI/include/dali/core/static_switch.h:54,
                 from /pkg/makepkg/buildpkg/dali-git/src/DALI/dali/pipeline/operators/fused/crop_mirror_normalize.h:25,
                 from /pkg/makepkg/buildpkg/dali-git/src/DALI/dali/pipeline/operators/fused/crop_mirror_normalize.cc:15:
/pkg/makepkg/buildpkg/dali-git/src/DALI/dali/pipeline/operators/fused/crop_mirror_normalize.h: In member function ‘void dali::CropMirrorNormalize<Backend>::SetupAndInitialize(dali::workspace_t<Backend>&)’:
/pkg/makepkg/buildpkg/dali-git/src/DALI/dali/pipeline/operators/fused/crop_mirror_normalize.h:215:44: error: ‘class dali::OperatorBase’ has no member named ‘GetArgument’
         mirror_[data_idx] = spec_.template GetArgument<int>("mirror", &ws, data_idx);

The build script is here: https://github.com/mratsim/Arch-Data-Science/blob/a723bdc99835f109c146b26586a4ca166ef9ab25/training/dali/PKGBUILD#L24-L31

  export CC=gcc-8
  export CXX=g++-8

  mkdir -p  "${_name}"/build
  cd "${_name}"/build
  cmake .. -DCMAKE_INSTALL_PREFIX="${pkgdir}"/usr -DPROTOBUF_LIBRARY=/usr/lib/libprotobuf.so
  make

This used to work in the past (see my other issues in this repo).

awolant commented 4 years ago

Hi, thanks for the question. What version of DALI are you trying to build? We test every master commit, so you should be able to build it. Are you running a clean build?
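
For reference, a clean build here means configuring from an empty build directory so that no stale CMake cache or object files are reused. A minimal sketch, assuming the `build` directory layout from the script above; `fresh_build_dir` is a hypothetical helper name, not part of DALI:

```shell
# Sketch: wipe and recreate <source dir>/build so no stale CMake cache survives.
# fresh_build_dir is a hypothetical helper, not something DALI provides.
fresh_build_dir() {
  src_dir=$1
  # ${src_dir:?} aborts if the argument is empty, so this can never expand to "rm -rf /build".
  rm -rf "${src_dir:?}/build"
  mkdir -p "$src_dir/build"
  # Print the path so the caller can cd into it.
  printf '%s\n' "$src_dir/build"
}

# Example (commented out; same configure and build steps as the original script):
# cd "$(fresh_build_dir "${_name}")"
# cmake .. -DCMAKE_INSTALL_PREFIX="${pkgdir}"/usr -DPROTOBUF_LIBRARY=/usr/lib/libprotobuf.so
# make
```

If the error persists after this, a stale cache can be ruled out as the cause.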

mratsim commented 4 years ago

It should have been master from yesterday, so commit 801c888. According to the git blame (https://github.com/mratsim/Arch-Data-Science/blame/a723bdc99835f109c146b26586a4ca166ef9ab25/training/dali/PKGBUILD#L5), it was an update from 535182b8. Unfortunately, I can only retry and confirm on Friday at the earliest, as I'm working abroad this week, away from my workstation.

mratsim commented 4 years ago

This is my configure output for the latest e079bcff commit:

-- DALI version: 0.15.0dev
-- DALI_extra version: 1b224243c057d413cf3e1c75694d4d1acf73d0dc
-- Build configuration: Release
/opt/cuda
nvJPEG found in /opt/cuda/include
nvJPEG is using new API
-- Found OpenCV: /usr/include/opencv4 (found suitable version "4.1.2", minimum required is "3.0")
OpenCV libraries: opencv_core;opencv_imgproc;opencv_imgcodecs
-- LLVM FileCheck Found: /usr/bin/FileCheck
-- git Version: v1.4.0-505be96a
-- Version: 1.4.0
-- Performing Test HAVE_STD_REGEX -- success
-- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
-- Performing Test HAVE_POSIX_REGEX -- success
-- Performing Test HAVE_STEADY_CLOCK -- success
Using libjpeg-turbo at /usr/lib/libjpeg.so
-- Found TIFF: /usr/lib/libtiff.so (found version "4.0.10") 
Using libtiff at /usr/lib/libtiff.so
-- pybind11 v2.2.4
-- Building WITHOUT LMDB support
-- Enabling TensorFlow TFRecord file format support
-- CUDA supported archs: 35;50;52;60;61;70;75
-- CUDA targeted archs: 35;50;52;60;61;70;75
-- Generated gencode flags:  -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75
-- Exclude libs 'libcudart_static.a:libnvjpeg_static.a:libnppicom_static.a:libnppicc_static.a:libnppig_static.a:libnppc_static.a:libculibos.a:libopencv_core.a:libopencv_imgproc.a:libopencv_highgui.a:libopencv_imgcodecs.a:liblibwebp.a:libittnotify.a:libpng.a:liblibtiff.a:liblibjasper.a:libIlmImf.a:liblibjpeg-turbo.a:libprotobuf.a:libsupc++.a:libstdc++.a:libstdc++_nonshared.a'
-- Adding dependencies to dali: '/opt/cuda/lib64/libcudart_static.a;-lpthread;dl;/usr/lib/librt.so;/opt/cuda/lib/libnvjpeg_static.a;/opt/cuda/lib64/libnppicom_static.a;/opt/cuda/lib64/libnppicc_static.a;/opt/cuda/lib64/libnppig_static.a;/opt/cuda/lib64/libnppc_static.a;/opt/cuda/lib64/libculibos.a;opencv_core;opencv_imgproc;opencv_imgcodecs;/usr/lib/libjpeg.so;/usr/lib/libtiff.so;avformat;avformat;avcodec;avfilter;avutil;/usr/lib/libprotobuf.so'
-- Adding dependencies to dali_test.bin: 'dali'
-- Adding dependencies to dali_benchmark.bin: 'dali'
-- Adding dependencies to backend_impl: 'dali'
-- Configuring done
-- Generating done
-- Build files have been written to: /pkg/makepkg/buildpkg/dali-git/src/DALI/build
Scanning dependencies of target dynlink_cuda
Scanning dependencies of target gtest
Scanning dependencies of target CAFFE2_PROTO
Scanning dependencies of target benchmark
[  0%] Building NVCC (Device) object dali/kernels/CMakeFiles/dali_kernels.dir/imgproc/resample/dali_kernels_generated_resampling_filters.cu.o
Scanning dependencies of target TF_PROTO
[  0%] Building NVCC (Device) object dali/kernels/CMakeFiles/dali_kernels.dir/imgproc/resample/dali_kernels_generated_resampling_batch.cu.o
Scanning dependencies of target DALI_PROTO
Scanning dependencies of target CAFFE_PROTO
[  0%] Building NVCC (Device) object dali/kernels/CMakeFiles/dali_kernels.dir/common/dali_kernels_generated_scatter_gather.cu.o
[  0%] Building CXX object third_party/benchmark/src/CMakeFiles/benchmark.dir/benchmark_register.cc.o
[  0%] Building CXX object third_party/benchmark/src/CMakeFiles/benchmark.dir/commandlineflags.cc.o
[  1%] Building CXX object third_party/benchmark/src/CMakeFiles/benchmark.dir/colorprint.cc.o
...

mratsim commented 4 years ago

For now I've worked around it by pinning to 0.13 instead of building the v0.15 alpha; see the changes (ignore the version, it's updated afterwards): https://github.com/mratsim/Arch-Data-Science/commit/bbc51b057551b99056e987d6ec423ef31f91f49e.

A bisect should be able to pinpoint the regression rapidly as there is only a month of difference between 0.13 and current master.
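
The bisect suggested above could be automated with `git bisect run`, which checks out each candidate commit and classifies it by the exit status of a command. This is only a sketch: `bisect_build` is a hypothetical wrapper name, and the good and bad refs and the build command are supplied by the caller rather than taken from this thread.

```shell
# Sketch: find the first commit at which a build command starts failing.
# Usage: bisect_build <repo_dir> <good_ref> <bad_ref> <build_command>
# bisect_build is a hypothetical helper, not part of DALI or git.
bisect_build() (
  repo_dir=$1; good_ref=$2; bad_ref=$3; build_cmd=$4
  cd "$repo_dir" || exit 1
  # git bisect start takes the bad ref first, then one or more good refs.
  git bisect start "$bad_ref" "$good_ref" >/dev/null || exit 1
  # For each step: exit 0 marks the commit good, 1-124 or 126-127 bad, 125 skips it.
  git bisect run sh -c "$build_cmd"
  status=$?
  # Return the work tree to where it was before bisecting.
  git bisect reset >/dev/null 2>&1
  exit "$status"
)
```

For this issue the call would look roughly like `bisect_build DALI GOOD_REF e079bcff 'git submodule update --init --recursive && rm -rf build && mkdir build && cd build && cmake .. && make'`, where `GOOD_REF` is a placeholder for whichever tag or commit corresponds to the 0.13 release. The submodule refresh and the clean build directory at every step are assumptions made to keep each build independent of the previous one.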

JanuszL commented 4 years ago

We have found the problem. It seems that the GCC error message is a bit misleading. The fix is in https://github.com/NVIDIA/DALI/pull/1320. Please try it.

mratsim commented 4 years ago

Tested, I confirm the build works with the following script: https://github.com/mratsim/Arch-Data-Science/blob/8210d2a186b3364b32f355a2d9eca54f61f31e20/training/dali-git/PKGBUILD

Thank you!