conda-forge / onnxruntime-feedstock

A conda-smithy repository for onnxruntime.
BSD 3-Clause "New" or "Revised" License

Enable CUDA (take two) #63

Closed traversaro closed 1 year ago

traversaro commented 1 year ago

Updated version of https://github.com/conda-forge/onnxruntime-feedstock/pull/7. Fixes https://github.com/conda-forge/onnxruntime-feedstock/pull/7.

Checklist

conda-forge-webservices[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

traversaro commented 1 year ago

@conda-forge-admin, please rerender

traversaro commented 1 year ago

@conda-forge-admin, please rerender

traversaro commented 1 year ago

The CUDA 11.1 and 11.0 builds on Linux are failing with:

[69/1593] Building CUDA object CMakeFiles/onnxruntime_test_cuda_ops_lib.dir$SRC_DIR/onnxruntime/test/shared_lib/cuda_ops.cu.o
FAILED: CMakeFiles/onnxruntime_test_cuda_ops_lib.dir$SRC_DIR/onnxruntime/test/shared_lib/cuda_ops.cu.o 
$BUILD_PREFIX/bin/nvcc -forward-unknown-to-host-compiler -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DNSYNC_ATOMIC_CPP11 -DORT_ENABLE_STREAM -DPLATFORM_POSIX -DUSE_CUDA=1 -D_GNU_SOURCE -I$SRC_DIR/include/onnxruntime -I$SRC_DIR/include/onnxruntime/core/session -I$SRC_DIR/build-ci/Release/_deps/pytorch_cpuinfo-src/include -I$SRC_DIR/build-ci/Release/_deps/google_nsync-src/public -I$SRC_DIR/build-ci/Release -I$SRC_DIR/onnxruntime -I$SRC_DIR/build-ci/Release/_deps/abseil_cpp-src -gencode=arch=compute_37,code=sm_37 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe "--diag_suppress=bad_friend_decl" -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -Xcudafe "--diag_suppress=expr_has_no_effect" -O3 -DNDEBUG -std=c++17 -Xcompiler=-fPIC --diag-suppress 554 --compiler-options -Wall --compiler-options -Wno-deprecated-copy --compiler-options -Wno-nonnull-compare -MD -MT CMakeFiles/onnxruntime_test_cuda_ops_lib.dir$SRC_DIR/onnxruntime/test/shared_lib/cuda_ops.cu.o -MF CMakeFiles/onnxruntime_test_cuda_ops_lib.dir$SRC_DIR/onnxruntime/test/shared_lib/cuda_ops.cu.o.d -x cu -c $SRC_DIR/onnxruntime/test/shared_lib/cuda_ops.cu -o CMakeFiles/onnxruntime_test_cuda_ops_lib.dir$SRC_DIR/onnxruntime/test/shared_lib/cuda_ops.cu.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc fatal   : A single input file is required for a non-link phase when an outputfile is specified
[70/1593] Building CXX object CMakeFiles/onnxruntime_providers_shared.dir$SRC_DIR/onnxruntime/core/providers/shared/common.cc.o
[71/1593] Building CXX object CMakeFiles/onnxruntime_mlas.dir$SRC_DIR/onnxruntime/core/mlas/lib/intrinsics/avx512/quantize_avx512f.cpp.o
ninja: build stopped: subcommand failed.

Based on https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#build and https://github.com/microsoft/onnxruntime/issues/14644, I guess these versions of CUDA are simply not supported, so we can just drop them, given that this is a new package.
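
One way to drop those builds (a sketch, not the exact change in this PR: the selector syntax is standard conda-build, but whether this feedstock skips in meta.yaml or by deleting `.ci_support` files is an assumption):

```yaml
# meta.yaml (illustrative): skip the CUDA versions that onnxruntime
# no longer supports, keeping the CPU and CUDA >= 11.2 variants.
build:
  number: 0
  skip: true  # [cuda_compiler_version in ("11.0", "11.1")]
```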

traversaro commented 1 year ago

@conda-forge-admin, please rerender

traversaro commented 1 year ago

Both Windows and Linux CUDA builds are failing with error:

2023-06-24T13:40:44.7736050Z C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/include\thrust/system/cuda/config.h(78): fatal error C1189: #error:  The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.

I do not have experience with cub and thrust. I see that the corresponding feedstocks are archived; perhaps somebody from @conda-forge/cub or @conda-forge/thrust has some insight on how to proceed? Thanks in advance!

traversaro commented 1 year ago

> Both Windows and Linux CUDA builds are failing with error:
>
> 2023-06-24T13:40:44.7736050Z C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/include\thrust/system/cuda/config.h(78): fatal error C1189: #error:  The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
>
> I do not have experience with cub and thrust. I see that the corresponding feedstocks are archived; perhaps somebody from @conda-forge/cub or @conda-forge/thrust has some insight on how to proceed? Thanks in advance!

Apparently I had just forgotten a leftover cub dependency in the meta.yaml; I deleted it, sorry for the noise.

traversaro commented 1 year ago

Build is now failing with:

2023-06-27T08:18:12.8964617Z [775/1593] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir$SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.cc.o
2023-06-27T08:18:12.8971202Z FAILED: CMakeFiles/onnxruntime_providers_cuda.dir$SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.cc.o 
2023-06-27T08:18:12.8981167Z $BUILD_PREFIX/bin/x86_64-conda-linux-gnu-c++ -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DNSYNC_ATOMIC_CPP11 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DONNX_USE_LITE_PROTO=1 -DORT_ENABLE_STREAM -DPLATFORM_POSIX -DUSE_CUDA=1 -D_GNU_SOURCE -D__ONNX_NO_DOC_STRINGS -Donnxruntime_providers_cuda_EXPORTS -I$SRC_DIR/include/onnxruntime -I$SRC_DIR/include/onnxruntime/core/session -I$SRC_DIR/build-ci/Release/_deps/pytorch_cpuinfo-src/include -I$SRC_DIR/build-ci/Release/_deps/google_nsync-src/public -I$SRC_DIR/build-ci/Release -I$SRC_DIR/onnxruntime -I$SRC_DIR/build-ci/Release/_deps/abseil_cpp-src -I$SRC_DIR/build-ci/Release/_deps/safeint-src -I$SRC_DIR/build-ci/Release/_deps/gsl-src/include -I$SRC_DIR/build-ci/Release/_deps/onnx-src -I$SRC_DIR/build-ci/Release/_deps/onnx-build -I$SRC_DIR/build-ci/Release/_deps/protobuf-src/src -I$SRC_DIR/build-ci/Release/_deps/flatbuffers-src/include -I$SRC_DIR/build-ci/Release/_deps/eigen-src -I$SRC_DIR/build-ci/Release/_deps/mp11-src/include -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/onnxruntime-1.15.1 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -isystem /usr/local/cuda/include -ffunction-sections -fdata-sections -DCPUINFO_SUPPORTED -O3 -DNDEBUG -flto=auto -fno-fat-lto-objects -fPIC -Wall -Wextra -Wno-deprecated-copy -Wno-nonnull-compare -Wno-reorder -Wno-error=sign-compare -MD -MT CMakeFiles/onnxruntime_providers_cuda.dir$SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.cc.o -MF CMakeFiles/onnxruntime_providers_cuda.dir$SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.cc.o.d -o CMakeFiles/onnxruntime_providers_cuda.dir$SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.cc.o -c $SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.cc
2023-06-27T08:18:12.8988968Z $SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.cc: In member function 'onnxruntime::common::Status onnxruntime::contrib::cuda::Attention<T>::ComputeInternal(onnxruntime::OpKernelContext*) const':
2023-06-27T08:18:12.8995186Z $SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.cc:167:3: error: there are no arguments to 'ORT_UNUSED_VARIABLE' that depend on a template parameter, so a declaration of 'ORT_UNUSED_VARIABLE' must be available [-fpermissive]
2023-06-27T08:18:12.9000825Z   167 |   ORT_UNUSED_VARIABLE(is_mask_1d_key_seq_len_start);
2023-06-27T08:18:12.9011123Z       |   ^~~~~~~~~~~~~~~~~~~
2023-06-27T08:18:12.9011929Z $SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.cc:167:3: note: (if you use '-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
2023-06-27T08:18:12.9012837Z $SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.cc: In instantiation of 'onnxruntime::common::Status onnxruntime::contrib::cuda::Attention<T>::ComputeInternal(onnxruntime::OpKernelContext*) const [with T = onnxruntime::MLFloat16]':
2023-06-27T08:18:12.9013415Z $SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.h:21:10:   required from here
2023-06-27T08:18:12.9014039Z $SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.cc:167:22: error: 'ORT_UNUSED_VARIABLE' was not declared in this scope; did you mean 'HAS_UNUSED_VARIABLE'?
2023-06-27T08:18:12.9014766Z   167 |   ORT_UNUSED_VARIABLE(is_mask_1d_key_seq_len_start);
2023-06-27T08:18:12.9015056Z       |   ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2023-06-27T08:18:12.9015289Z       |   HAS_UNUSED_VARIABLE
2023-06-27T08:18:12.9015946Z $SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.cc: In instantiation of 'onnxruntime::common::Status onnxruntime::contrib::cuda::Attention<T>::ComputeInternal(onnxruntime::OpKernelContext*) const [with T = float]':
2023-06-27T08:18:12.9016440Z $SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.h:21:10:   required from here
2023-06-27T08:18:12.9017000Z $SRC_DIR/onnxruntime/contrib_ops/cuda/bert/attention.cc:167:22: error: 'ORT_UNUSED_VARIABLE' was not declared in this scope; did you mean 'HAS_UNUSED_VARIABLE'?
2023-06-27T08:18:12.9017380Z   167 |   ORT_UNUSED_VARIABLE(is_mask_1d_key_seq_len_start);
2023-06-27T08:18:12.9017662Z       |   ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2023-06-27T08:18:12.9017904Z       |   HAS_UNUSED_VARIABLE

See https://github.com/microsoft/onnxruntime/issues/16000#issuecomment-1562265152; it should be easy to patch.

conda-forge-webservices[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-linting service.

I was trying to look for recipes to lint for you, but it appears we have a merge conflict. Please try to merge or rebase with the base branch to resolve this conflict.

Please ping the 'conda-forge/core' team (using the @ notation in a comment) if you believe this is a bug.

conda-forge-webservices[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

traversaro commented 1 year ago

The build is now successful. Tests are failing as expected, since there is no GPU in the test machines.

traversaro commented 1 year ago

@conda-forge-admin, please rerender

traversaro commented 1 year ago

The last failure is the Linux CUDA build:

2023-06-27T17:49:14.4288224Z /home/conda/feedstock_root/build_artifacts/onnxruntime_1687879404231/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin/../lib/gcc/x86_64-conda-linux-gnu/10.4.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/conda/feedstock_root/build_artifacts/onnxruntime_1687879404231/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib//libonnxruntime.so: undefined reference to `memcpy@GLIBC_2.14'

CUDA builds use CentOS 7's glibc; we need to make sure that this constraint is passed to the onnxruntime-cpp test.

traversaro commented 1 year ago

@conda-forge-admin, please rerender

traversaro commented 1 year ago

Python 3.11 fails with:

+ python -m pip install onnxruntime-1.15.1-py3-none-any.whl
Processing ./onnxruntime-1.15.1-py3-none-any.whl
Discarding file://$SRC_DIR/onnxruntime-1.15.1-py3-none-any.whl: Requested onnxruntime==1.15.1 from file://$SRC_DIR/onnxruntime-1.15.1-py3-none-any.whl has inconsistent name: expected 'onnxruntime', but metadata has 'onnxruntime-gpu'
ERROR: Could not find a version that satisfies the requirement onnxruntime (unavailable) (from versions: none)
ERROR: No matching distribution found for onnxruntime (unavailable)

Apparently, in the onnxruntime-gpu case it does not make sense to rename the .whl file, since pip checks the filename against the name recorded in the wheel's metadata.
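
The mismatch pip complains about can be reproduced in isolation: the wheel's filename claims one distribution name while the `Name:` field in its embedded METADATA says another. A small sketch (the wheel built here is a synthetic stand-in, not the real onnxruntime wheel):

```python
import zipfile

def wheel_metadata_name(path):
    """Return the Name: field from a wheel's .dist-info/METADATA file."""
    with zipfile.ZipFile(path) as zf:
        meta = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
        for line in zf.read(meta).decode().splitlines():
            if line.startswith("Name:"):
                return line.split(":", 1)[1].strip()

# Build a synthetic wheel whose filename says 'onnxruntime' but whose
# metadata still says 'onnxruntime-gpu' -- the situation pip rejected above.
path = "onnxruntime-1.15.1-py3-none-any.whl"
with zipfile.ZipFile(path, "w") as zf:
    zf.writestr("onnxruntime_gpu-1.15.1.dist-info/METADATA",
                "Metadata-Version: 2.1\nName: onnxruntime-gpu\nVersion: 1.15.1\n")

name_from_file = path.split("-")[0]
name_from_meta = wheel_metadata_name(path)
print(name_from_file, name_from_meta)   # onnxruntime onnxruntime-gpu
print(name_from_file == name_from_meta) # False -> pip refuses the wheel
```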

conda-forge-webservices[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe) and found some lint.

Here's what I've got...

For recipe:

conda-forge-webservices[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

traversaro commented 1 year ago

OK, I tested the GPU builds locally, Python via onnxruntime_test and C++ via https://github.com/ami-iit/onnx-cpp-benchmark, and both are working fine.

Once the PR has been merged, it should be possible to install non-CUDA builds via `mamba install onnxruntime=*=*cpu` and CUDA builds with `mamba install onnxruntime=*=*cuda`.

The PR is now ready for review, @conda-forge/onnxruntime; these are the things on which feedback would be useful:

traversaro commented 1 year ago

Hello @conda-forge/onnxruntime, do you have any input for this PR? Thanks in advance!

traversaro commented 1 year ago

Thanks @jtilly !

CCRcmcpe commented 1 year ago

`mamba install onnxruntime=*=*cuda` seems not so obvious IMO. Is it possible to make something like `mamba install onnxruntime-cuda`, like faiss-gpu does?

traversaro commented 1 year ago

> `mamba install onnxruntime=*=*cuda` seems not so obvious IMO. Is it possible to make something like `mamba install onnxruntime-cuda`, like faiss-gpu does?

Thanks for the suggestion @CCRcmcpe ! Can you open a new issue for discussing this?