Add CMAKE_CUDA_ARCHITECTURES=all to the CMake options

traversaro commented 1 year ago

Checklist

[x] Used a personal fork of the feedstock to propose changes
[x] Bumped the build number (if the version is unchanged)
[ ] Reset the build number to 0 (if the version changed)
[x] Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
[x] Ensured the license file is being packaged.

conda-forge-webservices[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

traversaro commented 1 year ago

@conda-forge-admin, please rerender

jakirkham commented 1 year ago

Thoughts @conda-forge/onnxruntime ?

jakirkham commented 1 year ago

@conda-forge-admin, please re-render

xhochy commented 1 year ago

Sounds fine 👍 CI issues are because of missing storage space when packaging up the build environment.

traversaro commented 1 year ago

Sorry, I forgot to follow up. I tested on an NVIDIA GeForce RTX 3050, which is Ampere with compute capability 8.6 (sm_86). That architecture is not among the ones onnxruntime compiles for by default. Also compiling for 86 via CMAKE_CUDA_ARCHITECTURES=all gives us a speedup of ~20%. The size of the artifact goes from ~20 MB to ~30 MB, so I think it is an acceptable tradeoff.
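
For context, a minimal sketch of the check behind this (not onnxruntime code): does a GPU's compute capability have a native target in a given architecture list? The names `arch_is_covered` and `report` and the `default_archs` list are hypothetical and only illustrative. The `all_archs` list is copied from the nvcc command line in the aarch64 build log in this PR.

```shell
# Succeed if compute capability $1 (e.g. 86) appears in the
# space-separated architecture list $2.
arch_is_covered() {
    for candidate in $2; do
        if [ "$candidate" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Print a one-line summary for capability $1 against list $2.
report() {
    if arch_is_covered "$1" "$2"; then
        echo "sm_$1: native code present"
    else
        echo "sm_$1: no native code in this list"
    fi
}

# Hypothetical default list, for illustration only (not taken from onnxruntime).
default_archs="60 70 75 80"
# Targets seen in the build log when CMAKE_CUDA_ARCHITECTURES=all is set.
all_archs="35 37 50 52 53 60 61 62 70 72 75 80 86"

report 86 "$default_archs"
report 86 "$all_archs"
```

When the list has no exact match, the GPU has to run code built for an older target or JIT-compiled PTX, which is where I would expect the difference to come from.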

traversaro commented 1 year ago

@conda-forge-admin, please rerender

traversaro commented 1 year ago

aarch64 builds are still failing with:

[699/1181] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir$SRC_DIR/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.cu.o
FAILED: CMakeFiles/onnxruntime_providers_cuda.dir$SRC_DIR/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.cu.o 
$BUILD_PREFIX/bin/nvcc -forward-unknown-to-host-compiler -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_DONT_VECTORIZE=1 -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DNSYNC_ATOMIC_CPP11 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DONNX_USE_LITE_PROTO=1 -DORT_ENABLE_STREAM -DPLATFORM_POSIX -DUSE_CUDA=1 -D_GNU_SOURCE -D__ONNX_NO_DOC_STRINGS -Donnxruntime_providers_cuda_EXPORTS -I$SRC_DIR/include/onnxruntime -I$SRC_DIR/include/onnxruntime/core/session -I$SRC_DIR/build-ci/Release/_deps/pytorch_cpuinfo-src/include -I$SRC_DIR/build-ci/Release/_deps/google_nsync-src/public -I$SRC_DIR/build-ci/Release -I$SRC_DIR/onnxruntime -I$SRC_DIR/build-ci/Release/_deps/abseil_cpp-src -I$SRC_DIR/build-ci/Release/_deps/safeint-src -I$SRC_DIR/build-ci/Release/_deps/gsl-src/include -I$SRC_DIR/build-ci/Release/_deps/onnx-src -I$SRC_DIR/build-ci/Release/_deps/onnx-build -I$SRC_DIR/build-ci/Release/_deps/protobuf-src/src -I$SRC_DIR/build-ci/Release/_deps/flatbuffers-src/include -I$PREFIX/include -I$SRC_DIR/build-ci/Release/_deps/eigen-src -I/usr/local/cuda/targets/sbsa-linux/include -I$SRC_DIR/build-ci/Release/_deps/mp11-src/include --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe "--diag_suppress=bad_friend_decl" -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -Xcudafe "--diag_suppress=expr_has_no_effect" -O3 -DNDEBUG -std=c++17 --generate-code=arch=compute_35,code=[sm_35] --generate-code=arch=compute_37,code=[sm_37] --generate-code=arch=compute_50,code=[sm_50] --generate-code=arch=compute_52,code=[sm_52] --generate-code=arch=compute_53,code=[sm_53] --generate-code=arch=compute_60,code=[sm_60] --generate-code=arch=compute_61,code=[sm_61] --generate-code=arch=compute_62,code=[sm_62] --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_86,code=[compute_86,sm_86] -Xcompiler=-fPIC --diag-suppress 554 --compiler-options -Wall 
--compiler-options -Wno-deprecated-copy --compiler-options -Wno-nonnull-compare -Xcompiler -Wno-nonnull-compare -Xcompiler -Wno-reorder -Xcompiler -Wno-error=sign-compare -MD -MT CMakeFiles/onnxruntime_providers_cuda.dir$SRC_DIR/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.cu.o -MF CMakeFiles/onnxruntime_providers_cuda.dir$SRC_DIR/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.cu.o.d -x cu -c $SRC_DIR/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.cu -o CMakeFiles/onnxruntime_providers_cuda.dir$SRC_DIR/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.cu.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
$SRC_DIR/onnxruntime/core/providers/cuda/cu_inc/binary_elementwise_impl.cuh(38): catastrophic error: error while writing generated C file: No space left on device

1 catastrophic error detected in the compilation of "$SRC_DIR/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.cu".
Compilation terminated.

i.e. the build runs out of disk space, as mentioned by @xhochy. If there is no other solution, we can probably just disable CMAKE_CUDA_ARCHITECTURES=all on aarch64.
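
A rough sketch of what that fallback could look like in a build script. The function name `cuda_arch_define` is hypothetical, and I am assuming the build tool exports the platform as `target_platform` (e.g. linux-64 or linux-aarch64); how the resulting define is forwarded to CMake depends on the recipe and is not shown here.

```shell
# Print the extra CMake define for the platform given in $1.
# Prints nothing on linux-aarch64, so the project's default
# architecture list is used there.
cuda_arch_define() {
    case "$1" in
        linux-aarch64) ;;
        *) echo "CMAKE_CUDA_ARCHITECTURES=all" ;;
    esac
}

# target_platform is assumed to be set by the build tool;
# linux-64 is only a default for trying the snippet locally.
define="$(cuda_arch_define "${target_platform:-linux-64}")"
echo "extra CMake define: ${define:-none}"
```

This keeps the full architecture list on the platforms where the build fits on disk and only drops it where it does not.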

traversaro commented 1 year ago

@conda-forge-admin, please rerender

xhochy commented 1 year ago

CI passed and I think the size increase is reasonable.

traversaro commented 1 year ago

> CI passed and I think the size increase is reasonable.

Indeed, setting free_disk_space: true under the azure: key in conda-forge.yml did the trick. @conda-forge/onnxruntime, the PR is ready for review.

conda-forge / onnxruntime-feedstock

Add CMAKE_CUDA_ARCHITECTURES=all to the CMake options #67