IntelPython / dpctl

Python SYCL bindings and SYCL-based Python Array API library
https://intelpython.github.io/dpctl/
Apache License 2.0

Remove USE_SYCL_FOR_COMPLEX_TYPES #1707

Closed oleksandr-pavlyk closed 3 weeks ago

oleksandr-pavlyk commented 3 weeks ago

Always use SYCL namespace functions for complex types. Also replace some std namespace functions with the corresponding sycl namespace functions.

github-actions[bot] commented 3 weeks ago

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. :crossed_fingers:

github-actions[bot] commented 3 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_66 ran successfully. Passed: 890 Failed: 11 Skipped: 91

coveralls commented 3 weeks ago

Coverage Status

coverage: 88.057%. remained the same when pulling 591bc916a3c6a6491f314eb74309a9b7d32e9112 on replace-some-std-namepsace-functions-with-sycl-namespace-functions into b24c5b06d00b2e28510a238fc4879b71bed8b254 on master.

github-actions[bot] commented 3 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_67 ran successfully. Passed: 889 Failed: 12 Skipped: 91

coveralls commented 3 weeks ago

Coverage Status

coverage: 88.057%. remained the same when pulling 6e226772a9bc5bb2494eb0d867923fec62ee419b on replace-some-std-namepsace-functions-with-sycl-namespace-functions into b24c5b06d00b2e28510a238fc4879b71bed8b254 on master.

github-actions[bot] commented 3 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_68 ran successfully. Passed: 890 Failed: 11 Skipped: 91

coveralls commented 3 weeks ago

Coverage Status

coverage: 88.057%. remained the same when pulling 69ce77665c0fcd72568f3ab04015fc39042a7709 on replace-some-std-namepsace-functions-with-sycl-namespace-functions into b24c5b06d00b2e28510a238fc4879b71bed8b254 on master.

ndgrigorian commented 3 weeks ago

Tested build with CUDA and it built successfully, so this LGTM.

Since I had to add pre-commit changes, can @antonwolfy or @vtavana please take a look at the changes and review this PR, to keep the score up?

I still see some uses of std::sin, std::cos, etc., but I think these can be replaced in a follow-up PR.

github-actions[bot] commented 3 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_69 ran successfully. Passed: 889 Failed: 12 Skipped: 91

coveralls commented 3 weeks ago

Coverage Status

coverage: 88.057%. remained the same when pulling 163bd3c8819c34f1b885e43b462a218e44a497de on replace-some-std-namepsace-functions-with-sycl-namespace-functions into b24c5b06d00b2e28510a238fc4879b71bed8b254 on master.

oleksandr-pavlyk commented 3 weeks ago

The last commit, although green in CI, no longer builds for the NVPTX64 target: the compiler crashes while building cumulative_logsumexp.cpp.

It is best to report this to the compiler engineers with a standalone reproducer and to revert the culprit change.

Crash log

```
PLEASE append the compiler options "-save-temps -v", rebuild the application to get the full command which is failing and submit a bug report to https://software.intel.com/en-us/support/priority-support which includes the failing command, input files for the command and the crash backtrace (if any).
Stack dump:
0. Program arguments: /opt/intel/oneapi/compiler/2024.1/bin/compiler/clang -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-unknown-linux-gnu -fsycl-is-device -fdeclare-spirv-builtins -fenable-sycl-dae -Wno-sycl-strict -fsycl-int-header=/tmp/icpx-c6c78eb77d/cumulative_logsumexp-header-b07e23.h -fsycl-int-footer=/tmp/icpx-c6c78eb77d/cumulative_logsumexp-footer-4fbcd6.h -fno-sycl-id-queries-fit-in-int -sycl-std=2020 -fsycl-unique-prefix=uidca92c3376666e8c5 -D__SYCL_TARGET_NVIDIA_GPU_SM_50__ -D__SYCL_ANY_DEVICE_HAS_ANY_ASPECT__=1 -emit-llvm-bc -emit-llvm-uselists -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name cumulative_logsumexp.cpp -fsycl-use-main-file-name -full-main-file-name /localdisk/work/opavlyk/repos/dpctl/dpctl/tensor/libtensor/source/accumulators/cumulative_logsumexp.cpp -mrelocation-model pic -pic-level 2 -fhalf-no-semantic-interposition -fveclib=SVML -fno-delete-null-pointer-checks -mframe-pointer=all -ffp-contract=fast -fno-rounding-math -no-integrated-as -aux-target-cpu x86-64 -internal-isystem /opt/intel/oneapi/compiler/2024.1/bin/compiler/../../include/sycl -internal-isystem /opt/intel/oneapi/compiler/2024.1/bin/compiler/../../include/sycl/stl_wrappers -internal-isystem /opt/intel/oneapi/compiler/2024.1/bin/compiler/../../include -mlink-builtin-bitcode /opt/intel/oneapi/compiler/2024.1/lib/clang/18/../../clc/remangled-l64-signed_char.libspirv-nvptx64-nvidia-cuda.bc -mlink-builtin-bitcode /usr/local/cuda/nvvm/libdevice/libdevice.10.bc -target-sdk-version=12.1 -target-cpu sm_50 -target-feature +ptx81 -debugger-tuning=gdb -fno-dwarf-directory-asm -fdebug-compilation-dir=/localdisk/work/opavlyk/repos/dpctl/_skbuild/linux-x86_64-3.10/cmake-build -fclang-abi-compat=17 -resource-dir /opt/intel/oneapi/compiler/2024.1/lib/clang/18 -dependency-file dpctl/tensor/CMakeFiles/_tensor_accumulation_impl.dir/libtensor/source/accumulators/cumulative_logsumexp.cpp.o.d -MT dpctl/tensor/CMakeFiles/_tensor_accumulation_impl.dir/libtensor/source/accumulators/cumulative_logsumexp.cpp.o -sys-header-deps -internal-isystem /opt/intel/oneapi/compiler/2024.1/bin/compiler/../../include/sycl -internal-isystem /opt/intel/oneapi/compiler/2024.1/bin/compiler/../../include/sycl/stl_wrappers -internal-isystem /opt/intel/oneapi/compiler/2024.1/bin/compiler/../../include -isystem /localdisk/work/opavlyk/repos/dpctl/_skbuild/linux-x86_64-3.10/cmake-build/_deps/pybind11-src/include -isystem /localdisk/work/opavlyk/miniconda3/envs/dev_dpctl/include/python3.10 -I /opt/intel/oneapi/compiler/2024.1/include -I /localdisk/work/opavlyk/repos/dpctl/dpctl/tensor/libtensor/include -I /localdisk/work/opavlyk/repos/dpctl/dpctl/tensor/libtensor/source -I /localdisk/work/opavlyk/repos/dpctl/dpctl/apis/include -I /localdisk/work/opavlyk/repos/dpctl/libsyclinterface/include -I /localdisk/work/opavlyk/repos/dpctl/_skbuild/linux-x86_64-3.10/cmake-build -I /localdisk/work/opavlyk/repos/dpctl/_skbuild/linux-x86_64-3.10/cmake-build/dpctl -D _tensor_accumulation_impl_EXPORTS -D _FORTIFY_SOURCE=2 -D NDEBUG -I/opt/intel/oneapi/tbb/2021.12/env/../include -I/opt/intel/oneapi/mpi/2021.12/include -I/opt/intel/oneapi/mkl/2024.1/include -I/opt/intel/oneapi/ippcp/2021.11/include -I/opt/intel/oneapi/ipp/2021.11/include -I/opt/intel/oneapi/dpl/2022.5/include -I/opt/intel/oneapi/dpcpp-ct/2024.1/include -I/opt/intel/oneapi/dnnl/2024.1/include -I/opt/intel/oneapi/dev-utilities/2024.1/include -I/opt/intel/oneapi/dal/2024.2/include/dal -I/opt/intel/oneapi/compiler/2024.1/opt/oclfpga/include -I/opt/intel/oneapi/ccl/2021.12/include -internal-isystem /opt/intel/oneapi/compiler/2024.1/bin/compiler/../../opt/compiler/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward -internal-isystem /opt/intel/oneapi/compiler/2024.1/lib/clang/18/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/local/cuda/include -internal-isystem /opt/intel/oneapi/compiler/2024.1/lib/clang/18/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -Wall -Wextra -Winit-self -Wunused-function -Wuninitialized -Wmissing-declarations -Wstrict-prototypes -Wno-unused-parameter -Wformat -Wformat-security -std=gnu++17 -fdeprecated-macro -ferror-limit 19 -fvisibility=hidden -fheinous-gnu-extensions -fwrapv -fgpu-rdc -fgnuc-version=4.2.1 -fcxx-exceptions -fexceptions -vectorize-loops -vectorize-slp -D__GCC_HAVE_DWARF2_CFI_ASM=1 -fintel-compatibility -fintel-compatibility-disable=FakeLoad -fintel-libirc-allowed -mllvm -disable-hir-generate-mkl-call -mllvm -loopopt=1 -floopopt-pipeline=light -mllvm -intel-abi-compatible=true -o /tmp/icpx-c6c78eb77d/cumulative_logsumexp-sm_50-3540fd.bc -x c++ /localdisk/work/opavlyk/repos/dpctl/dpctl/tensor/libtensor/source/accumulators/cumulative_logsumexp.cpp
1. parser at end of file
2. Optimizer
 #0 0x00005580122b5093 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x5064093)
 #1 0x00005580122b3380 llvm::sys::RunSignalHandlers() (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x5062380)
 #2 0x00005580122b5734 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007ff887e42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00005580112f242c llvm::BasicTTIImplBase::getMemoryOpCost(unsigned int, llvm::Type*, llvm::MaybeAlign, unsigned int, llvm::TargetTransformInfo::TargetCostKind, llvm::TargetTransformInfo::OperandValueInfo, llvm::Instruction const*) NVPTXTargetMachine.cpp:0:0
 #5 0x00005580112f57ae llvm::BasicTTIImplBase::getCommonMaskedMemoryOpCost(unsigned int, llvm::Type*, llvm::Align, bool, bool, llvm::TargetTransformInfo::TargetCostKind) NVPTXTargetMachine.cpp:0:0
 #6 0x00005580112eece7 llvm::TargetTransformInfo::Model::getMaskedMemoryOpCost(unsigned int, llvm::Type*, llvm::Align, unsigned int, llvm::TargetTransformInfo::TargetCostKind) NVPTXTargetMachine.cpp:0:0
 #7 0x00005580125163ed llvm::vpo::VPlanTTICostModel::getLoadStoreCost(llvm::vpo::VPLoadStoreInst const*, llvm::Align, unsigned int, bool) const (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x52c53ed)
 #8 0x0000558012517add llvm::vpo::VPlanTTICostModel::getTTICostForVF(llvm::vpo::VPInstruction const*, unsigned int) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x52c6add)
 #9 0x000055801251793c llvm::vpo::VPlanTTICostModel::getTTICost(llvm::vpo::VPInstruction const*) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x52c693c)
#10 0x00005580124b81c0 llvm::vpo::VPlanCostModelWithHeuristics, llvm::vpo::HeuristicsList, llvm::vpo::HeuristicsList>::getCostImpl(llvm::vpo::VPBasicBlock const*, llvm::raw_ostream*) IntelLoopVectorizationPlanner.cpp:0:0
#11 0x00005580124b8906 llvm::vpo::VPInstructionCost llvm::vpo::VPlanCostModelWithHeuristics, llvm::vpo::HeuristicsList, llvm::vpo::HeuristicsList>::getRangeCost>>(llvm::iterator_range>, llvm::raw_ostream*) IntelLoopVectorizationPlanner.cpp:0:0
#12 0x00005580124b75c5 llvm::vpo::VPlanCostModelWithHeuristics, llvm::vpo::HeuristicsList, llvm::vpo::HeuristicsList>::getCost(bool, llvm::vpo::VPlanPeelingVariant*, llvm::raw_ostream*) IntelLoopVectorizationPlanner.cpp:0:0
#13 0x00005580124aa734 llvm::vpo::LoopVectorizationPlanner::selectBestPlan() (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x5259734)
#14 0x00005580124c8672 llvm::vpo::DriverHIRImpl::processLoop(llvm::loopopt::HLLoop*, llvm::Function&, llvm::vpo::WRNVecLoopNode*) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x5277672)
#15 0x00005580124c4412 bool llvm::vpo::DriverImpl::processFunction(llvm::Function&) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x5273412)
#16 0x00005580124c7f25 llvm::vpo::VPlanDriverHIRPass::runImpl(llvm::Function&, llvm::AnalysisManager&, llvm::loopopt::HIRFramework&) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x5276f25)
#17 0x00005580136005f5 llvm::detail::PassModel>::run(llvm::Function&, llvm::AnalysisManager&) PassBuilder.cpp:0:0
#18 0x0000558011dfa945 llvm::PassManager>::run(llvm::Function&, llvm::AnalysisManager&) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x4ba9945)
#19 0x00005580112edd9d llvm::detail::PassModel>, llvm::PreservedAnalyses, llvm::AnalysisManager>::run(llvm::Function&, llvm::AnalysisManager&) NVPTXTargetMachine.cpp:0:0
#20 0x0000558011e0177d llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager&) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x4bb077d)
#21 0x00005580112edb0d llvm::detail::PassModel>::run(llvm::Module&, llvm::AnalysisManager&) NVPTXTargetMachine.cpp:0:0
#22 0x0000558011df9aaa llvm::PassManager>::run(llvm::Module&, llvm::AnalysisManager&) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x4ba8aaa)
#23 0x0000558012c63e3f (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__1::unique_ptr>&, std::__1::unique_ptr>&, clang::BackendConsumer*) BackendUtil.cpp:0:0
#24 0x0000558012c5f36b clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr, std::__1::unique_ptr>, clang::BackendConsumer*) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x5a0e36b)
#25 0x0000558012c7449f clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x5a2349f)
#26 0x00005580142c3b0a clang::ParseAST(clang::Sema&, bool, bool) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x7072b0a)
#27 0x0000558012c77498 clang::CodeGenAction::ExecuteAction() (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x5a26498)
#28 0x0000558012f8d8ca clang::FrontendAction::Execute() (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x5d3c8ca)
#29 0x0000558012f13c29 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x5cc2c29)
#30 0x0000558013018d5e clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x5dc7d5e)
#31 0x0000558010f8f1c1 cc1_main(llvm::ArrayRef, char const*, void*) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x3d3e1c1)
#32 0x0000558010f8c0c5 ExecuteCC1Tool(llvm::SmallVectorImpl&, llvm::ToolContext const&) driver.cpp:0:0
#33 0x0000558010f8bb61 clang_main(int, char**, llvm::ToolContext const&) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x3d3ab61)
#34 0x0000558010f9839e main (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x3d4739e)
#35 0x00007ff887e29d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#36 0x00007ff887e29e40 call_init ./csu/../csu/libc-start.c:128:20
#37 0x00007ff887e29e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#38 0x0000558010f893e9 _start (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang+0x3d383e9)
icpx: error: unable to execute command: Segmentation fault (core dumped)
icpx: error: clang frontend command failed due to signal (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.1.0 (2024.1.0.20240308)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2024.1/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2024.1/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s).
```

ndgrigorian commented 3 weeks ago

> The last commit, although green in CI, no longer builds for the NVPTX64 target: the compiler crashes while building cumulative_logsumexp.cpp.
>
> It is best to report this to the compiler engineers with a standalone reproducer and to revert the culprit change.
>
> Crash log

The culprit appears to be sycl::log1p. I will revert the change to use sycl::log1p in the logaddexp implementation, among other std function replacements, before this is merged.
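
For readers unfamiliar with the functions involved: logaddexp computes log(exp(x) + exp(y)), and a common numerically stable formulation relies on log1p, which is why the choice between std::log1p and sycl::log1p matters for a cumulative logsumexp. The following is only a host-side sketch in plain C++ using std::log1p. It is not dpctl's kernel code, and the names `logaddexp_sketch` and `cumulative_logsumexp_sketch` are hypothetical, chosen for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper (not dpctl's implementation):
// computes log(exp(x) + exp(y)) without overflowing exp().
template <typename T> T logaddexp_sketch(T x, T y)
{
    // Equal inputs, including equal infinities, are handled directly:
    // log(2 * exp(x)) == x + log(2). This avoids inf - inf below.
    if (x == y) {
        return x + std::log(T(2));
    }
    const T diff = x - y;
    // diff is NaN only when an input is NaN; propagate it.
    if (std::isnan(diff)) {
        return std::numeric_limits<T>::quiet_NaN();
    }
    // max(x, y) + log1p(exp(-|x - y|)); the exp() argument is <= 0.
    return std::max(x, y) + std::log1p(std::exp(-std::abs(diff)));
}

// Hypothetical helper: running log-sum-exp over a sequence, so that
// out[i] == log(exp(v[0]) + ... + exp(v[i])).
template <typename T>
std::vector<T> cumulative_logsumexp_sketch(const std::vector<T> &v)
{
    std::vector<T> out;
    out.reserve(v.size());
    // -infinity is log(0), the identity element for logaddexp.
    T acc = -std::numeric_limits<T>::infinity();
    for (const T &x : v) {
        acc = logaddexp_sketch(acc, x);
        out.push_back(acc);
    }
    return out;
}
```

In a device kernel the same expression would call the sycl namespace counterparts where the compiler handles them; per the discussion above, the log1p call is the one being kept on std for the NVPTX64 target until the compiler crash is resolved.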

github-actions[bot] commented 3 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_71 ran successfully. Passed: 889 Failed: 12 Skipped: 91

coveralls commented 3 weeks ago

Coverage Status

coverage: 88.057%. remained the same when pulling 6ea1120edce21942dd50733c904a4576f06177cc on replace-some-std-namepsace-functions-with-sycl-namespace-functions into b24c5b06d00b2e28510a238fc4879b71bed8b254 on master.