eth-cscs / DLA-Future

DLA-Future
https://eth-cscs.github.io/DLA-Future/master/
BSD 3-Clause "New" or "Revised" License

DLA-FUTURE doesn't compile with apple-clang and clang 17 #1017

Closed: gulivarese closed this issue 11 months ago

gulivarese commented 1 year ago
warning: inline function 'dlaf::eigensolver::internal::solveRank1ProblemDist(pika::execution::experimental::unique_any_sender<pika::execution::experimental::async_rw_mutex_access_wrapper<dlaf::comm::Communicator, const dlaf::comm::Communicator, pika::execution::experimental::async_rw_mutex_access_type::readwrite>> &&, pika::execution::experimental::unique_any_sender<pika::execution::experimental::async_rw_mutex_access_wrapper<dlaf::comm::Communicator, const dlaf::comm::Communicator, pika::execution::experimental::async_rw_mutex_access_type::readwrite>> &&, const SizeType, const SizeType, const LocalTileIndex, const LocalTileSize, pika::split_detail::split_sender_impl<pika::then_detail::then_sender_impl<pika::schedule_from_detail::schedule_from_sender_impl<pika::when_all_impl::when_all_sender_impl<pika::when_all_vector_detail::when_all_vector_sender_impl<pika::execution::experimental::any_sender<pika::execution::experimental::async_rw_mutex_access_wrapper<dlaf::matrix::Tile<dlaf::eigensolver::internal::ColType, dlaf::Device::CPU>, const dlaf::matrix::Tile<const dlaf::eigensolver::internal::ColType, dlaf::Device::CPU>, pika::execution::experimental::async_rw_mutex_access_type::read>>>::when_all_vector_sender_type, pika::when_all_vector_detail::when_all_vector_sender_impl<pika::execution::experimental::any_sender<pika::execution::experimental::async_rw_mutex_access_wrapper<dlaf::matrix::Tile<float, Device::CPU>, const dlaf::matrix::Tile<const float, Device::CPU>, pika::execution::experimental::async_rw_mutex_access_type::read>>>::when_all_vector_sender_type, pika::when_all_vector_detail::when_all_vector_sender_impl<pika::execution::experimental::any_sender<pika::execution::experimental::async_rw_mutex_access_wrapper<dlaf::matrix::Tile<long, dlaf::Device::CPU>, const dlaf::matrix::Tile<const long, dlaf::Device::CPU>, pika::execution::experimental::async_rw_mutex_access_type::read>>>::when_all_vector_sender_type, 
pika::when_all_vector_detail::when_all_vector_sender_impl<pika::execution::experimental::unique_any_sender<dlaf::matrix::Tile<long, dlaf::Device::CPU>>>::when_all_vector_sender_type>::when_all_sender_type, pika::execution::experimental::thread_pool_scheduler>::schedule_from_sender_type, dlaf::common::internal::ConsumeRvalues<dlaf::common::internal::Unwrapping<(lambda at /Users/guglielmogagliardi/code/DLA-Future/include/dlaf/eigensolver/tridiag_solver/merge.h:430:18)>>>::then_sender_type, std::allocator<int>>::split_sender_type &, pika::split_detail::split_sender_impl<pika::then_detail::then_sender_impl<pika::schedule_from_detail::schedule_from_sender_impl<pika::split_detail::split_sender_impl<pika::ensure_started_detail::ensure_started_sender_impl<pika::then_detail::then_sender_impl<pika::schedule_from_detail::schedule_from_sender_impl<pika::when_all_impl::when_all_sender_impl<pika::execution::experimental::unique_any_sender<dlaf::matrix::Tile<float, Device::CPU>>, pika::execution::experimental::unique_any_sender<dlaf::matrix::Tile<float, Device::CPU>>>::when_all_sender_type, pika::execution::experimental::thread_pool_scheduler>::schedule_from_sender_type, dlaf::common::internal::ConsumeRvalues<dlaf::common::internal::Unwrapping<dlaf::eigensolver::internal::cuppensDecomp_t>>>::then_sender_type, std::allocator<int>>::ensure_started_sender_type, std::allocator<int>>::split_sender_type, pika::execution::experimental::thread_pool_scheduler>::schedule_from_sender_type, dlaf::common::internal::ConsumeRvalues<dlaf::common::internal::Unwrapping<(lambda at /Users/guglielmogagliardi/code/DLA-Future/include/dlaf/eigensolver/tridiag_solver/merge.h:201:51)>>>::then_sender_type, std::allocator<int>>::split_sender_type &&, Matrix<const float, Device::CPU> &, Matrix<float, Device::CPU> &, Matrix<float, Device::CPU> &, Matrix<const SizeType, Device::CPU> &, Matrix<float, Device::CPU> &)::(anonymous class)::operator()(const std::size_t, std::unique_ptr<pika::barrier<>> &, 
pika::execution::experimental::async_rw_mutex_access_wrapper<dlaf::comm::Communicator, const dlaf::comm::Communicator, pika::execution::experimental::async_rw_mutex_access_type::readwrite> &, pika::execution::experimental::async_rw_mutex_access_wrapper<dlaf::comm::Communicator, const dlaf::comm::Communicator, pika::execution::experimental::async_rw_mutex_access_type::readwrite> &, const long &, const float &, const std::vector<pika::execution::experimental::async_rw_mutex_access_wrapper<dlaf::matrix::Tile<float, Device::CPU>, const dlaf::matrix::Tile<const float, Device::CPU>, pika::execution::experimental::async_rw_mutex_access_type::read>> &, std::vector<dlaf::matrix::Tile<float, Device::CPU>> &, const std::vector<dlaf::matrix::Tile<float, Device::CPU>> &, const std::vector<pika::execution::experimental::async_rw_mutex_access_wrapper<dlaf::matrix::Tile<long, dlaf::Device::CPU>, const dlaf::matrix::Tile<const long, dlaf::Device::CPU>, pika::execution::experimental::async_rw_mutex_access_type::read>> &, const std::vector<dlaf::matrix::Tile<float, Device::CPU>> &, std::vector<dlaf::memory::MemoryView<float, Device::CPU>> &, dlaf::memory::MemoryView<float, Device::CPU> &)::sync_wait_on_exit_t::~sync_wait_on_exit_t' is not defined [-Wundefined-inline]
          ~sync_wait_on_exit_t() {
          ^
/Users/guglielmogagliardi/code/DLA-Future/include/dlaf/eigensolver/tridiag_solver/merge.h:1178:11: note: used here
        } bcast_barrier;
          ^
/Users/guglielmogagliardi/code/DLA-Future/include/dlaf/eigensolver/tridiag_solver/merge.h:1174:11:

Undefined symbols:
  __ZZZN4dlaf11eigensolver8internal21solveRank1ProblemDistIdN4pika9execution12experimental17unique_any_senderIJNS5_29async_rw_mutex_access_wrapperINS_4comm12CommunicatorEKS9_LNS5_26async_rw_mutex_access_typeE1EEEEEERNS3_12split_detail17split_sender_implINS3_11then_detail16then_sender_implINS3_20schedule_from_detail25schedule_from_sender_implINS3_13when_all_impl20when_all_sender_implIJNS3_22when_all_vector_detail27when_all_vector_sender_implINS5_10any_senderIJNS7_INS_6matrix4TileINS1_7ColTypeELNS_6DeviceE0EEEKNSQ_IKSR_LSS_0EEELSB_0EEEEEEE27when_all_vector_sender_typeENSN_INSO_IJNS7_INSQ_IdLSS_0EEEKNSQ_IKdLSS_0EEELSB_0EEEEEEE27when_all_vector_sender_typeENSN_INSO_IJNS7_INSQ_IlLSS_0EEEKNSQ_IKlLSS_0EEELSB_0EEEEEEE27when_all_vector_sender_typeENSN_INS6_IJS19_EEEE27when_all_vector_sender_typeEEE20when_all_sender_typeENS5_21thread_pool_schedulerEE25schedule_from_sender_typeENS_6common8internal14ConsumeRvaluesINS1Q_10UnwrappingIZNS1_32stablePartitionIndexForDeflationIdEEDallRNSP_6MatrixISU_LSS_0EEERNS1V_IKT_LSS_0EEERNS1V_IS1A_LSS_0EEERNS1V_IlLSS_0EEEEUlRS1Z_RKT0_RKT1_RKT2_E_EEEEE16then_sender_typeENSt3__19allocatorIiEEE17split_sender_typeENSF_INSH_INSJ_INSF_INS3_21ensure_started_detail26ensure_started_sender_implINSH_INSJ_INSL_IJNS6_IJS11_EEES2T_EE20when_all_sender_typeES1M_E25schedule_from_sender_typeENS1R_INS1S_INS1_15cuppensDecomp_tEEEEEE16then_sender_typeES2N_E26ensure_started_sender_typeES2N_E17split_sender_typeES1M_E25schedule_from_sender_typeENS1R_INS1S_IZNS1_8scaleRhoIS36_EES1U_OS1Y_EUlS1Y_E_EEEEE16then_sender_typeES2N_E17split_sender_typeEEEvOS27_S3I_llNS1P_7Index2DIlNSP_13LocalTile_TAGEEENS1P_6Size2DIlS3K_EEOS2A_OS2D_S21_RNS1V_IS1Y_LSS_0EEES3R_S23_S3R_ENKUlmRS1Y_RS27_RS2A_S2F_RKT3_RKT4_RT5_RKT6_RKT7_RKT8_RT9_RT10_E_clINS2L_10unique_ptrINS3_7barrierINS3_6detail18empty_oncompletionEEENS2L_14default_deleteIS4M_EEEESC_SC_ldNS2L_6vectorIS15_NS2M_IS15_EEEENS4Q_IS11_NS2M_IS11_EEEES4U_NS4Q_IS1D_NS2M_IS1D_EEEES4U_NS4Q_INS_6memory10MemoryViewIdLSS_0EEENS2M_IS4Z_EEEES4Z_EES1
U_mS3S_S3T_S3U_S2F_S3X_S40_S42_S45_S48_S4B_S4D_S4F_EN19sync_wait_on_exit_tD1Ev, referenced from: ...
rasolca commented 1 year ago

@albestro @msimberg I don't see any problem with lines https://github.com/eth-cscs/DLA-Future/blob/master/include/dlaf/eigensolver/tridiag_solver/merge.h#L1171-L1178. Can you please confirm that this is valid C++? If it is, I don't know if we should invest time in making apple-clang work.

msimberg commented 1 year ago

@rasolca I don't know if that is for some reason not guaranteed to work by the standard, but it does look reasonable to me as well.

That said, I would guess that moving the struct definition outside of the lambda and solveRank1ProblemDist would most likely fix the linker error. If that's enough to fix it I'd consider it worth the workaround to make apple-clang work. If it's more complicated than that I'm not as sure anymore...
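For illustration, here is a minimal sketch of the shape of that workaround. It is not the DLA-Future code: `SyncOnExit`, `run_with_guard`, and the `log` vector are hypothetical names invented for this example, standing in for `sync_wait_on_exit_t`, `solveRank1ProblemDist`, and the real synchronization. The only point is that the guard type is defined at namespace scope instead of inside the generic lambda, so its destructor is an ordinary inline member function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the guard that was a local struct inside the lambda.
// Defined at namespace scope, outside both the function template and the lambda.
struct SyncOnExit {
  std::vector<int>* log;  // hypothetical: records that the cleanup ran

  explicit SyncOnExit(std::vector<int>* l) : log(l) {
    assert(log != nullptr);
  }
  SyncOnExit(const SyncOnExit&) = delete;
  SyncOnExit& operator=(const SyncOnExit&) = delete;

  // Runs when the guard goes out of scope, i.e. at the end of the lambda body.
  ~SyncOnExit() {
    log->push_back(-1);
  }
};

// Hypothetical function template containing a generic lambda, mirroring the
// nesting described in this thread.
template <class T>
std::vector<int> run_with_guard(T value) {
  std::vector<int> log;
  auto work = [&log](auto x) {
    SyncOnExit guard{&log};              // the type is no longer local to the lambda
    log.push_back(static_cast<int>(x));  // the "work" done while the guard is alive
  };
  work(value);
  return log;
}
```

Calling `run_with_guard(7)` returns a vector holding `7` followed by `-1`: the work runs first and the guard's destructor runs when the lambda exits. The trade-off is that the helper type becomes visible outside the function, so in a header it would probably belong in an `internal` namespace.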

rasolca commented 1 year ago

Update: clang 17 fails as well, so I will increase the priority of this issue. In the meantime, I implemented a quick workaround for @gulivarese so that he can continue with his task.

albestro commented 11 months ago

Not sure it is related, but from a very quick look it seems very similar: the error is the same and the code looks almost the same as ours.

https://github.com/llvm/llvm-project/issues/59734

Unfortunately there is no solution there and the issue is almost a year old, but at least it looks like a known problem.

UPDATE

Most likely, this is the problem. @rasolca tested this with clang@14 and it works, while clang@16 complains (as expected, since it is a clang@15 regression).

I would just extract that struct definition from the lambda; it is not a big deal. If you approve, @rasolca @msimberg, with a 👍, I can take care of it 😉