Closed Luke-Pratley closed 6 years ago
@ilectra I get the error output below when using 5 nodes and 3 wavelets. The line of code where the problem happens, in cpp/example/padmm_mpi_random_coverage.cc, is
t_real const gamma
= (Psi.adjoint() * (measurements->adjoint() * uv_data.vis)).cwiseAbs().maxCoeff() * 1e-3;
Assertion failed: (mat.rows()>0 && mat.cols()>0 && "you are using an empty matrix"), function run, file /Users/luke/dev/purify/build/external/include/eigen3/Eigen/src/Core/Redux.h, line 175.
[lukes-MacBook-Air:91370] *** Process received signal ***
[lukes-MacBook-Air:91370] Signal: Abort trap: 6 (6)
[lukes-MacBook-Air:91370] Signal code: (0)
[lukes-MacBook-Air:91370] [ 0] 0 libsystem_platform.dylib 0x00007fff7d19af5a _sigtramp + 26
[lukes-MacBook-Air:91370] [ 1] 0 ??? 0xbf67e96c66830cf2 0x0 + 13792249035631037682
[lukes-MacBook-Air:91370] [ 2] 0 libsystem_c.dylib 0x00007fff7cfc630a abort + 127
[lukes-MacBook-Air:91370] [ 3] 0 libsystem_c.dylib 0x00007fff7cf8e360 basename_r + 0
[lukes-MacBook-Air:91370] [ 4] 0 global_epsilon_replicated_grids 0x0000000100af82f0 _ZN5Eigen8internal10redux_implINS0_13scalar_max_opIdEENS_12CwiseUnaryOpINS0_13scalar_abs_opISt7complexIdEEEKNS_13ReturnByValueIN4sopt7details15AppliedFunctionIRKSt8functionIFvRNS_6MatrixIS7_Lin1ELi1ELi0ELin1ELi1EEERKSF_EENS_10MatrixBaseINS9_INSC_ISM_NSN_ISF_EEEEEEEEEEEEEELi0ELi0EE3runERKSV_RKS3_ + 115
[lukes-MacBook-Air:91370] [ 5] 0 global_epsilon_replicated_grids 0x0000000100aec9a2 _ZNK5Eigen9DenseBaseINS_12CwiseUnaryOpINS_8internal13scalar_abs_opISt7complexIdEEEKNS_13ReturnByValueIN4sopt7details15AppliedFunctionIRKSt8functionIFvRNS_6MatrixIS5_Lin1ELi1ELi0ELin1ELi1EEERKSD_EENS_10MatrixBaseINS7_INSA_ISK_NSL_ISD_EEEEEEEEEEEEEEE5reduxINS2_13scalar_max_opIdEEEENS2_9result_ofIFT_dEE4typeERKSZ_ + 46
[lukes-MacBook-Air:91370] [ 6] 0 global_epsilon_replicated_grids 0x0000000100adf4c5 _ZNK5Eigen9DenseBaseINS_12CwiseUnaryOpINS_8internal13scalar_abs_opISt7complexIdEEEKNS_13ReturnByValueIN4sopt7details15AppliedFunctionIRKSt8functionIFvRNS_6MatrixIS5_Lin1ELi1ELi0ELin1ELi1EEERKSD_EENS_10MatrixBaseINS7_INSA_ISK_NSL_ISD_EEEEEEEEEEEEEEE8maxCoeffEv + 43
[lukes-MacBook-Air:91370] [ 7] 0 global_epsilon_replicated_grids 0x0000000100ad65fa _Z13padmm_factoryRKSt10shared_ptrIKN4sopt15LinearTransformIN5Eigen6MatrixISt7complexIdELin1ELi1ELi0ELin1ELi1EEEEEERKNS0_8wavelets4SARAERKNS2_5ArrayIS5_Lin1ELin1ELi0ELin1ELin1EEERKN6purify9utilities10vis_paramsEdRKNS0_3mpi12CommunicatorE + 522
[lukes-MacBook-Air:91370] [ 8] 0 global_epsilon_replicated_grids 0x0000000100ad7b71 main + 1918
[lukes-MacBook-Air:91370] [ 9] 0 libdyld.dylib 0x00007fff7cf1a145 start + 1
[lukes-MacBook-Air:91370] *** End of error message ***
[lukes-MacBook-Air:91369] *** Process received signal ***
[lukes-MacBook-Air:91369] Signal: Abort trap: 6 (6)
[lukes-MacBook-Air:91369] Signal code: (0)
[lukes-MacBook-Air:91369] [ 0] 0 libsystem_platform.dylib 0x00007fff7d19af5a _sigtramp + 26
[lukes-MacBook-Air:91369] [ 1] 0 ??? 0xbf67e96c66830cf2 0x0 + 13792249035631037682
[lukes-MacBook-Air:91369] [ 2] 0 libsystem_c.dylib 0x00007fff7cfc630a abort + 127
[lukes-MacBook-Air:91369] [ 3] 0 libsystem_c.dylib 0x00007fff7cf8e360 basename_r + 0
[lukes-MacBook-Air:91369] [ 4] 0 global_epsilon_replicated_grids 0x000000010f3eb2f0 _ZN5Eigen8internal10redux_implINS0_13scalar_max_opIdEENS_12CwiseUnaryOpINS0_13scalar_abs_opISt7complexIdEEEKNS_13ReturnByValueIN4sopt7details15AppliedFunctionIRKSt8functionIFvRNS_6MatrixIS7_Lin1ELi1ELi0ELin1ELi1EEERKSF_EENS_10MatrixBaseINS9_INSC_ISM_NSN_ISF_EEEEEEEEEEEEEELi0ELi0EE3runERKSV_RKS3_ + 115
[lukes-MacBook-Air:91369] [ 5] 0 global_epsiloAssertion failed: (mat.rows()>0 && mat.cols()>0 && "you are using an empty matrix"), function run, file /Users/luke/dev/purify/build/external/include/eigen3/Eigen/src/Core/Redux.h, line 175.
--------------------------------------------------------------------------
mpirun noticed that process rank 4 with PID 0 on node lukes-MacBook-Air exited on signal 6 (Abort trap: 6).
--------------------------------------------------------------------------
This error says that there is an empty matrix/vector, possibly when trying to use cwiseAbs() or maxCoeff().
@ilectra It is definitely a problem with that line of code, which suggests it is not really a problem with SARA itself. If I replace that line with t_real gamma = 1; there are no problems. Maybe the adjoint of SARA is returning an empty vector, and .maxCoeff() is trying to find the maximum of it on the extra nodes?
factor = 0 in https://github.com/astro-informatics/sopt/blob/development/cpp/sopt/wavelets.h#L227, which would give an empty vector on nodes without wavelets. This is probably causing the error!
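To illustrate how some nodes can end up with no wavelets, here is a small sketch. It assumes a round-robin assignment (wavelet i goes to rank i % n_procs); that scheme and the function name wavelets_on_rank are assumptions for illustration, and sopt may distribute the wavelets differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical round-robin assignment: wavelet i is owned by rank i % n_procs.
// Returns how many wavelets the given rank owns.
std::size_t wavelets_on_rank(std::size_t n_wavelets, std::size_t n_procs,
                             std::size_t rank) {
  assert(n_procs > 0 && rank < n_procs);
  std::size_t const base = n_wavelets / n_procs;
  std::size_t const remainder = n_wavelets % n_procs;
  // The first `remainder` ranks each take one extra wavelet.
  return base + (rank < remainder ? 1 : 0);
}
```

With 3 wavelets on 5 procs, this gives one wavelet each to ranks 0, 1 and 2, and none to ranks 3 and 4, so those two ranks would hold an empty coefficient vector.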
When there are more MPI procs than SARA wavelets, SARA crashes.
Maybe this can be fixed with a split communicator.
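A rough sketch of how the split could be decided, assuming ranks 0 to n_wavelets - 1 are the ones that own a wavelet (an assumption, not something confirmed for sopt). The function name sara_split_color is hypothetical, and the sketch needs C++17 for std::optional. It only computes the color; the MPI call is shown in a comment.

```cpp
#include <optional>

// Color to pass to a communicator split. Ranks that own a wavelet share
// color 0; ranks without one get no color, which would map to MPI_UNDEFINED.
// Assumes ranks 0 .. n_wavelets - 1 own the wavelets (hypothetical layout).
std::optional<int> sara_split_color(int rank, int n_wavelets) {
  if (rank >= 0 && rank < n_wavelets) {
    return 0;
  }
  return std::nullopt;
}

// Possible use with MPI (not compiled here):
//   auto const color = sara_split_color(rank, n_wavelets);
//   MPI_Comm sara_comm;
//   MPI_Comm_split(MPI_COMM_WORLD, color.value_or(MPI_UNDEFINED), rank, &sara_comm);
//   // Ranks that passed MPI_UNDEFINED receive MPI_COMM_NULL and skip the SARA work.
```

With 5 procs and 3 wavelets, ranks 0 to 2 would share the new communicator and ranks 3 and 4 would be left out of the wavelet operations.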