3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

Relion4 Class2d problems on M1 Mac #913

Open jxc100 opened 1 year ago

jxc100 commented 1 year ago

I can compile and run Relion4 on my ARM-based MacBook Pro with an OpenMP-related change to the CMakeLists.txt, but Class2D either deadlocks (VDAM) or crashes with a memory allocation error (EM). Info below. Only CPU code, of course. James Conway, U Pittsburgh: jxc100@pitt.edu

1. System info: MacBook Pro 16-inch, 2021, M1 Max, 64 GB RAM. macOS 12.5.1 (current), Xcode 13.4.1 (current). Additional tools installed via Homebrew:

2. Symbols defined (csh):

setenv CXX g++-12
setenv CC gcc-12
setenv OMPI_CXX g++-12
setenv OMPI_CC gcc-12
setenv PATH "/opt/homebrew/opt/openmpi/bin:${PATH}"
setenv CXXFLAGS "-I/opt/homebrew/opt/openmpi/include"
setenv LDFLAGS "-L/opt/homebrew/opt/openmpi/lib"

3. Changes to CMakeLists.txt to enable OpenMP. This inserts between the OpenMPI block and the Intel Compiler support block:

# ----------------------------------------------------------------------------OpenMP--  James Conway
# This block from: https://code-examples.net/en/q/10d10e9
# Use of -Xpreprocessor suggested here: https://stackoverflow.com/questions/40095958/apple-clang-fopenmp-not-working
# Still a linker problem with OMP: ld: symbol(s) not found for architecture arm64

OPTION (USE_OpenMP "Use OpenMP to enable <omp.h>" ON)

# Find OpenMP
if(APPLE AND USE_OpenMP)
    if(CMAKE_C_COMPILER_ID MATCHES "Clang")
        set(OpenMP_C "${CMAKE_C_COMPILER}")
        set(OpenMP_C_FLAGS "-Xpreprocessor -fopenmp -Wno-unused-command-line-argument")
        set(OpenMP_C_LIB_NAMES "libomp" "libgomp" "libiomp5")
        set(OpenMP_libomp_LIBRARY ${OpenMP_C_LIB_NAMES})
        set(OpenMP_libgomp_LIBRARY ${OpenMP_C_LIB_NAMES})
        set(OpenMP_libiomp5_LIBRARY ${OpenMP_C_LIB_NAMES})
    endif()
    if(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
      set(OpenMP_CXX "${CMAKE_CXX_COMPILER}")
      set(OpenMP_CXX_FLAGS "-Xpreprocessor -fopenmp -Wno-unused-command-line-argument")
      set(OpenMP_CXX_LIB_NAMES "libomp" "libgomp" "libiomp5")
      set(OpenMP_libomp_LIBRARY ${OpenMP_CXX_LIB_NAMES})
      set(OpenMP_libgomp_LIBRARY ${OpenMP_CXX_LIB_NAMES})
      set(OpenMP_libiomp5_LIBRARY ${OpenMP_CXX_LIB_NAMES})
    endif()
endif()

if(USE_OpenMP)
  find_package(OpenMP REQUIRED)
endif(USE_OpenMP)

if (OPENMP_FOUND)
#    include_directories("${OPENMP_INCLUDES},/opt/homebrew/include")
#    link_directories("${OPENMP_LIBRARIES},/opt/homebrew/lib")
    include_directories("/opt/homebrew/include")
    link_directories("/opt/homebrew/lib")
    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
    # set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
endif(OPENMP_FOUND)

4. Compiling Relion4 (4.0-beta-2-commit-3b1752) with the CMakeLists.txt modified as above:

git clone https://github.com/3dem/relion.git
cd relion
git checkout 4.0
mkdir -p build
cd build
cmake ..
make -j 6
make install

5. Running Relion4 (4.0-beta-2-commit-3b1752) on the Tutorial beta-galactosidase dataset:
Import, Motion correction - no problem.
CtfEstimation - has to be done elsewhere because I can't get CTFFIND4 to compile.
AutoPick, Extract - no problem. Extract: 5793 particles, 256x256 scaled to 64x64 (3.54 A/pixel).
Class2D - deadlocks (VDAM) or crashes (EM) as described below.

6. Class2D - VDAM with 1 MPI (required) and 2 threads. This hangs with an apparent deadlock, with both 2 threads and 1:

which relion_refine --o Class2D/job010/run --grad --class_inactivity_threshold 0.1 --grad_write_iter 10 --iter 200 --i Extract/job009/particles.star --dont_combine_weights_via_disc --preread_images --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 50 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 2 --pipeline_control Class2D/job010/

 Running CPU instructions in double precision. 
 Initial subset size set to 200
 Final subset size set to 1000
 Estimating initial noise spectra 
   0/   0 sec ............................................................~~(,_,">
 Estimating accuracies in the orientational assignment ... 
   0/   0 sec ............................................................~~(,_,">
 Auto-refine: Estimated accuracy angles= 17.9 degrees; offsets= 9.912 Angstroms
 CurrentResolution= 56.64 Angstroms, which requires orientationSampling of at least 30 degrees for a particle of diameter 200 Angstroms
 Oversampling= 0 NrHiddenVariableSamplingPoints= 31500
 OrientationalSampling= 12 NrOrientations= 30
 TranslationalSampling= 7.08 NrTranslations= 21
=============================
 Oversampling= 1 NrHiddenVariableSamplingPoints= 1008000
 OrientationalSampling= 6 NrOrientations= 240
 TranslationalSampling= 3.54 NrTranslations= 84
=============================
 Gradient optimisation iteration 1 of 200 with 200 particles (Step size 0.9)
000/??? sec ~~(,_,">                                                          [oo]

This process never progresses. Sampling it seems to show it's in a deadlock:

~ 2.645s Thread 16621156 DispatchQueue_1: com.apple.main-thread (serial)
 ~ 2.645s start (in dyld) + 520 [0x100f9908c]
  ~ 2.645s main (in relion_refine) + 80 [0x100b7d550]
   ~ 2.645s MlOptimiser::iterate() (in relion_refine) + 340 [0x100b2c954]
    ~ 2.645s MlOptimiser::expectation() (in relion_refine) + 844 [0x100b109e0]
     ~ 2.645s MlOptimiser::expectationSomeParticles(long, long) (in relion_refine) + 1124 [0x100b0f6b4]
      ~ 2.645s GOMP_parallel (in libgomp.1.dylib) + 84 [0x1012f3c74]
       ~ 2.645s MlOptimiser::expectationSomeParticles(long, long) (._omp_fn.0) (in relion_refine) + 100 [0x100b1df48]
        ~ 2.645s globalThreadExpectationSomeParticles(void*, int) (in relion_refine) + 100 [0x100b1de34]
         ~ 2.645s MlOptimiser::expectationOneParticle(long, int) (in relion_refine) + 1824 [0x100b1cdf4]
          ~ 2.645s MlOptimiser::storeWeightedSums(long, int, int, int, int, int, int, int, int, int, int, int, std::vector<double, std::allocator<double> >&, st
           ~ 2.645s _pthread_mutex_firstfit_lock_slow (in libsystem_pthread.dylib) + 248 [0x193a36cf8]
            ~ 2.645s _pthread_mutex_firstfit_lock_wait (in libsystem_pthread.dylib) + 84 [0x193a39384]
             ~ 2.645s __psynch_mutexwait (in libsystem_kernel.dylib) + 8 [0x193a01738]
~ 2.645s Thread 16621171
 ~ 2.645s thread_start (in libsystem_pthread.dylib) + 8 [0x193a3708c]
  ~ 2.645s _pthread_start (in libsystem_pthread.dylib) + 148 [0x193a3c26c]
   ~ 2.645s gomp_thread_start (in libgomp.1.dylib) + 308 [0x1012fa9b8]
    ~ 2.645s MlOptimiser::expectationSomeParticles(long, long) (._omp_fn.0) (in relion_refine) + 100 [0x100b1df48]
     ~ 2.645s globalThreadExpectationSomeParticles(void*, int) (in relion_refine) + 56 [0x100b1de08]
      ~ 2.645s ParallelTaskDistributor::getTasks(unsigned long&, unsigned long&) (in relion_refine) + 60 [0x100b3348c]
       ~ 2.645s _pthread_mutex_fairshare_lock_slow (in libsystem_pthread.dylib) + 196 [0x193a3f308]
        ~ 2.645s _pthread_mutex_fairshare_lock_wait (in libsystem_pthread.dylib) + 84 [0x193a3f3b0]
         ~ 2.645s __psynch_mutexwait (in libsystem_kernel.dylib) + 8 [0x193a01738]

If I repeat with just one thread, the result is the same:

~ 2.651s Thread 16603999 DispatchQueue_1: com.apple.main-thread (serial)
 ~ 2.651s start (in dyld) + 520 [0x1046fd08c]
  ~ 2.651s main (in relion_refine) + 80 [0x104515550]
   ~ 2.651s MlOptimiser::iterate() (in relion_refine) + 340 [0x1044c4954]
    ~ 2.651s MlOptimiser::expectation() (in relion_refine) + 844 [0x1044a89e0]
     ~ 2.651s MlOptimiser::expectationSomeParticles(long, long) (in relion_refine) + 1124 [0x1044a76b4]
      ~ 2.651s GOMP_parallel (in libgomp.1.dylib) + 84 [0x104c83c74]
       ~ 2.651s MlOptimiser::expectationSomeParticles(long, long) (._omp_fn.0) (in relion_refine) + 100 [0x1044b5f48]
        ~ 2.651s globalThreadExpectationSomeParticles(void*, int) (in relion_refine) + 56 [0x1044b5e08]
         ~ 2.651s ParallelTaskDistributor::getTasks(unsigned long&, unsigned long&) (in relion_refine) + 60 [0x1044cb48c]
          ~ 2.651s _pthread_mutex_firstfit_lock_slow (in libsystem_pthread.dylib) + 248 [0x193a36cf8]
           ~ 2.651s _pthread_mutex_firstfit_lock_wait (in libsystem_pthread.dylib) + 84 [0x193a39384]
            ~ 2.651s __psynch_mutexwait (in libsystem_kernel.dylib) + 8 [0x193a01738]

7. Class2D - EM. This crashes with a malloc error:

which relion_refine_mpi --o Class2D/job012/run --iter 25 --i Extract/job009/particles.star --dont_combine_weights_via_disc --preread_images --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 50 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 2 --pipeline_control Class2D/job012/

RELION version: 4.0-beta-2-commit-3b1752 
Precision: BASE=double

 === RELION MPI setup ===
 + Number of MPI processes             = 2
 + Number of threads per MPI process   = 2
 + Total number of threads therefore   = 4
 + Leader  (0) runs on host            = JFC-MacBookPro-2022
 =================
 + Follower     1 runs on host            = JFC-MacBookPro-2022
 Running CPU instructions in double precision. 
 Estimating initial noise spectra 
   0/   0 sec ............................................................~~(,_,">
[JFC-MacBookPro-2022:36584] *** Process received signal ***
[JFC-MacBookPro-2022:36584] Signal: Abort trap: 6 (6)
[JFC-MacBookPro-2022:36584] Signal code:  (0)
[JFC-MacBookPro-2022:36584] [ 0] 0   libsystem_platform.dylib            0x0000000193a534a4 _sigtramp + 56
[JFC-MacBookPro-2022:36584] [ 1] 0   libsystem_pthread.dylib             0x0000000193a3bee0 pthread_kill + 288
[JFC-MacBookPro-2022:36584] [ 2] 0   libsystem_c.dylib                   0x0000000193976340 abort + 168
[JFC-MacBookPro-2022:36584] [ 3] 0   libsystem_malloc.dylib              0x00000001938588c0 has_default_zone0 + 0
[JFC-MacBookPro-2022:36584] [ 4] 0   libsystem_malloc.dylib              0x000000019386dc84 malloc_zone_error + 100
[JFC-MacBookPro-2022:36584] [ 5] 0   libsystem_malloc.dylib              0x000000019384abc4 nanov2_allocate_from_block + 568
[JFC-MacBookPro-2022:36584] [ 6] 0   libsystem_malloc.dylib              0x000000019384a1e0 nanov2_allocate + 128
[JFC-MacBookPro-2022:36584] [ 7] 0   libsystem_malloc.dylib              0x000000019384a0fc nanov2_malloc + 64
[JFC-MacBookPro-2022:36584] [ 8] 0   libsystem_malloc.dylib              0x0000000193867748 _malloc_zone_malloc + 156
[JFC-MacBookPro-2022:36584] [ 9] 0   relion_refine_mpi                   0x0000000104a071dc _ZN14MlOptimiserMpi11expectationEv + 124
[JFC-MacBookPro-2022:36584] [10] 0   relion_refine_mpi                   0x0000000104a1ff74 _ZN14MlOptimiserMpi7iterateEv + 2052
[JFC-MacBookPro-2022:36584] [11] 0   relion_refine_mpi                   0x0000000104a7171c main + 92
[JFC-MacBookPro-2022:36584] [12] 0   dyld                                0x0000000104e0108c start + 520
[JFC-MacBookPro-2022:36584] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node JFC-MacBookPro-2022 exited on signal 6 (Abort trap: 6).
--------------------------------------------------------------------------
biochem-fan commented 1 year ago

Thank you very much for the details.

This is interesting. It is not obvious at the moment why these issues arise only on Mac OS platforms.

I do have an M1 MacBook Pro, so I will look into it when I have time. (But I cannot promise an ETA! I am afraid to say this is a low priority.)

biochem-fan commented 1 year ago

On my computer, I managed to compile without tweaking CMakeLists.txt.

Installed the following from Homebrew.

export CXX=g++-12
export CC=gcc-12
export OMPI_CXX=g++-12
export OMPI_CC=gcc-12
cmake .. -DGUI=OFF -DFETCH_TORCH_MODELS=OFF -DCMAKE_BUILD_TYPE=DEBUG
make
biochem-fan commented 1 year ago

Oops, gdb does not support M1 Mac...

I have to learn lldb.

biochem-fan commented 1 year ago

On my computer, the EM algorithm does not crash due to malloc but deadlocks.

In lldb:

(lldb) r --o Class2D/em --iter 25 --i Extract/job009/particles.star --dont_combine_weights_via_disc  --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 20 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 1

(Ctrl-C after it hangs)

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001c07ed738 libsystem_kernel.dylib`__psynch_mutexwait + 8
    frame #1: 0x00000001c0825384 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 84
    frame #2: 0x00000001c0822cf8 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 248
    frame #3: 0x000000010018a61c relion_refine`MlOptimiser::storeWeightedSums(this=0x000000016fdfd180, part_id=9142, ibody=0, exp_current_oversampling=1, metadata_offset=0, exp_idir_min=0, exp_idir_max=0, exp_ipsi_min=0, exp_ipsi_max=29, exp_itrans_min=0, exp_itrans_max=20, exp_iclass_min=0, exp_iclass_max=19, exp_min_diff2=0x000000016fdfc7a8, exp_highres_Xi2_img=0x000000016fdfc7c0, exp_Fimg=0x000000016fdfc8f8, exp_Fimg_nomask=0x000000016fdfc8e0, exp_Fctf=0x000000016fdfc898, exp_power_img=0x000000016fdfc700, exp_old_offset=0x000000016fdfc748, exp_prior=0x000000016fdfc730, exp_Mweight=0x000000016fdfc5d0, exp_Mcoarse_significant=0x000000016fdfc668, exp_significant_weight=0x000000016fdfc778, exp_sum_weight=0x000000016fdfc790, exp_max_weight=0x000000016fdfc760, exp_pointer_dir_nonzeroprior=0x000000016fdfc838, exp_pointer_psi_nonzeroprior=0x000000016fdfc820, exp_directions_prior=0x000000016fdfc808, exp_psi_prior=0x000000016fdfc7f0, exp_local_Fimgs_shifted=0x000000016fdfc8c8, exp_local_Fimgs_shifted_nomask=0x000000016fdfc8b0, exp_local_Minvsigma2=0x000000016fdfc868, exp_local_Fctf=0x000000016fdfc880, exp_local_sqrtXi2=0x000000016fdfc7d8, exp_STMulti=0x000000016fdfc850) at ml_optimiser.cpp:8514:21
    frame #4: 0x000000010017a3f4 relion_refine`MlOptimiser::expectationOneParticle(this=0x000000016fdfd180, part_id_sorted=0, thread_id=0) at ml_optimiser.cpp:4326:20
    frame #5: 0x0000000100179ac8 relion_refine`MlOptimiser::doThreadExpectationSomeParticles(this=0x000000016fdfd180, thread_id=0) at ml_optimiser.cpp:4006:26
    frame #6: 0x000000010014a95c relion_refine`globalThreadExpectationSomeParticles(self=0x000000016fdfd180, thread_id=0) at ml_optimiser.cpp:79:40
    frame #7: 0x00000001001ca86c relion_refine`_ZN11MlOptimiser24expectationSomeParticlesEll._omp_fn.0((null)=0x000000016fdfce30) at ml_optimiser.cpp:3934:40
    frame #8: 0x0000000100c97c74 libgomp.1.dylib`GOMP_parallel + 84
    frame #9: 0x0000000100179978 relion_refine`MlOptimiser::expectationSomeParticles(this=0x000000016fdfd180, my_first_part_id=0, my_last_part_id=29) at ml_optimiser.cpp:3932:11
    frame #10: 0x0000000100177bfc relion_refine`MlOptimiser::expectation(this=0x000000016fdfd180) at ml_optimiser.cpp:3388:27
    frame #11: 0x0000000100176ef0 relion_refine`MlOptimiser::iterate(this=0x000000016fdfd180) at ml_optimiser.cpp:3013:14
    frame #12: 0x000000010000a298 relion_refine`main(argc=34, argv=0x000000016fdff210) at refine.cpp:39:20
    frame #13: 0x00000001005f908c dyld`start + 520

It is https://github.com/3dem/relion/blob/ver4.0/src/ml_optimiser.cpp#L8514.

biochem-fan commented 1 year ago

This is really puzzling. The above is the only place global_mutex2 is locked. With only one thread, it cannot deadlock...

I also confirmed it is initialized properly before.

thread list confirms there is only one thread.

biochem-fan commented 1 year ago

VDAM is stuck at the same place, MlOptimiser::storeWeightedSums, unlike the initial report of ParallelTaskDistributor::getTasks.

(lldb) r --o Class2D/vdam --grad --class_inactivity_threshold 0.1 --grad_write_iter 10 --iter 200 --i Extract/job009/particles.star --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 50 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --j 1
biochem-fan commented 1 year ago

Repeating tests, I got a deadlock in ParallelTaskDistributor::getTasks.

Somehow OpenMP's mutex lock is not working as expected, but I don't know the cause.

jxc100 commented 1 year ago

On my computer, I managed to compile without tweaking CMakeLists.txt.

gcc-12 (Homebrew GCC 12.2.0) 12.2.0
cmake version 3.24.1
mpirun (Open MPI) 4.1.4
fftw 3.3.10
libomp 14.0.6

cmake .. -DGUI=OFF -DFETCH_TORCH_MODELS=OFF
make

I didn't use -DGUI=OFF or -DFETCH_TORCH_MODELS=OFF. The GUI worked OK, and I tend to use Relion that way. The torch stuff doesn't seem to make a difference.

On my computer, the EM algorithm does not crash due to malloc but dead locks. In lldb: ... It is probably https://github.com/3dem/relion/blob/ver4.0/src/ml_optimiser.cpp#L8514.

This is really puzzling. The above is the only place global_mutex2 is locked. With only one thread, it cannot deadlock...

That is the most curious point.

I also confirmed it is initialized properly before.

VDAM is stuck at the same place, MlOptimiser::storeWeightedSums, unlike the initial report of ParallelTaskDistributor::getTasks.

(lldb) r --o Class2D/vdam --grad --class_inactivity_threshold 0.1 --grad_write_iter 10 --iter 200 --i Extract/job009/particles.star --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 50 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --j 1

The sampler was in MlOptimiser::storeWeightedSums with a deadlock.

Repeating tests, I got a deadlock in ParallelTaskDistributor::getTasks. Somehow OpenMP's mutex lock is not working as expected, but I don't know the cause.

You managed to compile without the additional OpenMP flags in CMakeLists.txt that I introduced (from internet sleuthing), and the problem is still there. Hard to believe that OpenMP is broken on Macs; maybe I will try this on an Intel Mac just for comparison.

Thanks for your efforts.

James Conway

biochem-fan commented 1 year ago

I didn't use -DGUI=OFF or -DFETCH_TORCH_MODELS=OFF The GUI worked ok, and I tend to use Relion that way. The torch stuff doesn't seem to make a difference.

This was only to save time.

Hard to believe that OpenMP is broken on the Macs, maybe I will try this on an Intel Mac just for comparison.

Indeed, a simple program that initializes several locks and acquires and releases them worked fine.

FilipeMaia commented 1 year ago

As a possible workaround, things seem to run using llvm from brew (in my case /opt/homebrew/Cellar/llvm/15.0.1) instead of gcc-12.

FilipeMaia commented 1 year ago

The real problem, at least on my system, seems to be multiple omp.h headers. brew's libomp, which is the LLVM OpenMP library, installs omp.h in /opt/homebrew/include/omp.h. gcc installs omp.h somewhere hidden, like /opt/homebrew/Cellar/gcc/12.2.0/lib/gcc/current/gcc/aarch64-apple-darwin21/12/include/omp.h. So if you have libomp installed on your system, you will include its header files even when compiling with gcc (which implicitly links with libgomp). It also happens that libomp has a different sizeof(omp_lock_t) than gcc's libgomp (resulting in memory corruption and all kinds of weird behaviour).

Try to uninstall libomp and make a fresh compilation.

The following code should be able to detect such problems if placed after including omp.h:

#if defined(__APPLE__) && defined(__GNUC__)
#ifndef _LIBGOMP_OMP_LOCK_DEFINED
#error "Incompatible omp.h header included! Please make sure you are not using omp.h from libomp."
#endif
#endif
biochem-fan commented 1 year ago

Great investigation!

Unfortunately, some brew packages depend on libomp, so it is not always possible to remove it. Can we somehow ask GCC to pick up the internal one?

FilipeMaia commented 1 year ago

You just need to make sure to add -I/opt/homebrew/Cellar/gcc/12.2.0/lib/gcc/current/gcc/aarch64-apple-darwin21/12/include/ (or something like that, depending on the exact gcc used) before -I/opt/homebrew/include/ during the compilation step, but I don't know if there is an easy way to do that automatically in CMake.

YoshitakaMo commented 1 year ago

Hi, I saw @biochem-fan 's tweet about this issue and was curious to find out more. After installing some packages with brew install open-mpi cmake fltk fftw libomp jpeg and adding the following block into CMakeLists.txt, I successfully compiled Relion4 using only M1 Mac's Apple Clang.

# ----------------------------------------------------------------------------OpenMP--  James Conway
# This block from: https://code-examples.net/en/q/10d10e9
# Use of -Xpreprocessor suggested here: https://stackoverflow.com/questions/40095958/apple-clang-fopenmp-not-working

OPTION (USE_OpenMP "Use OpenMP to enable <omp.h>" ON)

# Find OpenMP
if(APPLE AND USE_OpenMP)
    execute_process(COMMAND brew --prefix libomp
                OUTPUT_VARIABLE OpenMP_HOME
                OUTPUT_STRIP_TRAILING_WHITESPACE)
    message(STATUS "OpenMP Root : ${OpenMP_HOME}")
    set(OpenMP_libomp_LIBRARY "${OpenMP_HOME}/lib")
    if(CMAKE_C_COMPILER_ID MATCHES "Clang")
        set(OpenMP_C "${CMAKE_C_COMPILER}")
        set(OpenMP_C_FLAGS "-Xpreprocessor -fopenmp -Wno-unused-command-line-argument -I${OpenMP_HOME}/include -lomp -L${OpenMP_libomp_LIBRARY}" CACHE STRING "" FORCE)
        set(OpenMP_C_LIB_NAMES "libomp")
    endif()
    if(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
        set(OpenMP_CXX "${CMAKE_CXX_COMPILER}")
        set(OpenMP_CXX_FLAGS "-Xpreprocessor -fopenmp -Wno-unused-command-line-argument -I${OpenMP_HOME}/include -lomp -L${OpenMP_libomp_LIBRARY}" CACHE STRING "" FORCE)
        set(OpenMP_CXX_LIB_NAMES "libomp")
    endif()
endif()

if(USE_OpenMP)
  find_package(OpenMP REQUIRED)
endif(USE_OpenMP)

if (OpenMP_FOUND)
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
endif(OpenMP_FOUND)

Then,

mkdir -p build ; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=${HOME}/apps/relion/4.0 -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang -DGUI=OFF -DFETCH_TORCH_MODELS=OFF -DCMAKE_BUILD_TYPE=DEBUG
make -j8 install

Relion 4.0 was installed on my M1 Mac.

My test was here:

$ wget ftp://ftp.mrc-lmb.cam.ac.uk/pub/scheres/relion40_tutorial_precalculated_results.tar.gz
$ tar zxvf relion40_tutorial_precalculated_results.tar.gz
$ cd relion40_tutorial_precalculated_results
$ mkdir -p Class2D/vdam
$ ~/apps/relion/4.0/bin/relion_refine --o Class2D/vdam/test --grad --class_inactivity_threshold 0.1 --grad_write_iter 10 --iter 200 --i Extract/job012/particles.star --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 50 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --j 1
 Running CPU instructions in double precision.
 Initial subset size set to 200
 Final subset size set to 1000
 Estimating initial noise spectra from 1000 particles
   0/   0 sec ............................................................~~(,_,">
 Estimating accuracies in the orientational assignment ...
   2/   2 sec ............................................................~~(,_,">
 Auto-refine: Estimated accuracy angles= 30.1 degrees; offsets= 14.868 Angstroms
 CurrentResolution= 56.64 Angstroms, which requires orientationSampling of at least 30 degrees for a particle of diameter 200 Angstroms
 Oversampling= 0 NrHiddenVariableSamplingPoints= 43500
 OrientationalSampling= 12 NrOrientations= 30
 TranslationalSampling= 7.08 NrTranslations= 29
=============================
 Oversampling= 1 NrHiddenVariableSamplingPoints= 1392000
 OrientationalSampling= 6 NrOrientations= 240
 TranslationalSampling= 3.54 NrTranslations= 116
=============================
 Gradient optimisation iteration 1 of 200 with 200 particles (Step size 0.9)
  41/  41 sec ............................................................~~(,_,">
 Maximization ...
   0/   0 sec ............................................................~~(,_,">
 CurrentResolution= 45.312 Angstroms, which requires orientationSampling of at least 25.7143 degrees for a particle of diameter 200 Angstroms
 Oversampling= 0 NrHiddenVariableSamplingPoints= 43500
 OrientationalSampling= 12 NrOrientations= 30
 TranslationalSampling= 7.08 NrTranslations= 29
=============================
 Oversampling= 1 NrHiddenVariableSamplingPoints= 1392000
 OrientationalSampling= 6 NrOrientations= 240
 TranslationalSampling= 3.54 NrTranslations= 116
=============================
 Gradient optimisation iteration 2 of 200 with 200 particles (Step size 0.9)
  49/  49 sec ............................................................~~(,_,">
 Maximization ...
   0/   0 sec ............................................................~~(,_,">
 CurrentResolution= 45.312 Angstroms, which requires orientationSampling of at least 25.7143 degrees for a particle of diameter 200 Angstroms
 Oversampling= 0 NrHiddenVariableSamplingPoints= 43500
 OrientationalSampling= 12 NrOrientations= 30
 TranslationalSampling= 7.08 NrTranslations= 29
=============================
 Oversampling= 1 NrHiddenVariableSamplingPoints= 1392000
 OrientationalSampling= 6 NrOrientations= 240
 TranslationalSampling= 3.54 NrTranslations= 116
=============================
 Gradient optimisation iteration 3 of 200 with 200 particles (Step size 0.9)
  48/  48 sec ............................................................~~(,_,">
 Maximization ...
   0/   0 sec ............................................................~~(,_,">

It seems the deadlock was solved.

FilipeMaia commented 1 year ago

@YoshitakaMo The deadlock only happens with gcc.

FilipeMaia commented 1 year ago

The following hack:

    if(APPLE AND CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
      get_property(dirs DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR} PROPERTY INCLUDE_DIRECTORIES)
      list(FIND dirs "/opt/homebrew/include" index)
      if(${index} GREATER -1)
        if(EXISTS "/opt/homebrew/include/omp.h")
          # If omp.h from libomp exists in the include path and we're using gcc
          # move it to the end of the include path search list to ensure
          # we get omp.h from gcc's internal headers
          set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -idirafter /opt/homebrew/include")
          set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -idirafter /opt/homebrew/include")
        endif()   
      endif()
    endif(APPLE AND CMAKE_CXX_COMPILER_ID STREQUAL "GNU")

when added after https://github.com/3dem/relion/blob/1569f02f26b065459c1ee3b4ea4186c228acb70e/src/apps/CMakeLists.txt#L443 fixes the issue, but it's not very pretty...

biochem-fan commented 1 year ago

I created a minimum working example of the Homebrew libomp's GCC incompatibility issue.

https://gist.github.com/biochem-fan/31864239460769d2a4a3585e4959d298

YoshitakaMo commented 1 year ago

As discussed in the Homebrew-core repo, if Homebrew's libomp is made keg-only, we can control which OpenMP library is used, in combination with environment variables such as C(XX)FLAGS.

biochem-fan commented 1 year ago

@YoshitakaMo I guess gcc and llvm from brew won't need CFLAGS, because their own OpenMP headers are located in their standard search path (e.g. /opt/homebrew/Cellar/gcc/12.2.0/lib/gcc/current/gcc/aarch64-apple-darwin21/12/include/omp.h, /opt/homebrew/Cellar/llvm/15.0.1/lib/clang/15.0.1/include/omp.h).

The combination of AppleClang (from Xcode) and libomp from brew needs an explicit path.

FilipeMaia commented 1 year ago

In an ideal world FindOpenMP.cmake would set OpenMP_C/CXX_INCLUDE_DIRS appropriately so we would only need to add that to include_directories, but this would definitely need to be tested (and I don't have much faith it will work "out of the box").

biochem-fan commented 1 year ago

To compile with Apple Clang + libomp from HomeBrew without patching CMakeLists.txt:

brew install libomp
OMPPATH=`brew --prefix libomp`

cmake .. -DGUI=NO -DFETCH_TORCH_MODELS=OFF \
 -DCMAKE_C_FLAGS="-Xpreprocessor -fopenmp -Wno-unused-command-line-argument -I${OMPPATH}/include -lomp -L${OMPPATH}/lib" \
 -DCMAKE_CXX_FLAGS="-Xpreprocessor -fopenmp -Wno-unused-command-line-argument -I${OMPPATH}/include -lomp -L${OMPPATH}/lib"
make -j6

Tests:

~/prog/relion/build-appleclang/bin/relion_refine --o Class2D/em --iter 25 --i Extract/job009/particles.star --dont_combine_weights_via_disc  --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 20 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 8

~/prog/relion/build-appleclang/bin/relion_refine --o Class2D/vdam --grad --class_inactivity_threshold 0.1 --grad_write_iter 10 --iter 200 --i Extract/job009/particles.star --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 50 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --j 8

Neither deadlocks.

FilipeMaia commented 1 year ago

I think this gives sufficient options to write macOS installation instructions, steering people away from gcc for the moment.

jxc100 commented 1 year ago

Thanks for all the sleuthing. I was also successful setting CC and CXX to the system clang compiler:

setenv CC /usr/bin/gcc
setenv CXX /usr/bin/g++

same with OMPI_CC and OMPI_CXX, and (as above):

setenv OMPPATH `brew --prefix libomp`

and I set these, though I am not sure they are required:

setenv PATH "/opt/homebrew/opt/openmpi/bin:${PATH}"
setenv CXXFLAGS "-I/opt/homebrew/opt/openmpi/include"
setenv LDFLAGS "-L/opt/homebrew/opt/openmpi/lib"

Then the cmake, as above. I have tested 2D classification (VDAM), which now completes with no crashes, and the old EM method is running, now past the previous crash.

Thanks again!

biochem-fan commented 1 year ago

You don't have to set CC, CXX, OMPI_CC, OMPI_CXX, etc., because gcc is the default and points to AppleClang.

YoshitakaMo commented 1 year ago

Hi all, I made a Homebrew formula according to the discussion above.

# install homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# install Relion 4.0.0
brew install wget
wget https://raw.githubusercontent.com/YoshitakaMo/homebrew-bio/addrelion/Formula/relion.rb
brew install ./relion.rb --build-from-source --verbose --keep-tmp

After a few minutes, Relion 4.0.0 will be installed. The deadlock issue is solved on both M1 and Intel Macs.

There is a small issue in submitting this formula to the Homebrew repository: the installed script relion_class_ranker.py is currently not executable by default. Can this issue be fixed?

$ brew audit --new /opt/homebrew/opt/relion/.brew/relion.rb

relion:
  * Non-executables were installed to "/opt/homebrew/opt/relion/bin".
    The offending files are:
      /opt/homebrew/opt/relion/bin/relion_class_ranker.py
Error: 1 problem in 1 formula detected
charlie-bond commented 1 year ago

Hi all, I made a Homebrew formula according to the discussion above.

# install homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# install Relion 4.0.0
brew install wget
wget https://raw.githubusercontent.com/YoshitakaMo/homebrew-bio/addrelion/Formula/relion.rb
brew install ./relion.rb --build-from-source --verbose --keep-tmp

After a few minutes, Relion 4.0.0 will be installed. The deadlock issue is solved on both M1 and Intel Macs.

There is a small issue in submitting this formula to the Homebrew repository: the installed script relion_class_ranker.py is currently not executable by default. Can this issue be fixed?

$ brew audit --new /opt/homebrew/opt/relion/.brew/relion.rb

relion:
  * Non-executables were installed to "/opt/homebrew/opt/relion/bin".
    The offending files are:
      /opt/homebrew/opt/relion/bin/relion_class_ranker.py
Error: 1 problem in 1 formula detected

I was just helping someone and noticed that this .rb file has the GUI turned off. Removing '<< "-DGUI=NO"' from the relion.rb file allows the GUI to compile and run. I haven't tested whether everything works.

YoshitakaMo commented 1 year ago

@charlie-bond Thank you for your comment. Now I've removed the arg from https://raw.githubusercontent.com/YoshitakaMo/homebrew-bio/addrelion/Formula/relion.rb to allow GUI.

YoshitakaMo commented 1 year ago

Now Relion 4.0.0 is available on Homebrew! Just type brew install brewsci/bio/relion in your terminal if Homebrew is installed. It's not GPU-accelerated, since Homebrew is not designed for Linux with GPUs, but it is very useful for macOS users.