Closed eazzzon closed 1 year ago
Yes, I also noticed a few days ago that it is indeed running slowly on the newer Apple Silicon systems when using the Julia BinaryBuilder version. It runs very fast if you compile it manually/locally, which suggests that there is a problem with the BB version. I'll see if I can find what the issue is (this may take a bit, depending on my time).
On a Mac, the easiest way to install the required binaries (nlopt, mpich, lapack) is to compile them through Homebrew. First install Homebrew, followed by:
$ brew install nlopt
$ brew install mpich
$ brew install lapack
Next, you will have to adapt the makefile. We just updated it to link to the default Homebrew directories, but from what you show above it seems you are using an older version. Have a look at the latest version here. The makefile is a text file, so you can comment out the lines you don't need and try again.
Hi,
Thanks, the new makefile works.
But the MATLAB interface only seems to work on refinement level 1 (very fast calculation speed) and then stops with an error. Any idea how to fix the "mpiexec path not found" error?
PS: I think I am running with the local executable as shown here. The default version is still super slow (which I believe is what you mentioned in this issue).
Hi,
Here the problem is that the calculation is not performed because the path to mpiexec is likely wrong. As you can see in your screenshot: "/user/bin//mpiexec". Try changing the path to /usr/bin rather than /usr/bin/
Hope this helps!
Homebrew installs mpiexec in:
/opt/homebrew/bin
so try that.
Indeed, the default version has a problem on Apple Silicon at the moment. I have only had a Silicon machine for a few days, so I am hopeful it will be resolved at some stage.
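If you are unsure where mpiexec ended up, a small shell check along these lines can help. This is only a sketch: the function name find_mpiexec_dir is made up for illustration, and the candidate directories are the usual Homebrew and system locations rather than anything MAGEMin-specific.

```shell
#!/bin/sh
# Print the directory that contains mpiexec.
# Checks PATH first, then a few common install locations
# (/opt/homebrew/bin on Apple Silicon, /usr/local/bin on Intel Macs).
# Extra directories to search can be passed as arguments.
find_mpiexec_dir() {
    if command -v mpiexec >/dev/null 2>&1; then
        dirname "$(command -v mpiexec)"
        return 0
    fi
    for dir in /opt/homebrew/bin /usr/local/bin /usr/bin "$@"; do
        if [ -x "$dir/mpiexec" ]; then
            echo "$dir"
            return 0
        fi
    done
    echo "mpiexec not found" >&2
    return 1
}

# Report the directory to enter as the MPI path in the GUI, if found.
mpi_dir=$(find_mpiexec_dir) && echo "mpiexec directory: $mpi_dir"
```

The directory it prints (without a trailing slash) is what should go into the MPI path field of the GUI.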
This perfectly solves the issue. Changing to /usr/bin didn't work; I guess it doesn't find mpiexec, which is under Homebrew.
Thanks a lot for helping!
I'll leave it open until we resolve the issue with the BinaryBuilder version of MAGEMin being slow.
A bit of feedback: I am curious whether the default BinaryBuilder version is slow because of Julia. I ran a loop with the Julia interface and it takes 0.5 - 1 s for one point, and occasionally 2 s. It could also be that I didn't write the loop wisely...
No, I think it has to do with how the binaries are compiled; it's certainly not a Julia issue. On different architectures (Linux, Apple Intel) it works much faster. The Julia interface uses the same BinaryBuilder version as the 'default' option in the MATLAB GUI; it is therefore not a surprise that it is slow as well.
Hi,
The newly updated 1.3.0 seems to have this issue back, but with a different error:
MAGEMin 1.3.0 [06/03/2023]
zsh:1: no matches found: _pseudosection_output.*.*
/opt/homebrew/bin/mpiexec -n 6 ./MAGEMin --out_matlab=0 --solver=1 --Verb=0 --sys_in=mol --db=ig --File=MAGEMin_input.dat --n_points=49 --test=0
command =
'export PATH=/Users/easonzz/.julia/artifacts/abb7cbd1c6369f566bf0334f8e033f35b639d0e6/bin:/Users/easonzz/.julia/artifacts/5ead90ea92128f3bba70df07a389c372594e09db/bin ; export DYLD_LIBRARY_PATH=/Users/easonzz/.julia/artifacts/900c5f3ba53bb0d128142a78da39027c65597b0f/lib:/Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia:/Users/easonzz/.julia/artifacts/bf797a6e6d1fcc01635d6b2723ac0390c82f41d2/lib:/Users/easonzz/.julia/artifacts/abb7cbd1c6369f566bf0334f8e033f35b639d0e6/lib:/Users/easonzz/.julia/artifacts/5ead90ea92128f3bba70df07a389c372594e09db/lib:/Applications/Julia-1.8.app/Contents/Resources/julia/bin/../lib/julia:/Applications/Julia-1.8.app/Contents/Resources/julia/bin/../lib; /opt/homebrew/bin/mpiexec -n 6 ./MAGEMin --out_matlab=0 --solver=1 --Verb=0 --sys_in=mol --db=ig --File=MAGEMin_input.dat --n_points=49 --test=0'
No matching processes belonging to you were found
ans =
1
--------------------------------------------------------------------------
The value of the MCA parameter "plm_rsh_agent" was set to a path
that could not be found:
plm_rsh_agent: ssh : rsh
Please either unset the parameter, or check that the path is correct
--------------------------------------------------------------------------
[MBAEZ.local:90369] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error plm_rsh_component.c(335)
export PATH=/Users/easonzz/.julia/artifacts/abb7cbd1c6369f566bf0334f8e033f35b639d0e6/bin:/Users/easonzz/.julia/artifacts/5ead90ea92128f3bba70df07a389c372594e09db/bin ; export DYLD_LIBRARY_PATH=/Users/easonzz/.julia/artifacts/900c5f3ba53bb0d128142a78da39027c65597b0f/lib:/Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia:/Users/easonzz/.julia/artifacts/bf797a6e6d1fcc01635d6b2723ac0390c82f41d2/lib:/Users/easonzz/.julia/artifacts/abb7cbd1c6369f566bf0334f8e033f35b639d0e6/lib:/Users/easonzz/.julia/artifacts/5ead90ea92128f3bba70df07a389c372594e09db/lib:/Applications/Julia-1.8.app/Contents/Resources/julia/bin/../lib/julia:/Applications/Julia-1.8.app/Contents/Resources/julia/bin/../lib; /opt/homebrew/bin/mpiexec -n 6 ./MAGEMin --out_matlab=0 --solver=1 --Verb=0 --sys_in=mol --db=ig --File=MAGEMin_input.dat --n_points=49 --test=0: Signal 115
ForwardSimulation_Time =
0.2515
Error using sscanf
First argument must be a text scalar.
Error in ReadPseudoSectionData_MAGEMin (line 34)
A = sscanf(line,'%f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f');
Error in PerformMAGEMin_Simulation (line 100)
[PhaseData, Status] = ReadPseudoSectionData_MAGEMin(newPoints, PhaseData,
Computation.MinPhaseFraction);
Error in ComputePhaseDiagrams_AMR (line 123)
[PhaseData, TP_vec, FailedSimulations, CancelComputation] =
PerformMAGEMin_Simulation(PhaseData, newPoints, TP_vec, VerboseLevel, Chemistry, dlg,
ComputeAllPoints, UseGammaEstimation, Computation);
Error in PlotPseudosection/StartNewComputation (line 1533)
[PseudoSectionData, CancelComputation] =
ComputePhaseDiagrams_AMR(PseudoSectionData, DisplayPlots);
Error using matlab.ui.control.internal.controller.ComponentController/executeUserCallback (line 386)
Error while evaluating Button PrivateButtonPushedFcn.
The new self-compiled 1.3.0 works.
Below are my GUI settings for local parallel calculation:
The default path /usr/bin also didn't work.
Any idea what might cause the issue?
It seems to me that it is trying to run with the BinaryBuilder version while you are giving a local path for MPI. If I recall correctly, installing MAGEMin with the GUI creates an environment variable file, and the conflict may come from there. How did you install the latest version?
Did you compile MAGEMin yourself? Or did you install it with the binary builder?
Hi,
I installed both the previous version (1.2.8) and 1.3.0 with the MATLAB GUI, and yes, an environment variable .m file is created after the installation. I then compiled MAGEMin myself to enable local parallel computation. It worked with 1.2.8 but doesn't work with 1.3.0.
Here is what is in my environment variable file, in case that is useful:
path_dylib = '/Users/easonzz/.julia/artifacts/900c5f3ba53bb0d128142a78da39027c65597b0f/lib:/Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia:/Users/easonzz/.julia/artifacts/bf797a6e6d1fcc01635d6b2723ac0390c82f41d2/lib:/Users/easonzz/.julia/artifacts/abb7cbd1c6369f566bf0334f8e033f35b639d0e6/lib:/Users/easonzz/.julia/artifacts/5ead90ea92128f3bba70df07a389c372594e09db/lib:/Applications/Julia-1.8.app/Contents/Resources/julia/bin/../lib/julia:/Applications/Julia-1.8.app/Contents/Resources/julia/bin/../lib';
path_bin = '/Users/easonzz/.julia/artifacts/abb7cbd1c6369f566bf0334f8e033f35b639d0e6/bin:/Users/easonzz/.julia/artifacts/5ead90ea92128f3bba70df07a389c372594e09db/bin';
path_julia = '/Applications/Julia-1.8.app/Contents/Resources/julia/bin';
What happens if you try to run that command in a terminal from the MAGEMin directory (make sure that the MAGEMin_input.dat exists, if not try first generating with the GUI until it crashes):
/opt/homebrew/bin/mpiexec -n 6 ./MAGEMin --out_matlab=0 --solver=1 --Verb=0 --sys_in=mol --db=ig --File=MAGEMin_input.dat --n_points=49 --test=0
Hi, it works well this way and produces output.
It looks like a miscommunication between the MATLAB GUI and MPI?
On a related note, is there a way to save the results when running from the terminal like this? I guess I would need the GUI to generate a dat file first?
So the problem is that the GUI is loading the environment variables. Try to delete the .m file then, and the GUI should work without problems with the local MPI.
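To check from a terminal whether inherited environment variables are the culprit, one option is to launch a command with a stripped-down environment and compare. The sketch below assumes a hypothetical helper named run_clean and a minimal PATH that includes the Homebrew directory; adjust both to your setup.

```shell
#!/bin/sh
# Run a command with a minimal environment, so that variables exported
# elsewhere (e.g. PATH or DYLD_LIBRARY_PATH pointing at Julia artifacts)
# cannot influence it. The PATH below is an assumption for a Homebrew setup.
run_clean() {
    env -i HOME="$HOME" PATH="/opt/homebrew/bin:/usr/bin:/bin" "$@"
}

# Show what a child process sees under the clean environment.
run_clean sh -c 'echo "PATH seen by the child: $PATH"'

# From the MAGEMin directory, the same wrapper could launch the solver:
# run_clean mpiexec -n 6 ./MAGEMin --File=MAGEMin_input.dat --n_points=49
```

If the command works under run_clean but fails when launched with the exported variables, the environment file is the likely cause.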
amazing, fixed!
OK, I am happy to report that we finally fixed the issue on Apple Silicon with the automatically installed MAGEMin version (in version 1.3.1). So you no longer need to compile the code yourself (make sure you install it with File > Install MAGEMin in the GUI).
It runs at essentially the same speed as when compiling it manually.
Before:
julia> using MAGEMin_jll
julia> run(`$(MAGEMin_jll.MAGEMin())`)
Running MAGEMin 1.2.7 [22/09/2022] on 1 cores {
═══════════════════════════════════════════════
Status : 0
Mass residual : +7.90944e-06
Rank : 0
Point : 0
Temperature : +1100.00000 [C]
Pressure : +12.00000 [kbar]
SOL = [G: -825.338] (35 iterations, 2109.93 ms)
GAM = [-1011.909272,-1829.092209,-819.265216,-695.468666,-412.938858,-971.870791,-876.535530,-1073.647034,-276.622011,-1380.309499]
Phase : opx spn ol cpx
Mode : 0.23186 0.01393 0.60213 0.15208
___________________________________
MAGEMin comp time: +2305.751000 ms }
After:
julia> using MAGEMin_jll
julia> run(`$(MAGEMin_jll.MAGEMin())`)
Running MAGEMin 1.3.1 [03/04/2023] on 1 cores {
═══════════════════════════════════════════════
Status : 0
Mass residual : +5.13017e-06
Rank : 0
Point : 0
Temperature : +1100.00000 [C]
Pressure : +12.00000 [kbar]
SOL = [G: -825.337] (34 iterations, 38.21 ms)
GAM = [-1011.909615,-1829.092317,-819.264025,-695.467466,-412.947646,-971.889493,-876.545698,-1073.639033,-276.591254,-1380.299192]
Phase : opx cpx ol spn
Mode : 0.23189 0.15205 0.60213 0.01393
___________________________________
MAGEMin comp time: +42.925000 ms }
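To put a number on the difference, the two "MAGEMin comp time" values reported above can be divided directly. The helper name speedup is only for illustration.

```shell
#!/bin/sh
# Ratio of two timings (before / after), printed with one decimal place.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

# Total computation time: 1.2.7 (before) vs 1.3.1 (after), in ms.
speedup 2305.751 42.925   # prints 53.7
```

So for this single point the fixed build is roughly 54 times faster.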
Issue
If you are interested in what happened: the issue had to do with how LAPACK/BLAS was linked (the multithreading seemed to have interfered with the MPI build). We solved this by changing MAGEMin to use the Apple Accelerate framework (which includes optimised versions of LAPACK), rather than relying on our own compiled versions. This removed one external dependency and should also take care of future hardware improvements (as long as Apple adapts their libraries accordingly).
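If you want to see which BLAS/LAPACK provider a given binary is linked against, you can filter the output of otool -L (macOS) or ldd (Linux). A rough sketch follows; the function name blas_lapack_libs and the ./MAGEMin path are assumptions for illustration.

```shell
#!/bin/sh
# Filter a library listing (read from stdin) down to the lines that
# mention a BLAS/LAPACK provider such as Accelerate, OpenBLAS or LAPACK.
blas_lapack_libs() {
    grep -Ei 'accelerate|lapack|blas' || echo "no BLAS/LAPACK library found in listing"
}

# On macOS, inspect a locally built binary if one is present.
if command -v otool >/dev/null 2>&1 && [ -x ./MAGEMin ]; then
    otool -L ./MAGEMin | blas_lapack_libs
fi
```

A build that uses Accelerate should list a path containing Accelerate.framework.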
Profiling the code
This was discovered while profiling the code on an Apple Silicon machine with Xcode and the command-line tools installed. For completeness, here are the steps:
Run MAGEMin for 100 points (any input file will do). This example is for the manually compiled MAGEMin version:
$ xcrun xctrace record --template "Time Profiler" --launch /Users/kausb/WORK/MAGEMin/MAGEMin -- --File=/Users/kausb/WORK/MAGEMin/MAGEMin_input.dat --n_points=100
Starting recording with the Time Profiler template. Launching process: MAGEMin.
Ctrl-C to stop the recording
Target app exited, ending recording...
Recording completed. Saving output file...
Output file saved as: Launch_MAGEMin_2023-04-04_10.55.04_4C23A589.trace
If you want to do the same with the BinaryBuilder version of MAGEMin, you need to add the correct dynamic libraries as well:
$ xcrun xctrace record --template "Time Profiler" -e DYLD_FALLBACK_LIBRARY_PATH=/Users/kausb/.julia/artifacts/900c5f3ba53bb0d128142a78da39027c65597b0f/lib:/Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia:/Users/kausb/.julia/artifacts/bf797a6e6d1fcc01635d6b2723ac0390c82f41d2/lib:/Users/kausb/.julia/artifacts/abb7cbd1c6369f566bf0334f8e033f35b639d0e6/lib:/Users/kausb/.julia/artifacts/5ead90ea92128f3bba70df07a389c372594e09db/lib:/Applications/Julia-1.8.app/Contents/Resources/julia/bin/../lib/julia:/Applications/Julia-1.8.app/Contents/Resources/julia/bin/../lib:/Users/kausb/lib:/usr/local/lib:/lib:/usr/lib --launch /Users/kausb/.julia/artifacts/5ead90ea92128f3bba70df07a389c372594e09db/bin/MAGEMin -- --File=/Users/kausb/WORK/MAGEMin/MAGEMin_input.dat --n_points=100
Open the trace file:
$ open Launch_MAGEMin_2023-04-04_10.55.04_4C23A589.trace
This will open the Instruments app and will allow you to see where the time is spent: for the current version of MAGEMin, 59% of the time is spent in NLopt routines.
thank you Boris, it works great now
Hi,
I am trying to run the parallel computation with the MATLAB interface, but it runs very slowly; the single-core mode actually runs even faster.
I followed this issue and installed NLopt, MPICH and LAPACK with Homebrew, then ran the makefile, but got the error below:
I am not very familiar with makefiles; any idea how to make this work? This might be a beginner's issue.
Thanks in advance!