Closed: bobmyhill closed this issue 2 years ago
MPI is included with the Julia version and will already do parallel computations. In case you have a multicore processor in your machine (essentially all modern processors have this), you can test this by setting the # of cores to 1 or 2 and comparing the computational times (the run with more cores should be faster). Note that there is a little bit of overhead involved with initialising MPI, so you will best observe the effect if you compute a lot of points (say >1000). We have tested this on Windows, Linux and Intel Macs, but not yet on the M1 (so it would be great if you can confirm that it works as expected). The mpiexec path is something that you only need to set if you compile MAGEMin yourself. Once the MAGEMin executable exists in the directory, the MAGEMin executable and mpiexec path buttons are no longer greyed out.
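(For a quick command-line check of the speed-up, outside the GUI, one can compare wall-clock times directly. This is only a sketch: it assumes a locally compiled MAGEMin binary in the current directory, and the flags and input file are simply the ones used elsewhere in this thread.)
$time mpiexec -n 1 ./MAGEMin --Verb=0 --File=MAGEMin_input.dat --n_points=1000 --test=0
$time mpiexec -n 4 ./MAGEMin --Verb=0 --File=MAGEMin_input.dat --n_points=1000 --test=0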
MAGEMin does not actually require an internet connection, nor does it send/receive anything from the web. It may, however, be related to MPI sending information around. I had a similar issue with compiled PETSc code on my machine; that appears to have been resolved in more recent PETSc versions, so perhaps we can use the same trick.
Thanks for the information. I'm familiar with MPI, less so with MATLAB.
1) Parallel computations fail with the error message at the end of this post. Sorry I didn't make the error clear in my last message.
2) I set the binary to safe in the firewall settings (which is where the warning told me to go) before raising this issue, but there's no change in behaviour after doing that. I'll chalk it up to Mac weirdness.
In the next few days I'll compile MAGEMin myself and try that version, but for due diligence as one of your reviewers I thought I should have a go with the MATLAB version.
mpiexec -n 8 MAGEMin --Verb=0 --File=MAGEMin_input.dat --n_points=650 --test=0
command =
'export PATH=/Users/rm16686/.julia/artifacts/9bfa7faf9a21863f996d8317bd5936e051971bd6/bin:/Users/rm16686/.julia/artifacts/ebaa199abbbd88d81060d398297c1aeb83b4486d/bin ; export DYLD_LIBRARY_PATH=/Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia:/Users/rm16686/.julia/artifacts/9bfa7faf9a21863f996d8317bd5936e051971bd6/lib:/Users/rm16686/.julia/artifacts/900c5f3ba53bb0d128142a78da39027c65597b0f/lib:/Users/rm16686/.julia/artifacts/bf797a6e6d1fcc01635d6b2723ac0390c82f41d2/lib:/Users/rm16686/.julia/artifacts/ebaa199abbbd88d81060d398297c1aeb83b4486d/lib:/Applications/Julia-1.7.app/Contents/Resources/julia/bin/../lib/julia:/Applications/Julia-1.7.app/Contents/Resources/julia/bin/../lib; mpiexec -n 8 MAGEMin --Verb=0 --File=MAGEMin_input.dat --n_points=650 --test=0'
No matching processes belonging to you were found
ans =
1
Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(59)..................: MPI_Init(argc=0x16bc873ac, argv=0x16bc873a0) failed
MPII_Init_thread(209)..............:
MPID_Init(77)......................:
init_world(192)....................: channel initialization failed
MPIDI_CH3_Init(84).................:
MPID_nem_init(313).................:
MPID_nem_tcp_init(175).............:
MPID_nem_tcp_get_business_card(397):
GetSockInterfaceAddr(370)..........: gethostbyname failed, V3WV9VFXX4 (errno 0)
... (the same "Fatal error in internal_Init" / "gethostbyname failed" stack is printed once per MPI rank; the remaining seven copies are omitted here) ...
ForwardSimulation_Time =
0.4238
Error using sscanf
First argument must be a text scalar.
Error in ReadData_MAGEMin (line 34)
A = sscanf(line,'%f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f');
Error in PerformMAGEMin_Simulation (line 97)
[PhaseData, Status] = ReadData_MAGEMin(newPoints, PhaseData, Computation.MinPhaseFraction);
Error in ComputePhaseDiagrams_AMR (line 124)
[PhaseData, TP_vec, FailedSimulations, CancelComputation] = PerformMAGEMin_Simulation(PhaseData, newPoints, TP_vec, VerboseLevel, Chemistry, dlg, ComputeAllPoints, UseGammaEstimation, Computation);
Error in PlotPseudosection/StartNewComputation (line 1422)
[PseudoSectionData, CancelComputation] = ComputePhaseDiagrams_AMR(PseudoSectionData, DisplayPlots);
Error using matlab.ui.control.internal.controller.ComponentController/executeUserCallback (line 427)
Error while evaluating Button PrivateButtonPushedFcn.
I am not sure about what is happening with the MPI call through Julia here.
But concerning the manual installation of MAGEMin, it should not be too hard; at least it is quite straightforward on Linux. I don't have much experience with Mac, but the more or less tricky part a year ago was to get the C version of lapack installed: lapacke. I know that Boris could get lapacke by simply installing it manually from the lapack library available on netlib (http://www.netlib.org/lapack/). Moreover, from what I could read, lapacke is now included in the default lapack package available with Brew. So hopefully you can get all the needed libraries using only Brew (namely mpich, NLopt and lapacke). Then you need to link the library paths correctly. For this, an example for a Mac system is given in the Makefile.
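As a rough sketch of those steps on a Mac with Homebrew (formula names assumed from the Brew defaults mentioned above, not verified on Apple Silicon):
$brew install mpich nlopt lapack
$cd MAGEMin        # the source directory, with the library paths in the Makefile adjusted
$make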
If you have any problem please come back to us.
Thanks for the explanation; we really appreciate your help in checking this, and we would certainly like the MATLAB/Julia-based version to work, as users are likely to prefer that. Unfortunately, I don't have access to an M1 system, which makes debugging a bit tricky.
So the mpiexec-based code seems to fail for you, but the point-wise calculations work. This suggests that there could be a problem with the way mpiexec & friends are compiled for the Apple M1 architecture. You can try to run this directly from the Julia console (making sure that you are in the same directory as MAGEMin_input.dat).
We had some discussions about how to combine/call MPI with MAGEMin this last week, which you can read here.
Could you do a few tests, to check this?
First load MAGEMin_jll, which should be available on your system:
julia> using MAGEMin_jll
Next, can you run the point-wise calculations on a single CPU?
julia> run(`$(MAGEMin()) --Verb=0 --File=MAGEMin_input.dat --n_points=650 --test=0`);
I suspect that the Mac firewall message will pop up at this stage (I can look into that later). I expect that this should still work.
Next, we can try to follow last week's suggestion:
julia> const mpirun = if MAGEMin_jll.MPICH_jll.is_available()
           MAGEMin_jll.MPICH_jll.mpiexec()
       elseif MAGEMin_jll.MicrosoftMPI_jll.is_available()
           MAGEMin_jll.MicrosoftMPI_jll.mpiexec()
       else
           nothing
       end
after which running this in parallel should ideally be possible with:
julia> run(`$(mpirun) -n 2 $(MAGEMin()) --Verb=0 --File=MAGEMin_input.dat --n_points=650 --test=0`);
Let us know at which step it errors. From MATLAB, we don't do anything other than load the paths to the required dynamic libraries and make a system call, so if it works from within Julia it should be possible to get this working from MATLAB as well.
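Stripped of the machine-specific library paths, that system call boils down to something like the following (a sketch only; the full command string is visible in the error report above):
$export DYLD_LIBRARY_PATH=<paths to the Julia artifact lib directories>
$mpiexec -n 8 MAGEMin --Verb=0 --File=MAGEMin_input.dat --n_points=650 --test=0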
To get back to this issue:
2. I set the binary to safe in the firewall settings (which is where the warning told me to go) before raising this issue, but there's no change in behaviour after doing that. I'll chalk it up to Mac weirdness.
I was able to reproduce this on an Intel Mac and could push a fix for it. The firewall essentially blocks incoming traffic for the MAGEMin binary. The fix is in the file /julia/firewall_macos.jl, which you can run from the terminal with
$julia firewall_macos.jl
Note that you do need to have the sudo password for your machine. If that is not the case, you will have to ask your system administrator for help.
You will need to run this once for every version of MAGEMin (if you update at some stage in the future, this will likely have to be repeated).
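(For reference, if you are curious what the fix amounts to: the macOS command-line firewall tool can whitelist a binary roughly as shown below. The exact flags here are my assumption about what a script like this would do, not copied from firewall_macos.jl, and /path/to/MAGEMin is a placeholder.)
$sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add /path/to/MAGEMin
$sudo /usr/libexec/ApplicationFirewall/socketfilterfw --unblockapp /path/to/MAGEMin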
I rented a virtual M1 system to test this. Lessons learned:
MAGEMin works on Apple Silicon, but is extremely slow: almost 1 second per point, which should really be around 100-150 ms or so per point (weird, as the system should be faster). The MPI version works as well, but is even slower (sometimes >60 seconds).
Next, I followed the Apple installation instructions in the documentation, which install NLopt, MPICH and LAPACKE through HomeBrew. That worked, and to simplify this I updated the Makefile to include the correct paths for HomeBrew.
With this, timings are as expected:
m1@6aa4e15b-9584-41b0-ab59-5a86c2cba2d8 MAGEMin-main % ./MAGEMin
Running MAGEMin 1.0.6 [18/03/2022] on 1 cores {
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
VOL_SYS +1.582647
RHO_SYS +3253.910314
MASS_RES +0.000010
Rank : 0
Point : 0
Temperature : 1100.0000 [C]
Pressure : 12.00 [kbar]
SOLUTION: [G = -825.337] (37 iterations, 51.88 ms)
[-1011.909244,-1829.091667,-819.265693,-695.468293,-412.942263,-971.879610,-876.528222,-1073.651407,-276.626131,-1380.314708,]
opx 0.23184
cpx 0.15210
spn 0.01395
ol 0.60211
Point 0
__________________________________
MAGEMin comp time: +61.462000 ms }
In parallel:
$mpiexec -n 8 ./MAGEMin --Verb=0 --File=MAGEMin_input.dat --n_points=650 --test=0
...
VOL_SYS +1.664572
RHO_SYS +3093.711575
MASS_RES +0.000005
Rank : 0
Point : 648
Temperature : 2000.0000 [C]
Pressure : 48.00 [kbar]
SOLUTION: [G = -895.546] (87 iterations, 127.23 ms)
[-1090.491405,-2032.852299,-921.753846,-746.640000,-530.391686,-1153.501918,-1014.124017,-1231.255298,-315.646614,-1544.060891,]
liq 0.99999
Point 648
__________________________________
MAGEMin comp time: +9957.603000 ms }
Same on 1 core:
$mpiexec -n 1 ./MAGEMin --Verb=0 --File=MAGEMin_input.dat --n_points=650 --test=0
...
__________________________________
MAGEMin comp time: +49174.574000 ms }
So if you have a Mac with Apple Silicon, our current recommendation is to compile MAGEMin manually, following the documentation.
Hi @boriskaus
Thanks for looking into this for me. I independently did the same thing as you (in between lectures and practicals) and got similar results, both for a single core and for multiple cores. The only difference is that I used OpenMPI, so the includes were a bit different:
LIBS = -lm -framework Accelerate /opt/homebrew/opt/lapack/lib/liblapacke.dylib /opt/homebrew/opt/nlopt/lib/libnlopt.dylib /opt/homebrew/opt/openmpi/lib/libmpi.dylib
INC = -I/opt/homebrew/opt/openmpi/include/ -I/opt/homebrew/opt/lapack/include -I/usr/local/include -I/opt/homebrew/opt/nlopt/include/
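For completeness, the corresponding Homebrew installs for this OpenMPI variant would presumably be (the open-mpi formula name is an assumption on my part):
$brew install open-mpi lapack nlopt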
MATLAB remained unhappy until I removed the version of MAGEMin in julia, and also removed an old matlab.mat file from the root directory. Everything now appears to work, both from the command line and from MATLAB :)
I shall now play around with what looks like a very impressive solution to an age-old problem! Thanks for your help.
MATLAB remained unhappy until I removed the version of MAGEMin in julia
Hmm, if both the Julia version and a locally compiled version are present, the button with which you can switch between the two versions should be active. The Julia version is the default one in that case.
Thank you for letting us know the LIBS and INC that you used with openmpi. I am adding this to the documentation.
A couple of plausibly related issues after installing 1.0.6 on a MacBook Pro (M1). Combined here for brevity.
1) The box allowing specification of the path to mpiexec is greyed out.
2) Turning off parallel computations allows calculations to run, but every calculation is accompanied by a warning that starts "Do you want the application “MAGEMin” to accept incoming network connections?"
My machine is behind a firewall (university rules, alas), so I can't accept incoming connections, and I can't turn these messages off (Accept and Deny don't appear to do anything to later calculations).
Any suggestions much appreciated.