conda-forge / mpi4py-feedstock

A conda-smithy repository for mpi4py.
BSD 3-Clause "New" or "Revised" License

Issues running mpi for osx-64 code on M1 Mac #60

Closed ithaca-isr closed 1 year ago

ithaca-isr commented 1 year ago

Solution to issue cannot be found in the documentation.

Issue

I am having difficulty running MPI code built for the osx-64 architecture on my M1 Mac. I configured my Conda environment using the command below, but it made no difference:

    conda config --env --set subdir osx-64

Here is the simple test code I tried to run, comm.py, pasted below:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    print('My rank is ', rank)

To run it I typed:

    mpiexec -np 2 python comm.py

I get the following errors:

    Fatal error in internal_Init_thread: Other MPI error, error stack:
    internal_Init_thread(60)...........: MPI_Init_thread(argc=0x0, argv=0x0, required=3, provided=0x30b632f00) failed
    MPII_Init_thread(209)..............:
    MPID_Init(75)......................:
    init_world(190)....................: channel initialization failed
    MPIDI_CH3_Init(84).................:
    MPID_nem_init(313).................:
    MPID_nem_tcp_init(175).............:
    MPID_nem_tcp_get_business_card(397):
    GetSockInterfaceAddr(370)..........: gethostbyname failed, C164CWQTWK.tld (errno 0)
    Fatal error in internal_Init_thread: Other MPI error, error stack:
    internal_Init_thread(60)...........: MPI_Init_thread(argc=0x0, argv=0x0, required=3, provided=0x30698ef00) failed
    MPII_Init_thread(209)..............:
    MPID_Init(75)......................:
    init_world(190)....................: channel initialization failed
    MPIDI_CH3_Init(84).................:
    MPID_nem_init(313).................:
    MPID_nem_tcp_init(175).............:
    MPID_nem_tcp_get_business_card(397):
    GetSockInterfaceAddr(370)..........: gethostbyname failed, C164CWQTWK.tld (errno 0)

Installed packages

# Name                    Version                   Build  Channel
bzip2                     1.0.8                h0d85af4_4    conda-forge
ca-certificates           2022.9.24            h033912b_0    conda-forge
libcxx                    14.0.6               hccf4f1f_0    conda-forge
libffi                    3.4.2                h0d85af4_5    conda-forge
libgfortran               5.0.0           10_4_0_h97931a8_25    conda-forge
libgfortran5              11.3.0              h082f757_25    conda-forge
libsqlite                 3.39.4               ha978bb4_0    conda-forge
libzlib                   1.2.13               hfd90126_4    conda-forge
llvm-openmp               14.0.4               ha654fa7_0    conda-forge
mpi                       1.0                       mpich    conda-forge
mpi4py                    3.1.4           py311hbdc7f45_0    conda-forge
mpich                     4.0.2              hd33e60e_100    conda-forge
ncurses                   6.3                  h96cf925_1    conda-forge
openssl                   3.0.7                hfd90126_0    conda-forge
pip                       22.3               pyhd8ed1ab_0    conda-forge
python                    3.11.0          h559f36b_0_cpython    conda-forge
python_abi                3.11                    2_cp311    conda-forge
readline                  8.1.2                h3899abd_0    conda-forge
setuptools                65.5.0             pyhd8ed1ab_0    conda-forge
tk                        8.6.12               h5dbffcc_0    conda-forge
tzdata                    2022f                h191b570_0    conda-forge
wheel                     0.38.1             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h775f41a_0    conda-forge

Environment info

active environment : testenv3
    active env location : /Users/sdergh28/opt/anaconda3/envs/testenv3
            shell level : 1
       user config file : /Users/sdergh28/.condarc
 populated config files : /Users/sdergh28/opt/anaconda3/.condarc
                          /Users/sdergh28/.condarc
                          /Users/sdergh28/opt/anaconda3/envs/testenv3/.condarc
          conda version : 22.9.0
    conda-build version : 3.22.0
         python version : 3.9.13.final.0
       virtual packages : __osx=12.6.1=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /Users/sdergh28/opt/anaconda3  (writable)
      conda av data dir : /Users/sdergh28/opt/anaconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/sdergh28/opt/anaconda3/pkgs
                          /Users/sdergh28/.conda/pkgs
       envs directories : /Users/sdergh28/opt/anaconda3/envs
                          /Users/sdergh28/.conda/envs
               platform : osx-64
             user-agent : conda/22.9.0 requests/2.28.1 CPython/3.9.13 Darwin/21.6.0 OSX/12.6.1
                UID:GID : 504:20
             netrc file : None
           offline mode : False

dalcinl commented 1 year ago

conda config --env --set subdir osx-64

Why are you using osx-64? I believe you should be using osx-arm64 on Apple M1; don't change that.

The issue seems to happen when MPICH calls gethostbyname(). This likely means some network misconfiguration on your machine. What's the output of cat /etc/hosts? Perhaps you can fix your issue by running this command. If you do, make a backup copy of /etc/hosts first.
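The failing lookup can also be checked outside of MPI. Below is a minimal Python sketch, assuming that the standard library's socket.gethostbyname() fails in roughly the same situations as the call MPICH makes. The helper name check_hostname_resolves is made up for this illustration and is not part of mpi4py or MPICH.

```python
import socket


def check_hostname_resolves(name=None):
    """Return (name, address) if the name resolves, or (name, None) if not."""
    if name is None:
        # Normally the same value that the `hostname` command prints.
        name = socket.gethostname()
    try:
        return name, socket.gethostbyname(name)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return name, None


if __name__ == "__main__":
    name, address = check_hostname_resolves()
    if address is None:
        print(f"{name!r} does not resolve; MPICH may fail in the same way")
    else:
        print(f"{name!r} resolves to {address}")
```

If the script reports that the name does not resolve, that is consistent with the gethostbyname failure in the error stack, although it does not prove the two lookups behave identically.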

dalcinl commented 1 year ago

I cannot reproduce the issue using the latest Mambaforge installer and a Python 3.11 + MPICH + mpi4py environment.

ithaca-isr commented 1 year ago

I am using osx-64 because some packages I need to install do not work on arm64. I do not have experience with the hosts file. I have already made a backup, but are you sure changing it won't interfere with any running process?

In the file it says: # Host Database

localhost is used to configure the loopback interface when the system is booting. Do not change this entry.

There are three addresses, one for localhost, another one for broadcast host and another one for localhost.

dalcinl commented 1 year ago

If you are not sure about the consequences of changing /etc/hosts, then don't do it. I never touched my /etc/hosts, it is like yours, and things work on my system.

What's the output of running the command hostname ?

ithaca-isr commented 1 year ago

ok. It gives me the name of a .tld file

dalcinl commented 1 year ago

OK, so this definitely looks like a problem in your network configuration.

Quick tip: Set the following environment variable in the command line:

export HYDRA_IFACE=lo0

and try running again. This should force mpiexec to use the loopback interface, that is, the 127.0.0.1 localhost entry in your /etc/hosts, and sidestep the issue of your machine not having a permanent hostname.
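The same tip can be applied for a single launch without touching the shell session. This is only a sketch: the helper names loopback_env and run_on_loopback are hypothetical, and it assumes that mpiexec and comm.py are reachable from the current directory and PATH.

```python
import os
import subprocess


def loopback_env(base_env=None):
    """Return a copy of the environment with HYDRA_IFACE pointing at loopback."""
    env = dict(os.environ if base_env is None else base_env)
    env["HYDRA_IFACE"] = "lo0"  # lo0 is the loopback interface name on macOS
    return env


def run_on_loopback(script="comm.py", nprocs=2):
    """Launch the script under mpiexec using the modified environment."""
    cmd = ["mpiexec", "-np", str(nprocs), "python", script]
    # check=False: return the result and let the caller inspect returncode.
    return subprocess.run(cmd, env=loopback_env(), check=False)
```

Calling run_on_loopback() should be equivalent to exporting the variable and then running mpiexec -np 2 python comm.py, with the difference that the variable is only set for the child process.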

If that does not work, then I believe the only option is to set the hostname of your machine.

  1. You have to set a proper hostname, e.g. ("MyMac" is just an example; use whatever name you like):
    sudo scutil --set LocalHostName MyMac
    sudo dscacheutil -flushcache

    and then reboot your computer.

  2. Run hostname again and verify it is "MyMac".
  3. You may still need to run
    echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts > /dev/null

    Run all these steps at your own risk. I'm not an expert on the macOS network stack, and I don't really know the consequences of these changes or how they can interact with other stuff in the macOS ecosystem. If you do not feel comfortable making these changes, it is time to ask the MPICH developers about this issue.
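Before appending anything to /etc/hosts, it may help to check whether the hostname is already listed there. The following is a read-only Python sketch. The function name hosts_maps_name is hypothetical, and the parsing assumes the usual hosts layout of an address followed by one or more names, with # starting a comment.

```python
import socket


def hosts_maps_name(hosts_text, name):
    """Return True if a non-comment line of hosts_text lists `name` as a hostname."""
    wanted = name.lower()  # hostnames are compared case-insensitively
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        _address, *names = line.split()
        if wanted in (n.lower() for n in names):
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/etc/hosts") as fh:
            text = fh.read()
    except OSError:
        text = ""  # file missing or unreadable: treat as having no entries
    name = socket.gethostname()
    print(f"{name!r} listed in /etc/hosts: {hosts_maps_name(text, name)}")
```

If the name is already listed, the echo command in step 3 would only add a duplicate line.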

ithaca-isr commented 1 year ago

Thank you, it worked! Luckily, I did not need to do Step 3.