conda-forge / mpi4py-feedstock

A conda-smithy repository for mpi4py.
BSD 3-Clause "New" or "Revised" License

Support MS-MPI #33

Closed (leofang closed this 3 years ago)

leofang commented 3 years ago

I know nothing about Windows and don't use it at all; I'm just curious how far we can go with the new msmpi feedstock.

conda-forge-linter commented 3 years ago

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe) and found some lint.

Here's what I've got...

For recipe:

leofang commented 3 years ago

@conda-forge-admin, please rerender

conda-forge-linter commented 3 years ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

dalcinl commented 3 years ago

@leofang It is not that easy. mpi4py is set up to build against a proper MS-MPI installation, not against the binaries from the conda-forge package.

I think the proper fix is to export the following environment variables before pip install ...:

set MSMPI_INC=%BUILD_PREFIX%\Library\include
set MSMPI_LIB=%BUILD_PREFIX%\Library\lib

Edit: You may need to escape the backslashes.
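(For reference, a minimal sketch of how these lines could be wired into the recipe's build script. This is a hypothetical bld.bat, not the one from this PR; %BUILD_PREFIX% and %PYTHON% are standard conda-build variables, and the pip invocation is the usual conda-forge pattern.)

:: point mpi4py's build at the MS-MPI headers and import libraries shipped by the msmpi package
set MSMPI_INC=%BUILD_PREFIX%\Library\include
set MSMPI_LIB=%BUILD_PREFIX%\Library\lib

:: then build and install as usual
%PYTHON% -m pip install . --no-deps -vv
if errorlevel 1 exit 1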

dalcinl commented 3 years ago

@leofang Or maybe patch conf/mpidistutils.py special-casing CONDA_BUILD in the environment.

leofang commented 3 years ago

@dalcinl Thanks! It actually built fine; it just failed during testing because python setup.py ... would search the wrong Python lib location. I am trying 123ec3e as an alternative way to patch.

leofang commented 3 years ago

I think the proper fix is to export the following environment variables before pip install ...:

Yeah I think it's equivalent to 123ec3e!!! Let's give it a shot 🤞

leofang commented 3 years ago

@dalcinl You're right...But then I don't understand why it built in 11a8f39...😂

conda-forge-linter commented 3 years ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe:

conda-forge-linter commented 3 years ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

leofang commented 3 years ago

@dalcinl CCP_HOME works! 🎉 Let me clean things up.
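(For readers following along: mpi4py's Windows build configuration also honors the MS-MPI/HPC Pack CCP_HOME variable, which is what "works" refers to here. The exact line below is an illustrative assumption, not copied from this PR.)

:: hypothetical: point CCP_HOME at the conda environment's Library prefix, where msmpi installs its files
set CCP_HOME=%PREFIX%\Library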

leofang commented 3 years ago

cc: @RyanMcCarthy-NOAA Thank you for packaging msmpi! It seems to work fine with mpi4py. Would be great if you could give it a try 🙂

leofang commented 3 years ago

@dalcinl @conda-forge/core Question: Is it necessary to add msmpi to mpi-feedstock? It seems to me that mutual exclusivity is gained for free, since msmpi only lives on Windows, unlike mpich or openmpi.

isuruf commented 3 years ago

Yes, otherwise one day when there's an openmpi or another implementation on Windows, we will have to hotfix packages, which we want to avoid if possible.

leofang commented 3 years ago

Yes, otherwise one day when there's an openmpi or another implementation on Windows, we will have to hotfix packages, which we want to avoid if possible.

Thanks, @isuruf. I seriously think it's highly unlikely, but I sent https://github.com/conda-forge/mpi-feedstock/pull/7 anyway.
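(As an aside, a hedged sketch of how that mutual exclusivity surfaces to users once msmpi becomes a variant of the mpi metapackage, assuming the build string follows the existing mpich/openmpi pattern:)

:: request the MS-MPI variant of the mpi metapackage explicitly
conda create -n test-msmpi -c conda-forge mpi4py "mpi=*=msmpi"
:: only one MPI variant of the metapackage can be present in the environment at a time
conda list -n test-msmpi mpi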

leofang commented 3 years ago

We need to first merge https://github.com/conda-forge/mpi-feedstock/pull/7 -> https://github.com/conda-forge/msmpi-feedstock/pull/1 -> this PR.

leofang commented 3 years ago

@dalcinl Any chance you know what's wrong with MSMPI? The CI raises this error when running the test script added in f5ae831:

(%PREFIX%) %SRC_DIR%>mpiexec -n 2 python mpi4py_simple_test.py 
ERROR: Failed to post close command error 1726
ERROR: unable to tear down the job tree. exiting...

dalcinl commented 3 years ago

@leofang Perhaps you need to start the smpd daemon before running?

smpd -debug

I have no idea, really. I never used the new open-sourced MS-MPI; all my experience with it was using the binary installers downloaded from Microsoft. Unfortunately, the msmpi conda-forge package recipe does not have even a trivial test making sure the packaged binaries work for both linking and running MPI programs.

PS: There is no need to add yet another file with tests. Just use the following lines for testing:

mpiexec -n 2 python -m mpi4py.bench helloworld
mpiexec -n 2 python -m mpi4py.bench ringtest

conda-forge-linter commented 3 years ago

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe) and found some lint.

Here's what I've got...

For recipe:

conda-forge-linter commented 3 years ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

leofang commented 3 years ago

@RyanMcCarthy-NOAA @isuruf I feel msmpi isn't built correctly, or something wrong is being picked up. See the following output generated by mpiexec -d 3 ... (3 is the maximum debug level):

...
[00:5320] successfully launched process 1
[00:5320] successfully launched process 0
[00:5320] root process launched, starting stdin redirection.
[01:2068] ERROR: Process Management Protocol version mismatch in connection
request: received version 3, expected 4.
[01:2068] ERROR: Process Management Protocol version mismatch in connection
request: received version 3, expected 4.
[01:2068] process_id=1 rank=0 refcount=2, waiting for the process to finish exiting.
[01:2068] creating an exit command for process id=1  rank=0, pid=9128, exit code=-1.
...

SMPD_PMP_VERSION is defined as 4 in the source code, yes, but for some reason 3 is picked up.

@RyanMcCarthy-NOAA were you able to run your applications using the msmpi package?

mccarthyryanc commented 3 years ago

@RyanMcCarthy-NOAA were you able to run your applications using the msmpi package?

I originally needed msmpi for win builds of taudem. I just created a new env with msmpi and taudem and ran one of the example commands from the docs:

mpiexec -n 8 -d 3 PitRemove logan.tif

and got the expected output:

....
[01:784] env: PMI_RANK=0
[01:784] env: PMI_SMPD_KEY=7
[01:784] version check complete, using PMP version 4.
[01:784] forwarding command SMPD_INIT src=1 ctx_key=0
....

It seems that msmpi is working properly for at least that package.

leofang commented 3 years ago

Thanks for checking, @mccarthyryanc.

version check complete, using PMP version 4

This was also shown in the failed CI build, before the error happened. I assumed that by "working" you meant no errors were generated and the results were correct.

Could you do one more check for me? Which mpiexec was in use?

where mpiexec

leofang commented 3 years ago

Another possibility: msmpi also seems to be included in the Docker image (C:\Program Files\Microsoft MPI), and that copy may have an incompatible version.

mccarthyryanc commented 3 years ago

This was also shown in the failed CI build, before the error happened. I assumed that by "working" you meant no errors were generated and the results were correct.

Yes, the command completed successfully and the output data was correct.

mpiexec was in the local env I created, at <env-base>\Library\bin\mpiexec.exe. Check out the conda-build docs about the LIBRARY_BIN env variable.

leofang commented 3 years ago

mpiexec was in the local env I created, at <env-base>\Library\bin\mpiexec.exe. Check out the conda-build docs about the LIBRARY_BIN env variable.

Thanks for confirming, @mccarthyryanc. Yes, I am aware of LIBRARY_BIN. I am in the middle of confirming my theory, as I found the Windows Docker image has an outdated MPI that might be causing interference:

...
 Directory of C:\Program Files\Microsoft MPI\Bin

12/10/2020  05:15 PM    <DIR>          .
12/10/2020  05:15 PM    <DIR>          ..
06/13/2016  12:38 PM           284,368 mpiexec.exe
06/13/2016  12:25 PM           989,201 mpitrace.man
06/13/2016  12:37 PM            27,760 msmpilaunchsvc.exe
06/13/2016  12:38 PM           237,776 smpd.exe
               4 File(s)      1,539,105 bytes
               2 Dir(s)  84,183,883,776 bytes free
Microsoft MPI Startup Program [Version 7.1.12437.25]   <---
...
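(For anyone reproducing this, a quick way to check which copies of the MPI launcher are actually being resolved, using only the standard Windows where command; all matches are listed in PATH order:)

where mpiexec
where smpd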

leofang commented 3 years ago

@dalcinl @mccarthyryanc

I am in the middle of confirming my theory, as I found the Windows Docker image has an outdated MPI that might be causing interference

My theory was wrong; the cleanup didn't help. In fact, Conda/MS-MPI did pick up the correct executables. The error when running mpiexec -n 2 python ... seems to come from some interference with Azure's infrastructure. I was able to download the build artifact, which was enabled by adding this config to conda-forge.yml:

azure:
  store_build_artifacts: true

I verified on a Windows virtual machine that the resulting package works, so what I'm going to do is clean up the recipe and disable testing on Windows. Then this PR will be ready to go (note that all import tests also pass on Windows, so that should be enough).

leofang commented 3 years ago

By the way,

Perhaps you need to start the smpd daemon before running?

smpd -debug

It turns out this is not necessary. When launching mpiexec, it spins up an smpd manager itself; in fact, a manually launched smpd (via start /B smpd -d 3) is ignored by mpiexec. This aligns with mpi4py's tests: there we don't worry about smpd either.

leofang commented 3 years ago

We need to first merge conda-forge/mpi-feedstock#7 -> conda-forge/msmpi-feedstock#1 -> this PR.

@dalcinl Both PRs are in and all checks have passed! This is ready now 😁

dalcinl commented 3 years ago

@leofang Any chance you can fixup all the commits into a single one (or two, to keep the recipe rerenders separate)? Or should we just "Squash and merge"? I usually care about having a clean history without dirty intermediate steps.

leofang commented 3 years ago

For feedstocks I care much less because the commit history is already dirty (due to bots, migrations, etc). Feel free to squash if you like 🙂

dalcinl commented 3 years ago

For feedstocks I care much less because the commit history is already dirty (due to bots, migrations, etc)

Yes, but most of the recipe contents and all of the code in the build scripts are not touched by bots and migrations. OK, I'll squash and merge.

leofang commented 3 years ago

Thanks, @dalcinl. I noticed the merge commit had one connection failure: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=260246&view=logs&j=e5cdccbf-4751-5a24-7406-185c9d30d021 Are you able to restart that test? Without all tests passing, it seems CF won't add me to the maintainer list...

leofang commented 3 years ago

It's restarted!

leofang commented 3 years ago

By the way, I forgot to say: thank you @dalcinl @mccarthyryanc @isuruf for your help with my little Windows adventure! 😂