conda-forge / mpich-feedstock

A conda-smithy repository for mpich.
BSD 3-Clause "New" or "Revised" License

Fortran + OSX + mpich (3.4) issues #56

Closed hhslepicka closed 3 years ago

hhslepicka commented 3 years ago

Issue: One of my recipes started failing when building with MPICH on OSX. The previous build worked, so something probably changed along the way and started causing errors. This is the most recent build: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=270799&view=logs&j=b4588902-138a-5967-ecc7-b3fc381bfda2&t=5a7a20e7-b634-5369-ebb8-6b51f51eb32a&l=1152 The previous one that worked is: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=230160&view=results

I tracked it down to the new mpich == 3.4. Downgrading to 3.3.2 seems to solve the problem and produces a good build.


Environment (conda list):

Available at the CI link above.


Details about conda and system ( conda info ):

Available at the CI link above.

leofang commented 3 years ago

I'm beginning to wonder whether a patch was applied too aggressively when updating to 3.4... see https://github.com/conda-forge/mpich-feedstock/pull/53#issuecomment-761956468.

I just pushed a PR (#57) to limit it to OS X Arm64 builds, for which the patch was intended, but I am not sure if there is a way to verify it before it's merged... Thoughts? @conda-forge/mpich

leofang commented 3 years ago

ref: https://github.com/conda-forge/genesis2-feedstock/pull/4

hhslepicka commented 3 years ago

@leofang in theory, if the patch was only meant for osx arm64, it should be fine to merge your PR as long as the tests pass.

leofang commented 3 years ago

@hhslepicka @awvwgk #57 is in. Could you try rebuilding without pinning mpich and see if it's fixed?

awvwgk commented 3 years ago

Sure can do.

Edit: Still fails: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=270964&view=logs&j=b4588902-138a-5967-ecc7-b3fc381bfda2&t=5a7a20e7-b634-5369-ebb8-6b51f51eb32a

hhslepicka commented 3 years ago

@leofang I will give it a try now and see how it goes.

hhslepicka commented 3 years ago

No deal with the unpin... https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=271074&view=logs&j=b4588902-138a-5967-ecc7-b3fc381bfda2&t=5a7a20e7-b634-5369-ebb8-6b51f51eb32a&l=1152

Should I also add mpich-mpicc to my recipe as a dependency on OSX? (ref. https://github.com/conda-forge/mpich-feedstock/issues/55#issuecomment-763453553)

leofang commented 3 years ago

@awvwgk @hhslepicka I feel it's likely cmake's fault, though I can't say for certain. In your feedstocks' previous successful builds, cmake was on 3.18 but now it's 3.19. Could you try to add

- mpich-mpifort  # [osx]

to the build section alongside other compilers, and see if it works? The goal is to make cmake see the Fortran compiler wrapper (mpifort).
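A quick way to check whether the wrapper is actually visible in the build environment is sketched below. This is only an illustration, not part of the original suggestion: it assumes a POSIX shell, and it relies on the `-show` option of MPICH's compiler wrappers, which prints the underlying compile command without running it. The variable name `mpifort_status` is hypothetical.

```shell
# Sketch: check whether the MPI Fortran wrapper is on PATH (assumes a POSIX shell).
if command -v mpifort >/dev/null 2>&1; then
    mpifort_status="found"
    # MPICH wrappers accept -show to print the underlying compile command.
    # Ignore failures in case a different MPI's wrapper is installed.
    mpifort -show || true
else
    mpifort_status="missing"
fi
echo "mpifort: ${mpifort_status}"
```

If this reports `missing` inside the build environment, cmake has no wrapper to query for the Fortran MPI settings.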

hhslepicka commented 3 years ago

It seems odd for this to be cmake's fault, given that the build succeeded with the old mpich (3.3.2) even with CMake 3.19, which is what my latest successful build used after pinning mpich.

I will try adding mpich-mpifort as suggested.

leofang commented 3 years ago

hmmm that's a good point...🤔

btw I am reading cmake's doc, and wondering if setting a few env vars like MPI_HOME would help or not: https://cmake.org/cmake/help/latest/module/FindMPI.html
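As a rough sketch of that idea, the environment variable could be exported before cmake runs. The assumptions here: `PREFIX` is the install prefix that conda-build sets during a build, and the fallback path `/opt/mpich` is purely illustrative. Whether this helps in this particular case is untested.

```shell
# Sketch: hint FindMPI at the MPI installation via the MPI_HOME environment variable.
# PREFIX is set by conda-build; the fallback path is a placeholder for illustration.
MPI_HOME="${PREFIX:-/opt/mpich}"
export MPI_HOME
echo "MPI_HOME=${MPI_HOME}"
# cmake would then be configured as usual from the same shell session.
```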

awvwgk commented 3 years ago

I found that CMake 3.19 has had several issues with IntelMPI recently, so this might well be CMake related.

Edit: See https://github.com/dftbplus/dftbplus/issues/699

hhslepicka commented 3 years ago

I am now trying with mpich unpinned and cmake pinned to 3.18. I will report back with what I find.

hhslepicka commented 3 years ago

Ok... with cmake pinned to 3.18 the build works properly. Sorry for the false alarm here... I will pursue the issue with the CMake feedstock or with cmake directly.

leofang commented 3 years ago

@hhslepicka but pinning at 3.18 still fails: https://github.com/conda-forge/genesis2-feedstock/pull/5/commits/284dcde332328a15164651ccf4157f2031bd499f

hhslepicka commented 3 years ago

My bad... I looked at the wrong build 💯 x 🤦 The build still fails.

awvwgk commented 3 years ago

Can confirm, it's not solvable by downgrading CMake for me either.

leofang commented 3 years ago

Thanks @hhslepicka @awvwgk for testing. I called for attention in CF's gitter channel; let's see if someone can probe the issue. I can also look into it later today, though my knowledge of cmake is fairly limited 😅

isuruf commented 3 years ago

Try updating mpich to 3.4.1. 3.4 introduced an OpenCL option that trips up cmake, and it was removed in 3.4.1.

leofang commented 3 years ago

@isuruf @hhslepicka @awvwgk Looks like 3.4.1 doesn't solve the problem... The error is still the same:

-- Found MPI_C: $PREFIX/lib/libmpi.dylib (found version "3.1") 
-- Could NOT find MPI_Fortran (missing: MPI_Fortran_WORKS) 
-- Could NOT find MPI (missing: MPI_Fortran_FOUND) (found version "3.1")
    Reason given by package: MPI component 'CXX' was requested, but language CXX is not enabled.  

CMake Error at CMakeLists.txt:46 (message):
  MPI not found, specify the MPI Fortran compiler with MPI_Fortran_COMPILER
  variable

https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=271731&view=logs&j=b4588902-138a-5967-ecc7-b3fc381bfda2&t=5a7a20e7-b634-5369-ebb8-6b51f51eb32a&l=1118

leofang commented 3 years ago

btw the Open MPI Fortran builds worked fine:

-- Found MPI_C: $PREFIX/lib/libmpi.dylib (found version "3.1") 
-- Found MPI_Fortran: $PREFIX/lib/libmpi_usempif08.dylib (found version "3.1") 
-- Found MPI: TRUE (found version "3.1")  

https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=271731&view=logs&j=f781b17d-3cc8-5223-204f-29a64ca9ef56&t=4545a821-75b7-516c-9203-a809cfd0e138&l=1114

isuruf commented 3 years ago

Try adding --disable-opencl to the configure script of mpich.
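For illustration, a minimal sketch of how that flag might be passed in a build script. Only `--disable-opencl` comes from the suggestion above; the `--prefix` argument, the `/usr/local` fallback, the `configure_args` variable, and the dry-run branch are assumptions made so the snippet runs outside an mpich source tree.

```shell
# Sketch of a build.sh excerpt; everything except --disable-opencl is an assumption.
configure_args="--prefix=${PREFIX:-/usr/local} --disable-opencl"
if [ -x ./configure ]; then
    # Intentionally unquoted so the string splits into separate arguments.
    ./configure ${configure_args}
else
    # Dry run when no configure script is present in the current directory.
    echo "would run: ./configure ${configure_args}"
fi
```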

hhslepicka commented 3 years ago

@isuruf trying your suggestion here: https://github.com/conda-forge/mpich-feedstock/pull/59

hhslepicka commented 3 years ago

It looks like #59 did the trick for my package. @awvwgk would you mind checking yours as well and replying so we can close this issue?

awvwgk commented 3 years ago

#59 did it. The build is passing now on OSX. Thanks a lot for your help.

leofang commented 3 years ago

Glad to know it's fixed. Thanks @isuruf @hhslepicka @awvwgk @dalcinl!

hhslepicka commented 3 years ago

Thank you all for the support! Closing this issue now.

leofang commented 3 years ago

@isuruf How did you find out about the OpenCL issue? Is there an issue to link to, or should we report it upstream?