Closed mmuetzel closed 3 weeks ago
It looks like it's the same as on Ubuntu until: https://github.com/ElmerCSC/elmerfem/blob/2766add0317ee0ccae8ef37e0559293ab8b44991/fem/src/MeshUtils.F90#L14034-L14035
After that, NiNj
is all zeros.
Is it an issue with MSMPI?
It looks like there is an issue with MPI_IN_PLACE
when using MSMPI. (That might very well be an upstream issue.)
The following change fixes that test for me:
From 68bfc2c31ddc3f65e25c798602b402d869b1200b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Markus=20M=C3=BCtzel?= <markus.muetzel@gmx.de>
Date: Mon, 12 Aug 2024 15:08:15 +0200
Subject: [PATCH] Avoid `MPI_IN_PLACE` in MeshUtils.F90
Fixes #521.
---
fem/src/MeshUtils.F90 | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/fem/src/MeshUtils.F90 b/fem/src/MeshUtils.F90
index 499d7874b..6bae63427 100644
--- a/fem/src/MeshUtils.F90
+++ b/fem/src/MeshUtils.F90
@@ -13922,7 +13922,7 @@ CONTAINS
REAL(KIND=dp) :: NiNj(9),A(3,3),F(3),M11,M12,M13,M14
REAL(KIND=dp) :: d1,d2,MinDist,MaxDist,Dist,X0,Y0,Rad
REAL(KIND=dp) :: Normal(3), AxisNormal(3), Tangent1(3), Tangent2(3), Coord(3), &
- CircleCoord(9)
+ CircleCoord(9), buffer(9)
INTEGER :: CircleInd(3)
LOGICAL :: BCMode, DoIt, GotNormal, GotCenter, GotRadius
INTEGER :: Tag, t1, t2
@@ -14031,7 +14031,8 @@ CONTAINS
! Only in BC mode we do currently parallel reduction.
! This could be altered too.
IF( BCMode ) THEN
- CALL MPI_ALLREDUCE(MPI_IN_PLACE,NiNj,9, &
+ buffer = NiNj;
+ CALL MPI_ALLREDUCE(buffer,NiNj,9, &
MPI_DOUBLE_PRECISION,MPI_SUM,ELMER_COMM_WORLD,ierr)
END IF
@@ -14131,7 +14132,8 @@ CONTAINS
END DO
IF( BCMode .AND. ParEnv % PEs > 1 ) THEN
- CALL MPI_ALLREDUCE(MPI_IN_PLACE,CircleCoord,6, &
+ buffer = CircleCoord;
+ CALL MPI_ALLREDUCE(buffer,CircleCoord,6, &
MPI_DOUBLE_PRECISION,MPI_MAX,ELMER_COMM_WORLD,ierr)
END IF
@@ -14190,7 +14192,8 @@ CONTAINS
END IF
IF( BCMode .AND. ParEnv % PEs > 1 ) THEN
- CALL MPI_ALLREDUCE(MPI_IN_PLACE,CircleCoord,9, &
+ buffer = CircleCoord;
+ CALL MPI_ALLREDUCE(buffer,CircleCoord,9, &
MPI_DOUBLE_PRECISION,MPI_MAX,ELMER_COMM_WORLD,ierr)
END IF
--
2.44.0.windows.1
That's probably not how this should be "fixed". Instead, it is likely that something isn't quite correct with the MSMPI headers (or import library) from MSYS2 (or MSMPI itself).
I wonder whether the value of MPI_IN_PLACE
is imported correctly with gfortran
given that there is the following code snippet in the mpif.h
of MSMPI:
COMMON /MPIPRIV1/ MPI_BOTTOM, MPI_IN_PLACE, MPI_STATUS_IGNORE
COMMON /MPIPRIV2/ MPI_STATUSES_IGNORE, MPI_ERRCODES_IGNORE
!DEC$ ATTRIBUTES DLLIMPORT :: /MPIPRIV1/, /MPIPRIV2/
COMMON /MPIFCMB5/ MPI_UNWEIGHTED
COMMON /MPIFCMB9/ MPI_WEIGHTS_EMPTY
!DEC$ ATTRIBUTES DLLIMPORT :: /MPIFCMB5/, /MPIFCMB9/
COMMON /MPIPRIVC/ MPI_ARGVS_NULL, MPI_ARGV_NULL
!DEC$ ATTRIBUTES DLLIMPORT :: /MPIPRIVC/
IIUC, !DEC$
attributes are ignored when building with gfortran. And setting attributes for common blocks doesn't work in gfortran: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47030
I haven't found a way around that.
Would it be acceptable to conditionally not use MPI_IN_PLACE
and copy the input instead? Maybe, depending on a feature test result?
But if the issue is caused by the missing DLLIMPORT
attributes, there might also be issues with the other common block members (e.g., MPI_STATUSES_IGNORE
).
Edit: Fixed copy-paste-error in patch.
I think I have changes almost ready that use MPI_IN_PLACE
only conditionally if a feature test during configuration succeeds.
However, MPI_IN_PLACE
is also used in ElmerIce for which no CI is run currently. #522 adds building ElmerIce to the CI rules. I'd like to wait until after that is merged to be a bit more certain that the changes I'd like to propose don't cause regressions.
The test
CurvedBoundaryCylHquadratic
is currently failing on Windows with the following output:Compared with the output of the same test on Ubuntu 24.04:
Notably,
CylinderFit: Axis coordinate set to be:
is1
on Windows and3
on Ubuntu.When running with gdb to a breakpoint at
MeshUtils.F90:14040
, I see the following on Ubuntu:While I see the following on Windows:
So, it looks like
NiNj
hasn't been set on Windows for some reason.Do you have any hints how to look deeper into this issue?