Closed andrea-franceschini closed 4 years ago
@af1990 I've pushed branch bugfix/TotoGaz/petscLink to get the information on all targets.
@TotoGaz I've just tried it, obtaining the same error. Do you need some specific output?
Just wanted to get an overview of the problem on all our targets. It should be general to all platforms, but I prefer to check it before investigating.
Actually, I could not reproduce on my env (ubuntu & gcc8 from our travis-ci) and the jobs from travis-ci https://travis-ci.com/github/GEOSX/GEOSX/builds/153224769 look OK
Please check that I really activated Petsc? (but I think it is OK)
Petsc is there! I see ... so it works on Travis! As far as I know, I'm the only one experiencing this issue ... What am I doing wrong?
Are you sure you are using the latest/proper TPLs when building GEOSX?
I pulled the latest TPLs version yesterday and compiled them in the usual way.
BTW, I've been facing this issue for a long time, but I always moved on. Now, though, I need PETSc ...
Can you give your command line?
`python scripts/config-build.py -hc host-configs/environment.cmake -bt Release`, then `make`
@af1990 What about
nm GEOSX_DIR/thirdPartyLibs/install-environment-release/petsc/lib/libpetsc.so | grep SCOTCH_ParMETIS_V3_NodeND
It's undefined!
On travis containers:
root@section-paloise:/# nm /opt/GEOSX_TPL/petsc/lib/libpetsc.so.3.10.2 | grep SCOTCH_ParMETIS_V3_NodeND
0000000000fd07b0 T SCOTCH_ParMETIS_V3_NodeND
0000000000fd06e0 t _SCOTCH_ParMETIS_V3_NodeNDTree
So your petsc seems the place to investigate.
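The comparison above can be automated. The following is only a minimal sketch: it assumes the usual `nm` text layout (optional address, one-letter type, symbol name), and the `local_output` line is a hypothetical example of how `nm` prints an undefined symbol, not output copied from this thread.

```python
def classify_symbol(nm_output, symbol):
    """Return 'defined', 'undefined' or 'absent' for an exact symbol name.

    nm_output is the text printed by `nm` for one library.
    Simplification: only type 'U' is treated as undefined.
    """
    status = "absent"
    for line in nm_output.splitlines():
        fields = line.split()
        # A line is either "<address> <type> <name>" or "<type> <name>".
        if len(fields) < 2 or fields[-1] != symbol:
            continue
        if fields[-2] == "U":
            status = "undefined"  # keep scanning: a definition may follow
        else:
            return "defined"
    return status


# Output reported above for the Travis container.
travis_output = """0000000000fd07b0 T SCOTCH_ParMETIS_V3_NodeND
0000000000fd06e0 t _SCOTCH_ParMETIS_V3_NodeNDTree"""

# Hypothetical line illustrating the "undefined" case.
local_output = "                 U SCOTCH_ParMETIS_V3_NodeND"

print(classify_symbol(travis_output, "SCOTCH_ParMETIS_V3_NodeND"))
print(classify_symbol(local_output, "SCOTCH_ParMETIS_V3_NodeND"))
```

Matching on the exact symbol name avoids the false hit that a plain `grep` gives on `_SCOTCH_ParMETIS_V3_NodeNDTree`.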
Maybe ... but I've already tried to look into it ...
The only thing I can think of is that `PETSc` is somehow linking to a shared `ptscotch` library installed somewhere in the system paths (instead of the one that it "downloads" from `tplMirror`), but that dependency of course does not get imported into GEOSX. Can you search through the package manager for anything that could've installed `ptscotch` system-wide and try to remove it?
I know that I have a system version of `ptscotch` but I cannot remove it ...
That's the complete log from `PETSc`: make.log.
#define PETSC_HAVE_SCOTCH_PARMETIS_V3_NODEND 1
I guess something is taunting you!
I think it's needed ... From `petsc/src/petsc/src/mat/partition/impls/scotch/scotch.c`:
    #if defined(PETSC_HAVE_SCOTCH_PARMETIS_V3_NODEND)
      PetscInt *sizes, *seps, log2size, subd, *level, base = 0;
      PetscMPIInt size;
      ierr = MPI_Comm_size(comm,&size);CHKERRQ(ierr);
      log2size = PetscLog2Real(size);
      subd = PetscPowInt(2,log2size);
      if (subd != size) SETERRQ(comm,PETSC_ERR_SUP,"Only power of 2 communicator sizes");
      ierr = PetscMalloc1(mat->rmap->n,&NDorder);CHKERRQ(ierr);
      ierr = PetscMalloc3(2*size,&sizes,4*size,&seps,size,&level);CHKERRQ(ierr);
      SCOTCH_ParMETIS_V3_NodeND(mat->rmap->range,adj->i,adj->j,&base,NULL,NDorder,sizes,&comm);
      ierr = MatPartitioningSizesToSep_Private(subd,sizes,seps,level);CHKERRQ(ierr);
      for (i=0;i<mat->rmap->n;i++) {
        PetscInt loc;
        ierr = PetscFindInt(NDorder[i],2*subd,seps,&loc);CHKERRQ(ierr);
        if (loc < 0) {
          loc = -(loc+1);
          if (loc%2) { /* part of subdomain */
            locals[i] = loc/2;
          } else {
            ierr = PetscFindInt(NDorder[i],2*(subd-1),seps+2*subd,&loc);CHKERRQ(ierr);
            loc = loc < 0 ? -(loc+1)/2 : loc/2;
            locals[i] = level[loc];
          }
        } else locals[i] = loc/2;
      }
      ierr = PetscFree3(sizes,seps,level);CHKERRQ(ierr);
    #else
      SETERRQ(pcomm,PETSC_ERR_SUP,"Need libptscotchparmetis.a compiled with -DSCOTCH_METIS_PREFIX");
    #endif
At least ... I cannot disable it 🤷‍♂️
Digging into the log file from `PETSc`, I realized that the linking command is:
Focusing only on libraries, we have:
From `man ld`, we have:
-L searchdir
--library-path=searchdir
Add path searchdir to the list of paths that ld will search for archive libraries
and ld control scripts. You may use this option any number of times. The
directories are searched in the order in which they are specified on the command
line. Directories specified on the command line are searched before the default
directories. All -L options apply to all -l options, regardless of the order in which
the options appear. -L options do not affect how ld searches for a linker script
unless -T option is specified.
This means that the `PETSc` approach of separating the paths (`-L`) from the libraries (`-l`) can be misleading in our case, where we don't work with system libraries. Or, at least, we need to take care of the order in which the different paths are listed. In particular, in my case, the third provided path is `/usr/lib/x86_64-linux-gnu` (there because of `openblas.so`), and it contains both `metis` and `scotch`. Since the directories are searched in order, `ld` finds the system copies first and does not use the proper paths for `metis` and `scotch`. Indeed, from the `ld` report, we have:
with a focus on the problem, we have:
Thus, if this script is run on a system where any of the dependencies is also installed in a system path, it will systematically fail.
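To make the search-order effect concrete, here is a simplified sketch of how the `-L` directories are scanned for a `-l<name>` library: the first directory containing `lib<name>.so` or `lib<name>.a` wins. The function name `first_match` and the temporary directories are hypothetical stand-ins. Real `ld` behaviour has more cases (for example `-static` or `-l:filename`) that are ignored here.

```python
import os
import tempfile


def first_match(lib_name, search_dirs):
    """Return the first lib<name>.so / lib<name>.a found, scanning in order.

    Within each directory the shared library is tried before the archive.
    Returns None when no directory contains the library.
    """
    for directory in search_dirs:
        for suffix in (".so", ".a"):
            candidate = os.path.join(directory, "lib" + lib_name + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for the system path and the TPL install path.
        system_dir = os.path.join(root, "system")
        tpl_dir = os.path.join(root, "tpl")
        os.makedirs(system_dir)
        os.makedirs(tpl_dir)
        open(os.path.join(system_dir, "libscotch.so"), "w").close()
        open(os.path.join(tpl_dir, "libscotch.a"), "w").close()

        # System directory listed first: the system copy shadows the TPL one.
        print(first_match("scotch", [system_dir, tpl_dir]))
        # TPL directory listed first: the intended library is picked.
        print(first_match("scotch", [tpl_dir, system_dir]))
```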
The order of the library paths (`-L`) is a consequence of the alphabetical order of the dependencies used in the internal Python script that configures `PETSc`. I don't think it's possible (and portable) to change it.
A possible different solution would be to post-process the same pieces of information in a different way: from the format `-L/path/to/lib/folder` and `-lname` to `/path/to/lib/folder/libname`. I don't think there is a high-level way to do so; you need to change the source code of the Python configure script. It's a very limited change (more or less 10 lines of code), but this will make our `PETSc` version different from the official one and, moreover, I'm not sure the patch will cover all possible cases.
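The idea of the rewrite can be illustrated as follows. This is not the actual patch to the `PETSc` configure script. It is a hedged sketch that assumes the flags are still grouped per package (each `-l<name>` directly follows the `-L<dir>` of the package that declared it). The function name `to_explicit_paths`, the `exists` parameter and the `/tpl/...` paths are hypothetical.

```python
import os


def to_explicit_paths(tokens, exists=os.path.isfile):
    """Rewrite per-package '-L<dir>' / '-l<name>' tokens as full library paths.

    Each -l<name> is bound to the most recent -L<dir> seen before it.
    Flags whose library file is not found in that directory are left as-is.
    """
    current_dir = None
    result = []
    for token in tokens:
        if token.startswith("-L"):
            current_dir = token[2:]
        elif token.startswith("-l") and current_dir is not None:
            name = token[2:]
            for suffix in (".so", ".a"):
                candidate = os.path.join(current_dir, "lib" + name + suffix)
                if exists(candidate):
                    result.append(candidate)
                    break
            else:
                result.append(token)  # not found there: keep the original flag
        else:
            result.append(token)
    return result


# Hypothetical file system used only for this illustration.
fake_files = {
    "/usr/lib/x86_64-linux-gnu/libopenblas.so",
    "/usr/lib/x86_64-linux-gnu/libscotch.so",
    "/tpl/petsc/lib/libscotch.a",
}
flags = ["-L/usr/lib/x86_64-linux-gnu", "-lopenblas",
         "-L/tpl/petsc/lib", "-lscotch", "-lm"]
print(to_explicit_paths(flags, exists=fake_files.__contains__))
```

Because each library is emitted with its full path, the system copy of `scotch` can no longer shadow the one in the TPL directory, whatever the order of the directories.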
After the change, the linking command line (part regarding libs) is:
and the `ld` report is:
Now both `metis` and `scotch` are the right ones! And I no longer have the undefined reference to `SCOTCH_ParMETIS_V3_NodeND`. 😃
I understand that this kind of change is very deep, not easily portable and error-prone; nevertheless, I need it if I want to use `PETSc`. I think it is quite likely that, in the future, someone with an already installed version of any of the `PETSc` dependencies will run into the same problem. To prevent this we should at least document the problem. I don't know if this is in the `GEOSX` spirit, but @rrsettgast, @TotoGaz, @klevzoff please let me know what you think about this problem and the possible solution of incorporating the patch in the `tplMirror`. Thanks!
The changed piece of code is:
Great investigation @af1990! So IIUC Petsc fails at its linking step if another copy of a dependency installed on the system interacts or conflicts with it. Probably one should inform the Petsc project for an upstream patch?
For us, if we want to, we can use the `PATCH_COMMAND` of https://cmake.org/cmake/help/latest/module/ExternalProject.html. We should test a little more to be sure we do not break everything :) (There is already an example in our https://github.com/GEOSX/thirdPartyLibs/blob/master/CMakeLists.txt)
For the Python code, you have something like

    if (len(libListSplit) > 1):
        ...
    else:
        if (len(libListSplit) > 0):
            ...

Reading the code you seem to have two cases: `len(libListSplit) == 2` and `len(libListSplit) == 1`. Maybe you could use a

    if len(libListSplit) == 1:
        ...
    elif len(libListSplit) == 2:
        ...
    else:
        raise ValueError("Lib bla bla")

Also, `libPath+'/'+str(os.path.basename(j))` should be replaced by `os.path.join(libPath, os.path.basename(j))`.
Thanks for the improvement in the python code!
What is the status here? Is this resolved?
I'm sorry @joshua-white, but I wouldn't say so. On my machine, every time I compile the TPLs, I need to stop the build when it approaches PETSc, change an internal PETSc configuration file (I know that the patch will work only in a very limited subset of cases ... but this includes my system) and run the main `make` again.
Hi there! IIUC, this is not a `GEOSX`/`TPL` issue, more a `Petsc` one.
I would close this and open a case on `Petsc`'s bug tracker instead.
I'm closing this issue for the moment and adding it to the general stack of "we need more help getting Petsc working properly."
Describe the bug
`GEOSX` is not compilable with `PETSc`. The error appears when linking the main executable:
To Reproduce
Compile with `ENABLE_PETSC=ON`.
Expected behavior
Should be able to link.
Platform (please complete the following information):
Additional context
TPLs are up to date. Moreover, I have:
GEOSX_DIR/thirdPartyLibs/install-environment-release/petsc/lib $ ls -1
```
libesmumps.a
libpetsc.so
libpetsc.so.3.10
libpetsc.so.3.10.2
libptesmumps.a
libptscotch.a
libptscotcherr.a
libptscotcherrexit.a
libptscotchparmetis.a
libscotch.a
libscotcherr.a
libscotcherrexit.a
libscotchmetis.a
petsc
pkgconfig
```
and `nm libptscotchparmetis.a`
So it seems that `SCOTCH_ParMETIS_V3_NodeND` is properly defined in `libptscotchparmetis.a` and this is used in the `PETSc` linking phase.
From `configure.log` in the `PETSc` build folder, I have: