HiFiLES / HiFiLES-solver

High Fidelity Large Eddy Simulation Solver

Can't run test cases #97

Closed jgrisham4 closed 8 years ago

jgrisham4 commented 9 years ago

Hello,

I'm trying to run some of the test cases, but I keep getting a segfault. I built the code successfully using MPI and ParMETIS (not the version that is distributed with the code). Here is one line from the stderr:

[c557-201.stampede.tacc.utexas.edu:mpi_rank_6][error_sighandler] Caught error: Segmentation fault (signal 11)

The stdout has the following ParMETIS error:

PARMETIS ERROR: The sum of tpwgts for constraint #0 is not 1.0

This error message is repeated for every processor. I figured that this might be happening because I compiled with a different version of ParMETIS than what is in the lib directory. I tried recompiling with the ParMETIS that is distributed with the code, but I still see the same error. Ever seen this before? If so, any ideas what is going wrong?

Thanks,

James

JacobCrabill commented 9 years ago

Hi James,

I'm a little perplexed as well; I don't believe any of us have ever seen this before.

I managed to find a few other cases where this issue has occurred:

https://bugs.launchpad.net/fluidity/+bug/1006863
http://glaros.dtc.umn.edu/gkhome/node/1119

From the looks of it, it has to do with changing the version of ParMetis being linked to without updating the API call (or something related to the ParMetis version, anyways). Are you certain that when you built the HiFiLES-supplied ParMetis, it was actually used during linking? My first thought is that, although you told HiFiLES to use the supplied ParMetis, the other version was found by the compiler/linker in one of your include paths before it found the one under HiFiLES-solver/lib.

Try this: instead of using the Autotools-generated Makefiles, use the handwritten Makefile under the folder 'makefiles' - and your own modification of one of the input files there - to build the code, specifying the location of the ParMetis and Metis libraries (both of them) which came with HiFiLES. That should directly link against the correct library files at that point, and should hopefully solve the issue. Let us know how it goes!

-Jacob

JacobCrabill commented 9 years ago

Actually, I just did a little more digging, and it looks like it's a problem with the header file, not the library:

http://osdir.com/ml/general/2013-01/msg55622.html
http://libmesh-users.narkive.com/ZgE8zVTH/parmetis-troubles

So if this is the case, apply what I said before about the library files to the header files parmetis.h and metis.h instead. That is, start with the handwritten Makefile, and I would suggest adding the include path to the correct headers as the very first include path, so that those headers are found before any others. In the Makefile, right before or after line 61:

OPTS += -I include

add a line like the following:

OPTS += -I./lib/parmetis-4.0.2/include -I./lib/parmetis-4.0.2/metis/include

One of these should fix the issue. Again, let us know how it goes!
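
A quick way to check which metis.h the compiler actually picks up is to compile a small probe with exactly the same -I flags as the solver and print what the header defines. The sketch below is only an illustration: `metis_header_summary` is a made-up helper name, and it assumes a METIS 5.x-style header (the kind bundled with ParMETIS 4.0.x) that defines `REALTYPEWIDTH`, `IDXTYPEWIDTH`, and the `METIS_VER_*` macros. Every macro is guarded, so the probe should still compile if the header is missing or older.

```cpp
#include <sstream>
#include <string>

// Include metis.h only if it is visible on the current include path.
#if defined(__has_include)
#  if __has_include(<metis.h>)
#    include <metis.h>
#    define PROBE_HAVE_METIS_H 1
#  endif
#endif

// Hypothetical helper: describes the metis.h seen by this translation unit.
std::string metis_header_summary() {
    std::ostringstream out;
#ifdef PROBE_HAVE_METIS_H
    out << "metis.h found";
#  if defined(METIS_VER_MAJOR) && defined(METIS_VER_MINOR) && defined(METIS_VER_SUBMINOR)
    out << ", version " << METIS_VER_MAJOR << "." << METIS_VER_MINOR
        << "." << METIS_VER_SUBMINOR;
#  endif
#  ifdef IDXTYPEWIDTH
    out << ", IDXTYPEWIDTH=" << IDXTYPEWIDTH;
#  endif
#  ifdef REALTYPEWIDTH
    // real_t is float when REALTYPEWIDTH is 32 and double when it is 64.
    out << ", REALTYPEWIDTH=" << REALTYPEWIDTH
        << ", sizeof(real_t)=" << sizeof(real_t);
#  endif
#else
    out << "metis.h not found on the include path";
#endif
    return out.str();
}
```

Printing the returned string from a one-line main, built with the solver's include flags, shows which header won the include-path search. If the reported widths differ from the ones the linked libmetis.a and libparmetis.a were built with, the header and the library are out of sync.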

jgrisham4 commented 8 years ago

Sorry for the delayed response. I've spent some time working on the handwritten makefile, but I'm still not quite there. I'll try to finish working on this within the next week.

Thanks for your help, Jacob.

James

mlopez14 commented 8 years ago

Did the suggested solution work?

jgrisham4 commented 8 years ago

I believe that the diagnosis is correct, but I haven't gotten it working. Thanks.

jgrisham4 commented 8 years ago

Interestingly, I am having the same problem on a different cluster. For the case above, I didn't use the suggested fix. I was able to compile without it on that cluster and everything worked fine. There was a 6 month gap though so I believe some of the sys admins fixed some things (not sure what).

In this new instance, I followed the advice above about editing the handwritten makefile. I used the makefile in $HIFILES_HOME/makefiles/Makefile. Got a bunch of errors when it got to the linking stage. Turns out that several of the source files in src weren't compiled to object files and weren't being linked with the executable. I wrote my own makefile, which works. I'll attach it below. Anyways, I'm statically linking with ParMETIS and I'm making sure that headers aren't being included from anywhere else, but to no avail. Still getting the same issue.

I did more searching and found some similar issues from other sources. One suggested that the issue was because REALTYPEWIDTH in metis.h needed to be changed from 32 to 64 (i.e., float to double). I tried this, but it still didn't work. Another suggested that the issue was due to dynamically linking with the wrong version of the ParMETIS library. I know I don't have that issue because of the static linking.
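
To see why a float/double mismatch could plausibly produce exactly this message, consider what happens if the caller fills tpwgts with 64-bit doubles while the library reads the same memory as 32-bit floats. The sketch below does not call ParMETIS; it only mimics that reinterpretation. The helper names `uniform_tpwgts` and `constraint_sum_as_seen_by_library` are made up for illustration, and whether HiFiLES actually passes doubles at this point is an assumption, not something confirmed in this thread. The array layout (ncon × nparts entries, each 1/nparts for a uniform partition, with partition i and constraint j at index i*ncon + j) follows the description of tpwgts in the METIS and ParMETIS manuals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Uniform target partition weights as the caller would build them:
// ncon * nparts entries, each equal to 1/nparts.
template <typename Real>
std::vector<Real> uniform_tpwgts(int nparts, int ncon) {
    assert(nparts > 0 && ncon > 0);
    return std::vector<Real>(static_cast<std::size_t>(nparts) * ncon,
                             Real(1) / Real(nparts));
}

// Sum of the weights for constraint j, computed the way a library would
// compute it if it believed the buffer held LibReal values.
// Hypothetical stand-in for the library's internal check.
template <typename LibReal, typename CallerReal>
double constraint_sum_as_seen_by_library(const std::vector<CallerReal>& tpwgts,
                                         int nparts, int ncon, int j) {
    // Only model a library type that is no wider than the caller's type,
    // so every read stays inside the caller's buffer.
    static_assert(sizeof(LibReal) <= sizeof(CallerReal),
                  "library type must not be wider than caller type");
    assert(tpwgts.size() == static_cast<std::size_t>(nparts) * ncon);
    assert(j >= 0 && j < ncon);

    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(tpwgts.data());
    double sum = 0.0;
    for (int i = 0; i < nparts; ++i) {
        LibReal w;
        const std::size_t index = static_cast<std::size_t>(i) * ncon + j;
        // memcpy reinterprets the raw bytes without undefined behavior.
        std::memcpy(&w, bytes + index * sizeof(LibReal), sizeof(LibReal));
        sum += static_cast<double>(w);
    }
    return sum;
}
```

With matching types the sum comes out as 1.0. With a double array read as float, the library-side sum is generally nowhere near 1.0, which would be consistent with the error being printed on every rank. It also suggests that editing REALTYPEWIDTH in the header can only help if the library is rebuilt with the same value and the caller's array type matches it.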

Also, I've tried linking with ParMETIS which I've built manually in addition to linking with the ParMETIS which is distributed with HiFiLES.

Here are the links I mentioned:

https://answers.launchpad.net/dolfin/+question/219770
https://bugs.launchpad.net/fluidity/+bug/1006863

Any ideas?

Here is the makefile that works. Probably not best for developers because it doesn't keep track of dependencies correctly. It works though so others might find it useful.

# Assuming GNU compilers.  If Intel is used, compiler flags should be
# changed.  Also assuming ATLAS BLAS is being used.

#################################################
# Some preprocessor options
#################################################
NODE = CPU
PARALLEL = MPI

#################################################
# Libraries
#################################################

# BLAS lib and include
BLAS_INC = /share/ATLAS/build/include
BLAS_LIB = /share/ATLAS/build/lib

# location of parmetis.h
PARMETIS_INC = /share/parmetis-4.0.3/build/include

# location of libparmetis.a
PARMETIS_LIB = /share/parmetis-4.0.3/build

# location of metis.h
METIS_INC = $(PARMETIS_INC)

# location of libmetis.a
METIS_LIB = $(PARMETIS_LIB)

# location of mpi.h
MPI_INC = /share/mpich-3.2/build/include

# Libraries
# All the libraries are linked with in the HiFiLES build target below
#LIBS = -L$(BLAS_LIB) -lcblas -latlas

#################################################
# Compiler options
#################################################

CXX = mpicxx
CXXFLAGS = -O3
CPPFLAGS = -D_$(NODE) -D_$(PARALLEL) -D_STANDARD_BLAS
INCLUDE = -I./include -I$(MPI_INC) -I$(PARMETIS_INC) -I$(METIS_INC) -I$(BLAS_INC)

# Paths to source, etc
SRC = src/
OBJ = obj/
BIN = bin/
# SRC already ends in a slash, so no extra separator is needed here
SRC_FILES = $(wildcard $(SRC)*.cpp)
OBJ_FILES = $(addprefix $(OBJ),$(notdir $(SRC_FILES:.cpp=.o)))

#################################################
# Build instructions
#################################################

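all: HiFiLES

# Recipe lines must start with a tab character, not spaces.
# Compile each source file to an object file, creating obj/ if needed.
$(OBJ)%.o: $(SRC)%.cpp
	@mkdir -p $(OBJ)
	$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(INCLUDE) -c -o $@ $<

# Link statically against ParMETIS, METIS, and ATLAS BLAS.
HiFiLES: $(OBJ_FILES)
	@mkdir -p $(BIN)
	$(CXX) $(CPPFLAGS) $(CXXFLAGS) -o $(BIN)HiFiLES $(OBJ_FILES) $(PARMETIS_LIB)/libparmetis.a $(METIS_LIB)/libmetis.a $(BLAS_LIB)/libcblas.a $(BLAS_LIB)/libatlas.a

.PHONY: all clean
clean:
	rm -rf $(OBJ_FILES)
	rm -rf $(BIN)HiFiLES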