OPM / LBPM

Pore scale modelling
https://lbpm-sim.org/
GNU General Public License v3.0

Installation problems #22

QiangLiu66 opened this issue 4 years ago

QiangLiu66 commented 4 years ago

First, thanks for sharing the code. During the installation process, configuring silo-4.10.2 with "CC=$MPI_DIR/bin/mpicc CXX=$MPI_DIR/bin/mpicxx CXXFLAGS="-fPIC -O3 -std=c++14" ./configure --prefix=$LBPM_SILO_DIR --with-hdf5=$LBPM_HDF5_DIR/include,$LBPM_HDF5_DIR/lib --enable-static && make && make install" was not successful. Could you give me some guidance? Thank you very much. @JamesEMcClure @thomaram

JamesEMcClure commented 4 years ago

Hi Qiang,

Are you following the instructions on the Wiki below (See step 4):

https://github.com/OPM/LBPM/wiki/LBPM-Tutorial,-Step-0.-Building-LBPM

Also note that there are some example scripts within the sample_scripts directory.

I can give you better guidance if you can give me a bit more information regarding the system where you are building and the specific error message from the cmake configure line.

JamesEMcClure commented 4 years ago

Hi Qiang,

Were you able to resolve the build issues based on the Wiki?

James

alitimer commented 4 years ago

The same issue happened for me when building HDF5: I get the error "recompile with -fPIC"

Making all in src
make[1]: Entering directory '/home/ali/hdf5-1.8.12/src'
make  all-am
make[2]: Entering directory '/home/ali/hdf5-1.8.12/src'
  CCLD     libhdf5.la
/usr/bin/ld: /usr/local/lib/libmpich.a(initthread.o): relocation R_X86_64_TPOFF32 against symbol `MPIR_Thread' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libmpich.a(ad_coll_build_req_new.o): relocation R_X86_64_PC32 against symbol `stderr@@GLIBC_2.2.5' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
Makefile:721: recipe for target 'libhdf5.la' failed
make[2]: *** [libhdf5.la] Error 1
make[2]: Leaving directory '/home/ali/hdf5-1.8.12/src'
Makefile:634: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/home/ali/hdf5-1.8.12/src'
Makefile:539: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1

I found this solution: https://superuser.com/questions/557884/getting-error-recompile-with-fpic; however, I wasn't sure how to apply it here. Any advice, please?

JamesEMcClure commented 4 years ago

It would be reasonable to try adding 'CFLAGS=-fPIC' to the configure line as suggested in that post.

CC=$MPI_DIR/bin/mpicc CXX=$MPI_DIR/bin/mpicxx CFLAGS="-fPIC" CXXFLAGS="-fPIC -O3 -std=c++14" ./configure --prefix=$LBPM_HDF5_DIR --enable-parallel --enable-shared --with-zlib=$LBPM_ZLIB_DIR && make && make install

Use the same for silo and the other scripts. I don't know if this will fix the problem but it is worth trying.
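For the silo step, the analogous configure line (an untested sketch, adapting the command from the original post) would be:

CC=$MPI_DIR/bin/mpicc CXX=$MPI_DIR/bin/mpicxx CFLAGS="-fPIC" CXXFLAGS="-fPIC -O3 -std=c++14" ./configure --prefix=$LBPM_SILO_DIR --with-hdf5=$LBPM_HDF5_DIR/include,$LBPM_HDF5_DIR/lib --enable-static && make && make install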

What version of GCC and MPI are you using? For reference my local workstation has gcc/6.2.0 and mpich/3.1.3, and our local HPC systems have gcc/7.3.0 and openmpi/3.1.2. It should build against other versions without any problem provided that GCC is not too old, but it has been tested and built many times against the versions above.

alitimer commented 4 years ago

gcc --version: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0. MPI was mpich2-1.4.1, built and updated to mpich-3.1.3 (http://www.mpich.org/static/downloads/3.1.3/). The same error happened when making hdf5-1.8.12:

Making all in src
make[1]: Entering directory '/home/ali/hdf5-1.8.12/src'
make  all-am
make[2]: Entering directory '/home/ali/hdf5-1.8.12/src'
  CCLD     libhdf5.la
/usr/bin/ld: /usr/local/lib/libmpich.a(initthread.o): relocation R_X86_64_TPOFF32 against symbol `MPIR_Thread' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libmpich.a(ad_coll_build_req_new.o): relocation R_X86_64_PC32 against symbol `stderr@@GLIBC_2.2.5' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
Makefile:721: recipe for target 'libhdf5.la' failed
make[2]: *** [libhdf5.la] Error 1
make[2]: Leaving directory '/home/ali/hdf5-1.8.12/src'
Makefile:634: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/home/ali/hdf5-1.8.12/src'
Makefile:539: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1
JamesEMcClure commented 4 years ago

Can you check whether you have the build-essential package installed?

apt list --installed | grep build-essential

alitimer commented 4 years ago

Yes, it has been installed, as that was one of the requirements for MPI:

# apt list --installed | grep build-essential

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

build-essential/bionic,now 12.4ubuntu1 amd64 [installed]
alitimer commented 4 years ago

Update: Tried the procedure on a fresh VM (4-core), all good with dependencies!

  1. cmake-3.10.2
  2. build-essential(bionic-12.4)
  3. gfortran
  4. MPI (mpich-3.1.3)
  5. zlib-1.2.11
  6. HDF5-1.8.12
  7. SILO-4.10.2

pointing to the static libraries (.a) in the CMake configuration, then make && make install.

P.S.: CPU compilation only (no GPU, no TimerUtility, no NetCDF). Configure file:

# configure
rm -rf CMake*
cmake                                    \
    -D CMAKE_C_COMPILER:PATH=/usr/bin/mpich-3.1.3/mpich-3.1.3-install/bin/mpicc          \
    -D CMAKE_CXX_COMPILER:PATH=/usr/bin/mpich-3.1.3/mpich-3.1.3-install/bin/mpicxx        \
    -D CMAKE_C_FLAGS="-g "         \
    -D CMAKE_CXX_FLAGS="-g "      \
    -D CMAKE_CXX_STANDARD=14       \
    -D MPI_COMPILER:BOOL=TRUE            \
    -D MPIEXEC=mpirun                     \
    -D USE_EXT_MPI_FOR_SERIAL_TESTS:BOOL=TRUE \
    -D CMAKE_BUILD_TYPE:STRING=Release     \
    -D HDF5_DIRECTORY="/usr/bin/hdf5-1.8.12"     \
    -D HDF5_LIB="/usr/bin/hdf5-1.8.12/lib//libhdf5.a"   \
    -D SILO_LIB="/usr/bin/silo-4.10.2/lib/libsiloh5.a"           \
    -D SILO_DIRECTORY="/usr/bin/silo-4.10.2"     \
    -D USE_SILO=1                        \
    $LBPM_WIA_SOURCE

It worked; however, ctest was not successful:

The following tests FAILED:
     20 - TestBlobIdentify_2procs (Not Run)
     21 - TestBlobIdentify_4procs (Not Run)
     22 - TestSegDist_8procs (Not Run)
     23 - TestCommD3Q19_8procs (Not Run)
     25 - testCommunication_2procs (Not Run)
     26 - testCommunication_4procs (Not Run)
     30 - hello_world_2procs (Not Run)
     31 - hello_world_4procs (Not Run)
Errors while running CTest

Am I on the right track?

JamesEMcClure commented 4 years ago

Yes, there is a good chance that your build will work. The MPI tests do not always execute properly due to the way ctest provides the flags. We should have a fix for this in the near future. The important thing is that the code should still execute correctly.
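If you want to check the MPI functionality directly, the skipped tests can also be launched by hand with mpirun; the binary paths below are guesses based on the ctest names and may differ in your build tree:

# hypothetical paths -- adjust to match your build directory
mpirun -np 2 ./tests/hello_world
mpirun -np 4 ./tests/testCommunication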

Try running through the examples on the tutorial and see if the results seem to make sense.

WanhuiZhang commented 4 years ago

Professor JamesEMcClure,

First, thank you for bringing us the outstanding LBM simulation software.

I want to install LBPM with GPU support, but some problems appeared. Can you give me some guidance?

System and dependency information is as follows:

  1. Ubuntu 18.04
  2. GCC 7.5.0
  3. OpenMPI 3.1.2 (mpich 3.1.6 also tested)
  4. CUDA 10.2
  5. Other dependencies are the same as listed in Tutorial-Step-0-Building-LBPM.

Based on these, the CPU build of LBPM installs correctly and all tests pass. However, when I tried to build LBPM with GPU support, the following tests failed:

TestFluxBC/ColorGradDFH/Poiseuille/ForceMoments/MassConservationD3Q7/ColorBubble/ColorSquareTube (SEGFAULT)

Test-BubbleDFH/CommD3Q19_8procs (Failed)

Some of the output printed to the terminal is as follows:

Signal: Segmentation fault (11)
Signal code: Invalid permissions (2)
...
Unhandled signal (11) caught ...

Note: The configuration script is consistent with the corresponding part of Step-0-Building-LBPM, except that the number in '-arch sm_70' is modified to match my graphics card (NVIDIA Quadro P2200).

I tried changing the version of OpenMPI and replacing it with MPICH 3.1.6, but these problems still occur. Where is the problem? Can you give me some guidance? Thank you.

JamesEMcClure commented 4 years ago

Did you build OpenMPI with GPU support? To check you can run the following command:

ompi_info | grep cuda

On our local system, this returns the following:

          MPI extensions: affinity, cuda
                 MCA btl: smcuda (MCA v2.1.0, API v3.0.0, Component v3.1.2)
                MCA coll: cuda (MCA v2.1.0, API v2.0.0, Component v3.1.2)
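If cuda does not appear in this list, OpenMPI itself usually needs to be rebuilt with CUDA support enabled, for example (install paths here are placeholders for your system):

./configure --prefix=$MPI_DIR --with-cuda=/usr/local/cuda && make && make install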

I have never tried a Quadro P2200 GPU with LBPM before, but I would expect you could at least get LBPM to run on it (it won't be as fast as GPUs that are built for HPC). You can also try disabling CUDA support and building the CPU-only version to make sure that it compiles properly.

JamesEMcClure commented 4 years ago

The other thing is that sometimes you need to set flags to get it to run properly with MPI. For the MPI version above, I launch LBPM based on the following:

export MPI_THREAD_MULTIPLE=1

MPIARGS="--hostfile host.list --bind-to core --mca pml ob1 --mca btl vader,self,smcuda,openib  --mca btl_openib_warn_default_gid_prefix 0  --mca btl_smcuda_use_cuda_ipc_same_gpu 0  --mca btl_openib_want_cuda_gdr 0  --mca btl_openib_cuda_async_recv false --mca btl_smcuda_use_cuda_ipc 0 --mca btl_openib_allow_ib true --mca btl_openib_cuda_rdma_limit 1000 -x LD_LIBRARY_PATH -x MPI_THREAD_MULTIPLE"

mpirun -np 4 $MPIARGS  $LBPM_BIN/lbpm_color_simulator color.db
CCBaobao commented 2 years ago

Professor JamesEMcClure, first, thank you for bringing us the outstanding LBM simulation software. I want to install LBPM with GPU support, but some problems appeared. System and dependency information is as follows:

  1. Ubuntu 18.04
  2. OpenMPI 4.0.7
  3. CUDA 11.6

When I run the command ompi_info | grep cuda, it returns the following:

          MPI extensions: affinity, cuda, pcollreq
                 MCA btl: smcuda (MCA v2.1.0, API v3.1.0, Component v4.0.7)
                MCA coll: cuda (MCA v2.1.0, API v2.0.0, Component v4.0.7)

When I tried to build LBPM with GPU support, the following tests failed:

The following tests FAILED:
     10 - TestColorGradDFH (Child aborted)
     26 - TestCommD3Q19_8procs (Failed)
Errors while running CTest

Where is the problem? Can you give me some guidance? Thank you.

JamesEMcClure commented 2 years ago

Are these the only two tests that failed?

The most likely reason for TestCommD3Q19_8procs to fail is that you do not have enough GPUs locally to run 8 MPI processes (this would usually require at least 4, and possibly 8). If the other tests have passed, my suggestion would be to try simulating a benchmark problem to see if it seems to be working properly. The following should run pretty quickly:

https://lbpm-sim.org/examples/color/steadyState.html

If that works, then you can try a larger 3D problem to verify that the performance is good.

If you let me know some more details about your system (GPU version, number of GPUs), I can try to give you a more concrete idea of what to expect.

CCBaobao commented 2 years ago

NVIDIA-SMI 510.73.05    Driver Version: 510.73.05    CUDA Version: 11.6

GPU  Name               Persistence-M | Bus-Id            Disp.A | Volatile Uncorr. ECC
Fan  Temp  Perf  Pwr:Usage/Cap        | Memory-Usage             | GPU-Util  Compute M. | MIG M.
  0  NVIDIA GeForce ...           Off | 00000000:41:00.0      On | N/A
 0%  50C   P8     17W / 170W          | 145MiB / 12288MiB        | 0%        Default    | N/A

Processes:
GPU   GI   CI   PID    Type   Process name          GPU Memory Usage
  0  N/A  N/A   3146   G      /usr/lib/xorg/Xorg    20MiB
  0  N/A  N/A   3862   G      /usr/lib/xorg/Xorg    122MiB

When I try to run the following example, https://lbpm-sim.org/examples/color/steadyState.html, the results are as follows:

Running Color LBM


MPI rank=0 will use GPU ID 0 / 1
voxel length = 1.000000 micron
voxel length = 1.000000 micron
Input media: discs_3x128x128.raw.morphdrain.raw
Relabeling 3 values
oldvalue=0, newvalue =0
oldvalue=1, newvalue =1
oldvalue=2, newvalue =2
Dimensions of segmented image: 3 x 128 x 128
Reading 8-bit input data
Read segmented data from discs_3x128x128.raw.morphdrain.raw
Label=0, Count=11862
Label=1, Count=28486
Label=2, Count=8804
Distributing subdomains across 16 processors
Process grid: 1 x 4 x 4
Subdomain size: 3 x 32 x 32
Size of transition region: 0
Media porosity = 0.758667
Initialized solid phase -- Converting to Signed Distance function
Domain set.
Create ScaLBL_Communicator
Set up memory efficient layout, 2265 | 2288 | 5780
Allocating distributions
Setting up device map and neighbor list
Component labels: 1
   label=0, affinity=-0.900000, volume fraction==0.460002
Initializing distributions
Initializing phase field
.......
Thread 4: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143

The call to cuIpcGetMemHandle failed. This means the GPU RDMA protocol cannot be used.
  cuIpcGetMemHandle return value: 201
  address: 0x7f45c667c000
Check the cuda.h file for what the return value means. Perhaps a reboot of the node will clear the problem.

[xxxxxxxxxxxxxxxxxxxxxx] 509 more processes have sent help message help-mpi-common-cuda.txt / cuIpcGetMemHandle failed
[xxxxxxxxxxxxxxxxxxxxxx] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[xxxxxxxxxxxxxxxxxxxxxx] 522 more processes have sent help message help-mpi-common-cuda.txt / cuIpcGetMemHandle failed
[xxxxxxxxxxxxxxxxxxxxxx] 514 more processes have sent help message help-mpi-common-cuda.txt / cuIpcGetMemHandle failed
[xxxxxxxxxxxxxxxxxxxxxx] 442 more processes have sent help message help-mpi-common-cuda.txt / cuIpcGetMemHandle failed

And nvidia-smi shows:

NVIDIA-SMI 510.73.05    Driver Version: 510.73.05    CUDA Version: 11.6

GPU  Name               Persistence-M | Bus-Id            Disp.A | Volatile Uncorr. ECC
Fan  Temp  Perf  Pwr:Usage/Cap        | Memory-Usage             | GPU-Util  Compute M. | MIG M.
  0  NVIDIA GeForce ...           Off | 00000000:41:00.0      On | N/A
30%  47C   P2     52W / 170W          | 1837MiB / 12288MiB       | 99%       Default    | N/A

Processes:
GPU   GI   CI   PID      Type   Process name                    GPU Memory Usage
  0  N/A  N/A   3146     G      /usr/lib/xorg/Xorg              20MiB
  0  N/A  N/A   3862     G      /usr/lib/xorg/Xorg              122MiB
  0  N/A  N/A   174484   C      .../bin/lbpm_color_simulator    105MiB
  0  N/A  N/A   174485   C      .../bin/lbpm_color_simulator    105MiB
  0  N/A  N/A   174487   C      .../bin/lbpm_color_simulator    107MiB
  0  N/A  N/A   174489   C      .../bin/lbpm_color_simulator    105MiB
  0  N/A  N/A   174490   C      .../bin/lbpm_color_simulator    105MiB
  0  N/A  N/A   174491   C      .../bin/lbpm_color_simulator    105MiB
  0  N/A  N/A   174492   C      .../bin/lbpm_color_simulator    105MiB
  0  N/A  N/A   174496   C      .../bin/lbpm_color_simulator    105MiB
  0  N/A  N/A   174499   C      .../bin/lbpm_color_simulator    105MiB
  0  N/A  N/A   174502   C      .../bin/lbpm_color_simulator    105MiB
  0  N/A  N/A   174504   C      .../bin/lbpm_color_simulator    105MiB
  0  N/A  N/A   174506   C      .../bin/lbpm_color_simulator    105MiB
  0  N/A  N/A   174511   C      .../bin/lbpm_color_simulator    105MiB
  0  N/A  N/A   174514   C      .../bin/lbpm_color_simulator    105MiB
  0  N/A  N/A   174519   C      .../bin/lbpm_color_simulator    107MiB
  0  N/A  N/A   174521   C      .../bin/lbpm_color_simulator    105MiB

JamesEMcClure commented 2 years ago

If you are using OpenMPI, perhaps try it by setting the following environment variables

export MPI_THREAD_MULTIPLE=1

export MPIARGS="--bind-to core --mca pml ob1 --mca btl vader,self,smcuda,openib  --mca btl_openib_warn_default_gid_prefix 0  --mca btl_smcuda_use_cuda_ipc_same_gpu 0  --mca btl_openib_want_cuda_gdr 0  --mca btl_openib_cuda_async_recv false --mca btl_smcuda_use_cuda_ipc  0 --mca btl_openib_allow_ib true --mca btl_openib_cuda_rdma_limit 1000 -x  LD_LIBRARY_PATH -x MPI_THREAD_MULTIPLE"

Then launch the code something like this:

mpirun -np 16 $MPIARGS  lbpm_color_simulator input.db 
CCBaobao commented 2 years ago

Professor JamesEMcClure, thank you very much for taking the time to solve my problems. I want to know how to use the morphological pre-processors in LBPM when I try to use simulation protocols like the Shell Aggregation Protocol mentioned in your article (https://www.digitalrocksportal.org/projects/317/origin_data/1354/). The core I used here is Berea sandstone, and the image is binarized so that only the oil phase and solid matrix are present (https://www.digitalrocksportal.org/projects/317/origin_data/1354/).

JamesEMcClure commented 2 years ago

There is a link on the documentation server with an example input file

https://lbpm-sim.org/examples/morphology/morphOpen.html

You should be able to modify the Domain section for your image
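For orientation, the Domain section in that example has roughly the form sketched below; the file name and values here are placeholders that need to be replaced with the dimensions and labels of your own image:

Domain {
   Filename = "Berea_segmented.raw"   // hypothetical file name for your image
   ReadType = "8bit"                  // data type of the input image
   N = 400, 400, 400                  // size of the original image in voxels
   nproc = 2, 2, 2                    // process grid
   n = 200, 200, 200                  // sub-domain size per process
   voxel_length = 1.0                 // voxel length in microns
   ReadValues = 0, 1, 2               // labels present in the input image
   WriteValues = 0, 1, 2              // labels LBPM should assign to each
   BC = 0                             // boundary condition (0 = fully periodic)
   Sw = 0.35                          // target saturation for the morphological opening
}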

renxiaosa00 commented 1 year ago

There is a link on the documentation server with an example input file

https://lbpm-sim.org/examples/morphology/morphOpen.html

You should be able to modify the Domain section for your image

What is "WriteValues"? For example, if my image has 100 bands, should WriteValues be 0, 1, 2, 3, 4, ..., 99?

JamesEMcClure commented 1 year ago

The purpose of WriteValues is to relabel the image (if desired / needed). The current convention for LBPM is that images with negative labels are solid (such as different minerals) and positive labels are "fluid" (or micro-porous solid). Often images obtained from other sources do not observe this convention, so it is convenient to be able to relabel the image internally to align with the convention.

Suppose that in your image label "0" was oil, label "1" was water, and the rest were solids with different associated wetting states. In this case you would set

ReadValues = 0, 1, 2, 3, 4, ..., 99
WriteValues = 1, 2, 0, -1, -2, ..., -97

The original image labels are specified by ReadValues, and the labels that LBPM will use for each are specified in WriteValues.
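In the input database this relabeling sits inside the Domain section; a minimal hypothetical fragment, shortened to the first five labels of the example above (the remaining labels continue the same pattern), would look like:

Domain {
   ReadValues  = 0, 1, 2, 3, 4    // original image labels: oil, water, solids
   WriteValues = 1, 2, 0, -1, -2  // LBPM labels: positive = fluid, 0 and negative = solid
}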