ECP-VeloC / VELOC

Very-Low Overhead Checkpointing System
http://veloc.rtfd.io
MIT License

error using test/heatdis example #29

Closed denisbertini closed 4 years ago

denisbertini commented 4 years ago

Trying to restart the test/heatdis_file example from a broken state, I got the following error:

ERROR 3830768914477] [/u/dbertini/mpiio/veloc/src/lib/client.cpp:145:route_file] must call checkpoint_begin() first

Then the program hangs. Should one add a call to checkpoint_begin()? If so, where exactly?

denisbertini commented 4 years ago

Adding the missing call removed the ERROR message, but on restart after a kill -C the processes hang: only a few processes show CPU saturation while the others are idle. Any idea?
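
For readers following along, the ordering constraint behind the error message above can be sketched with a small stand-in. This is not the VELOC API: `mock_client`, `mock_checkpoint_begin`, `mock_route_file`, `mock_checkpoint_end`, and the `/tmp/...` path layout are all hypothetical names invented for illustration. The sketch only models what the message states, namely that routing a file is rejected unless a checkpoint has been begun first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a checkpointing client; NOT the VELOC API. */
typedef struct {
    int in_checkpoint;   /* 1 between begin and end, 0 otherwise */
    char name[64];       /* checkpoint name given to begin */
    int version;         /* checkpoint version given to begin */
} mock_client;

enum { MOCK_OK = 0, MOCK_ERR = -1 };

/* Open a checkpoint phase; a second begin without an end is rejected. */
int mock_checkpoint_begin(mock_client *c, const char *name, int version) {
    if (c->in_checkpoint) return MOCK_ERR;
    snprintf(c->name, sizeof c->name, "%s", name);
    c->version = version;
    c->in_checkpoint = 1;
    return MOCK_OK;
}

/* Produce a file path for the open checkpoint; fails if none is open. */
int mock_route_file(mock_client *c, char *path, size_t len) {
    if (!c->in_checkpoint) {
        fprintf(stderr, "must call checkpoint_begin() first\n");
        return MOCK_ERR;
    }
    /* Assumed path layout, chosen only for this illustration. */
    snprintf(path, len, "/tmp/%s-%d.dat", c->name, c->version);
    return MOCK_OK;
}

/* Close the checkpoint phase. */
int mock_checkpoint_end(mock_client *c) {
    if (!c->in_checkpoint) return MOCK_ERR;
    c->in_checkpoint = 0;
    return MOCK_OK;
}

/* Walk through the wrong order first, then the accepted order. */
int demo_sequence(void) {
    mock_client c = {0};
    char path[128];

    /* Routing before begin is refused, as in the reported error. */
    assert(mock_route_file(&c, path, sizeof path) == MOCK_ERR);

    /* begin -> route -> (write the file here) -> end is accepted. */
    assert(mock_checkpoint_begin(&c, "heatdis", 1) == MOCK_OK);
    assert(mock_route_file(&c, path, sizeof path) == MOCK_OK);
    assert(mock_checkpoint_end(&c) == MOCK_OK);
    return 0;
}
```

Calling `demo_sequence()` from a `main` exercises both orders. In the real library the corresponding calls and their signatures should be taken from the VELOC documentation for the version in use.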

bnicolae commented 4 years ago

Please try the master branch.

denisbertini commented 4 years ago

I tried the master branch (using the git command given in the documentation) and got the same problem. The heatdis_mem example seems to work fine, but the heatdis_file example does not. I tried the new git version as well, but I stopped the installation when it reached the chown root command for pdsh. I have no permission for that on the local cluster. Why is this command used if I use ssh?

bnicolae commented 4 years ago

The command from the documentation is for stable releases. The master branch can be obtained using this git command: git clone --single-branch --depth 1 https://github.com/ECP-VeloC/veloc.git <source_dir>

I am not sure I understand what you are saying. The installation of VELOC does not require root. Please try the command above and follow the instructions from the documentation for the following steps. Let us know if you still have any issue.

denisbertini commented 4 years ago

Using the command you quoted, I got the following problem:

Install the project...
-- Install configuration: "Release"
-- Installing: /lustre/rz/dbertini/iotest/veloc/include/er.h
-- Installing: /lustre/rz/dbertini/iotest/veloc/lib64/liber.so
-- Set runtime path of "/lustre/rz/dbertini/iotest/veloc/lib64/liber.so" to ""
-- Installing: /lustre/rz/dbertini/iotest/veloc/lib64/liber.a
CMake Warning:
  No source or binary directory provided.  Both will be assumed to be the
  same as the current working directory, but note that this warning will
  become a fatal error in future CMake releases.

-- The C compiler identification is GNU 8.3.0
-- The CXX compiler identification is GNU 8.3.0
-- Check for working C compiler: /opt/ohpc/pub/compiler/gcc/8.3.0/bin/gcc
-- Check for working C compiler: /opt/ohpc/pub/compiler/gcc/8.3.0/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /opt/ohpc/pub/compiler/gcc/8.3.0/bin/g++
-- Check for working CXX compiler: /opt/ohpc/pub/compiler/gcc/8.3.0/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Could NOT find PDSH (missing: PDSH_EXE DSHBAK_EXE) 
-- Found Boost: /lustre/rz/dbertini/iotest/veloc/include (found suitable version "1.73.0", minimum required is "1.60")  
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
CMake Error at /opt/ohpc/pub/utils/cmake/3.15.4/share/cmake-3.15/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
  system variable OPENSSL_ROOT_DIR (missing: OPENSSL_CRYPTO_LIBRARY
  OPENSSL_INCLUDE_DIR)
Call Stack (most recent call first):
  /opt/ohpc/pub/utils/cmake/3.15.4/share/cmake-3.15/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  /opt/ohpc/pub/utils/cmake/3.15.4/share/cmake-3.15/Modules/FindOpenSSL.cmake:413 (find_package_handle_standard_args)
  CMakeLists.txt:33 (find_package)

-- Configuring incomplete, errors occurred!
See also "/u/dbertini/mpiio/veloc/CMakeFiles/CMakeOutput.log".
See also "/u/dbertini/mpiio/veloc/CMakeFiles/CMakeError.log".
Installation failed!

Any idea?

denisbertini commented 4 years ago

I just tried the release tag veloc.1.2 and it works without running into the missing library/executable. I will keep using the release, then.

bnicolae commented 4 years ago

Glad to hear it works!