CIROH-UA / NGIAB-HPCInfra

NextGen In A Box: Next Generation Water Modeling Framework for Community Release (Singularity version)

Add CI for Singularity Image #7

Closed: arpita0911patel closed this issue 5 months ago

benlee0423 commented 6 months ago
  1. Merged the singularity directory into the main branch.
  2. Apply the Docker changes to the singularity directory in the docker_changes branch. (TODO)
benlee0423 commented 6 months ago

In line 32 of singularity_ngen.def:

cp /opt/ohpc/admin/modulefiles/spack /apps/modulesfiles/all

This gives all users permission to access the modules.

benlee0423 commented 6 months ago

In install_netcdf_cxx.sh: https://api.github.com/repos/Unidata/netcdf-cxx4/releases/latest

Verify BOOST_VERSION=1.79.0

benlee0423 commented 6 months ago

Singularity> cat /usr/include/boost/version.hpp | grep "BOOST_LIB_VERSION"
//  BOOST_LIB_VERSION must be defined to be the same as BOOST_VERSION
#define BOOST_LIB_VERSION "1_75"
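
The check above can be scripted so that a build stops early when the installed Boost is too old. This is a minimal sketch, not part of the repository: the function names `boost_lib_version` and `version_at_least` are hypothetical, the header path is the one shown above, and the comparison relies on GNU `sort -V`.

```shell
# Print the Boost version found in a version.hpp, with underscores turned
# into dots (for example "1_75" becomes "1.75").
boost_lib_version() {
    local header="${1:-/usr/include/boost/version.hpp}"
    sed -n 's/^#define BOOST_LIB_VERSION "\([0-9_]*\)".*/\1/p' "$header" | tr '_' '.'
}

# Succeed when version $1 is greater than or equal to version $2.
# The smaller of the two versions sorts first under `sort -V`.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example use (commented out so that sourcing this file has no side effects):
#   BOOST_VERSION=1.79.0
#   required="${BOOST_VERSION%.0}"   # BOOST_LIB_VERSION omits a zero patch level
#   version_at_least "$(boost_lib_version)" "$required" \
#       || echo "Boost $(boost_lib_version) is older than $required" >&2
```

With the header shown above, `boost_lib_version` prints `1.75`, which does not satisfy a requirement of `1.79`.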

benlee0423 commented 6 months ago
singularity run --bind /home/ubuntu/workspace/input/AWI_004:/ngen/ngen/data ngen.sif /ngen/ngen/data
Select an option (type a number): 
1) Run NextGen model framework in serial mode    3) Run Bash shell
2) Run NextGen model framework in parallel mode  4) Exit
#? 2

Selected files:
Catchment: ./config/datastream.gpkg
Nexus: ./config/datastream.gpkg
Realization: ./config/realization.json

/ngen/HelloNGEN.sh: line 56: /dmod/bin/partitionGenerator: No such file or directory
Singularity> ls -lh /dmod/bin
total 0
lrwxrwxrwx 1 root root 24 Mar  1 05:18 ngen-parallel -> /ngen/parallelbuild/ngen
lrwxrwxrwx 1 root root 22 Mar  1 05:18 ngen-serial -> /ngen/serialbuild/ngen
lrwxrwxrwx 1 root root 38 Mar  1 05:18 partitionGenerator -> /ngen/parallelbuild/partitionGenerator
Singularity> ls -lh /ngen/parallelbuild/partitionGenerator
ls: cannot access '/ngen/parallelbuild/partitionGenerator': No such file or directory
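
The listing above shows a symlink whose target was never built. A small helper can report every dangling link in a directory at once. This is a sketch under assumptions: `broken_links` is a hypothetical name, and `/dmod/bin` is only the default taken from the log above.

```shell
# Print every symlink in a directory whose target does not exist.
broken_links() {
    local dir="${1:-/dmod/bin}"
    local entry
    for entry in "$dir"/*; do
        # -L is true for a symlink; -e follows the link, so it is false
        # when the link target is missing.
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s -> %s\n' "$entry" "$(readlink "$entry")"
        fi
    done
}

# Example use (commented out): broken_links /dmod/bin
```

Running it at the end of the image build would surface a missing partitionGenerator before the first model run.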
benlee0423 commented 6 months ago

Getting a build error because of the Boost version.

Unpacking objects: 100% (14/14), 3.71 KiB | 1.24 MiB/s, done.
From https://github.com/csdms/bmi-example-c

Currently Loaded Modules: 1) mpi/openmpi-x86_64

CMake Error at CMakeLists.txt:44 (find_path):
  Could not find NETCDF_MODULE_DIR using the following files: netcdf.mod

gmake: Makefile: No such file or directory
gmake: *** No rule to make target 'Makefile'.  Stop.
chmod: cannot access '/dmod/bin/': No such file or directory
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Boost: Found unsuitable version "1.75.0", but required is at least "1.79.0" (found /usr/include, )
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:592 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake/Modules/FindBoost.cmake:2344 (find_package_handle_standard_args)
  CMakeLists.txt:168 (find_package)

gmake: Makefile: No such file or directory
gmake: *** No rule to make target 'Makefile'.  Stop.
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Boost: Found unsuitable version "1.75.0", but required is at least "1.79.0" (found /usr/include, )
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:592 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake/Modules/FindBoost.cmake:2344 (find_package_handle_standard_args)
  CMakeLists.txt:168 (find_package)

gmake: Makefile: No such file or directory
gmake: *** No rule to make target 'Makefile'.  Stop.
make: *** No rule to make target 'partitionGenerator'.  Stop.

benlee0423 commented 6 months ago

Getting an error with the 004 input using the newly built Singularity image.

Initializing formulations
[
  {
    name : bmi_c++,
    params : {
      allow_exceed_end_time : true,
      fixed_time_step : false,
      init_config : /dev/null,
      library_file : /dmod/shared_libs/libslothmodel.so,
      main_output_variable : z,
      model_params : {
        EVAPOTRANS : 0,
        sloth_ice_fraction_schaake(1,double,m,node) : 0,
        sloth_ice_fraction_xinanjiang(1,double,1,node) : 0,
        sloth_smp(1,double,1,node) : 0,
      },
      model_type_name : SLOTH,
      name : bmi_c++,
      registration_function : none,
      uses_forcing_file : false,
    },
  },
  {
    name : bmi_c,
    params : {
      allow_exceed_end_time : true,
      fixed_time_step : false,
      init_config : ./config/config.ini,
      library_file : /dmod/shared_libs/libcfebmi.so.1.0.0,
      main_output_variable : Q_OUT,
      model_params : {
        Cgw : 0.000460921,
        Klf : 0.16817,
        Kn : 0.401787,
        b : 8.66053,
        expon : 7.30882,
        max_gw_storage : 0.0402199,
        maxsmc : 0.543673,
        refkdt : 3.66134,
        satdk : 0.000117609,
        slope : 0.815479,
      },
      model_type_name : CFE,
      name : bmi_c,
      registration_function : register_bmi_cfe,
      uses_forcing_file : false,
      variables_names_map : {
        atmosphere_water__liquid_equivalent_precipitation_rate : precip_rate,
        ice_fraction_schaake : sloth_ice_fraction_schaake,
        ice_fraction_xinanjiang : sloth_ice_fraction_xinanjiang,
        soil_moisture_profile : sloth_smp,
        water_potential_evaporation_flux : EVAPOTRANS,
      },
    },
  },
]
Not Using Routing
Building Feature Index
Catchment topology is dendritic.
Running Models
Running timestep 0
Too many open files
Couldn't open file "/usr/share/udunits/udunits2.xml"

benlee0423 commented 6 months ago

In a serial run, I get the following error:

Schaake Magic Constant calculated
All CFE config params present
GIUH ordinates string value found in config ('1.00,0.00')
Counted number of GIUH ordinates (2)
Finished function parsing CFE config
At declaration of smc_profile size, soil_reservoir.n_soil_layers = 0
terminate called after throwing an instance of 'std::runtime_error'
  what():  Errno 24 (Too many open files) opening ./forcings/cat-1490610.csv
/ngen/HelloNGEN.sh: line 134:  1006 Aborted                 (core dumped) $run_command

real    5m7.275s
user    4m37.435s
sys 0m20.575s
benlee0423 commented 6 months ago

The "Too many open files" error is due to the following setting in HelloNGEN.sh:

# Increasing `ulimit` to Open files
ulimit -n 10000
benlee0423 commented 6 months ago

The parallel run fails when ulimit is set to unlimited:

mpirun noticed that process rank 1 with PID 0 on node ip-172-31-71-136 exited on signal 6 (Aborted).
benlee0423 commented 6 months ago

~/workspace/Ngen-Singularity/singularity$ singularity run --bind /home/ubuntu/workspace/input/AWI_004:/ngen/ngen/data ciroh-ngen-singularity_latest.sif /ngen/ngen/data

/ngen/HelloNGEN.sh: line 12: ulimit: open files: cannot modify limit: Operation not permitted

The maximum value for ulimit -n is 1000000. Set it in HelloNGEN.sh:

ulimit -n 1000000
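
The "Operation not permitted" message is consistent with asking for more than the hard limit as an unprivileged user: in bash, `ulimit -n N` without `-S` or `-H` sets both the soft and the hard limit. A minimal sketch, assuming that is the cause, raises only the soft limit and caps the request at the hard limit. The function name `raise_nofile_limit` is hypothetical and is not taken from HelloNGEN.sh.

```shell
# Raise the soft open-files limit toward $1 without exceeding the hard limit.
# Leaves the limit untouched when it is already high enough.
raise_nofile_limit() {
    local want="$1" hard soft
    hard="$(ulimit -H -n)"
    soft="$(ulimit -S -n)"
    # Cap the request at the hard limit, which an unprivileged user cannot raise.
    if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
        want="$hard"
    fi
    # Nothing to do if the current soft limit already meets the request.
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$want" ]; then
        return 0
    fi
    ulimit -S -n "$want"
}

# Example use (commented out): raise_nofile_limit 1000000
```

Because the request is capped rather than hard-coded, the same call should behave sensibly whether the script runs as root or as a regular user.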
benlee0423 commented 6 months ago

Can we use the same HelloNGEN.sh in both Docker and Singularity? It does the same thing except for ulimit -n 1000000.

benlee0423 commented 6 months ago

PR #10 merged

benlee0423 commented 5 months ago

Getting the following error:

NGen Framework 0.1.0
NGen Framework 0.1.0
terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  ModuleNotFoundError: No module named 'numpy'
terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  ModuleNotFoundError: No module named 'numpy'

And this is what I got from the shell:

Singularity> pip3 install numpy
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: numpy in /usr/local/lib64/python3.9/site-packages (1.26.4)
WARNING: Value for scheme.platlib does not match. Please report this to <https://github.com/pypa/pip/issues/10151>
distutils: /home/ubuntu/.local/lib/python3.9/site-packages
sysconfig: /home/ubuntu/.local/lib64/python3.9/site-packages
WARNING: Additional context:
user = True
home = None
root = None
prefix = None
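
pip reports numpy as installed while the embedded interpreter cannot import it, which may mean the two are looking at different Python installations or search paths. That is an assumption, not something the log confirms. A minimal sketch for checking where a specific interpreter finds a module follows; `module_location` is a hypothetical helper name.

```shell
# Print the file a given Python interpreter would load for a module,
# or return non-zero when that interpreter cannot find the module.
module_location() {
    local interp="$1" mod="$2"
    "$interp" -c 'import importlib.util, sys
spec = importlib.util.find_spec(sys.argv[1])
if spec is None:
    sys.exit(1)
print(spec.origin)' "$mod"
}

# Example use (commented out):
#   module_location python3 numpy || echo "python3 cannot find numpy" >&2
```

Running the helper against each Python interpreter present in the image shows which of them can actually see numpy.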
benlee0423 commented 5 months ago
module show mpi

-------------------------------------------------------------------------------------------------------------------------------------------------------------
  /usr/share/modulefiles/mpi/openmpi-x86_64:
-------------------------------------------------------------------------------------------------------------------------------------------------------------
conflict("mpi")
prepend_path("PATH","/usr/lib64/openmpi/bin")
prepend_path("LD_LIBRARY_PATH","/usr/lib64/openmpi/lib")
prepend_path("PKG_CONFIG_PATH","/usr/lib64/openmpi/lib/pkgconfig")
prepend_path("MANPATH",":/usr/share/man/openmpi-x86_64")
setenv("MPI_BIN","/usr/lib64/openmpi/bin")
setenv("MPI_SYSCONFIG","/etc/openmpi-x86_64")
setenv("MPI_FORTRAN_MOD_DIR","/usr/lib64/gfortran/modules/openmpi")
setenv("MPI_INCLUDE","/usr/include/openmpi-x86_64")
setenv("MPI_LIB","/usr/lib64/openmpi/lib")
setenv("MPI_MAN","/usr/share/man/openmpi-x86_64")
setenv("MPI_PYTHON3_SITEARCH","/usr/lib64/python3.9/site-packages/openmpi")
setenv("MPI_COMPILER","openmpi-x86_64")
setenv("MPI_SUFFIX","_openmpi")
setenv("MPI_HOME","/usr/lib64/openmpi")
benlee0423 commented 5 months ago

Run command in terminal

singularity run --bind /home/ubuntu/workspace/AWI_09_004:/ngen/ngen/data ciroh-ngen-singularity.sif "/ngen/ngen/data auto"

Run command inside running image

mpirun --allow-run-as-root -n 2 /dmod/bin/ngen-parallel ./config/datastream.gpkg all ./config/datastream.gpkg all ./config/realization.json ./partitions_2.json 
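
The command above can be parameterized by the number of MPI ranks. This is a dry-run sketch only: `ngen_parallel_cmd` is a hypothetical helper, and it assumes the partition file is named `partitions_<N>.json` to match the rank count, as in the command above.

```shell
# Print (but do not run) the parallel ngen command for a given rank count.
ngen_parallel_cmd() {
    local nproc="$1"
    local gpkg="./config/datastream.gpkg"
    local realization="./config/realization.json"
    printf 'mpirun --allow-run-as-root -n %s /dmod/bin/ngen-parallel %s all %s all %s ./partitions_%s.json\n' \
        "$nproc" "$gpkg" "$gpkg" "$realization" "$nproc"
}

# Example use (commented out): ngen_parallel_cmd 2
```

Printing the command first makes it easy to confirm that the rank count and the partition file agree before launching the run.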
benlee0423 commented 5 months ago

This task is blocked by issue 12.

benlee0423 commented 5 months ago

Unblocked by using ngen commit ID f91e2ea.