Open rgknox opened 9 years ago
Hey @DanielNScott. SMP with the Hybrid Integrator has been working fantastically for me. I completed a set of my 12000 yr runs at 6 northeastern sites over the weekend (~48 hrs at DTLSM = 900) and had one stability interruption that then went through just fine when I dropped DTLSM from 900 to 600 (15 min to 10).
I'm redoing them with the version from yesterday's updated pull request just to double check, but the numbers I got from the first run and the spin I just finished look completely reasonable based on quick glances at snapshots (no thorough analysis yet).
The current EDModel master branch has not shown any signs of instability during limited testing, so I am ever hopeful that this, along with Christy's new CBR changes (to be merged any moment now), will be considered a stable release and recommended for production/research runs. My testing cluster, edison.nersc.gov, was down for maintenance this morning, but I'm getting really excited to test everything out!
On Wed, Mar 25, 2015 at 1:19 PM, Christy Rollinson <notifications@github.com> wrote:
Hey @DanielNScott. SMP with the Hybrid Integrator has been working fantastically for me. I completed a set of my 12000 yr runs at 6 northeastern sites over the weekend (~48 hrs at DTLSM = 900) and had one stability interruption that then went through just fine when I dropped DTLSM from 900 to 600 (15 min to 10).
I'm redoing them with the version from yesterday's updated pull request just to double check, but the numbers I got from the first run and the spin I just finished look completely reasonable based on quick glances at snapshots (no thorough analysis yet).
— Reply to this email directly or view it on GitHub https://github.com/EDmodel/ED2/issues/30#issuecomment-86201433.
Just want to report an issue with SMP. I can compile with "-fopenmp", but when I try to run the model I get the error below. If I remove -fopenmp and recompile, the model runs. Any thoughts on this?
+--- Parallel info: -------------------------------------+
Machsize = 1
+--------------------------------------------------------+
Reading namelist information
Copying namelist
+------------------------------------------------------------+
| Ecosystem Demography Model, version 2.2
+------------------------------------------------------------+
| Input namelist filename is ED2IN
| Single process execution on INITIAL run.
+------------------------------------------------------------+
=> Generating the land/sea mask.
/projectnb/dietzelab/EDI/oge2OLD/OGE2_HEADER
-> Getting file: /projectnb/dietzelab/EDI/oge2OLD/OGE2_30N090W.h5...
Segmentation fault
I think I just realized what's going on! Before you try running the SMP ED, you need to set it up for multi-threaded execution, which means logging on to the cluster with parallelization on.
To do this on the BU server:
1) log on to geo
2) type: qlogin
3) OMP_NUM_THREADS=8 (or 16 or whatever)
4) run ED
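One thing to watch in step 3: in a POSIX-style shell, a plain assignment only sets a shell variable, and the ED executable will see it only if the variable is also exported. The sketch below is a quick way to check what a child process would actually inherit. The helper name `child_omp_threads` is made up for illustration and is not part of ED.

```shell
#!/bin/sh
# Report the OMP_NUM_THREADS value that a child process (such as the ED
# executable) would inherit from the current shell.
# child_omp_threads is a hypothetical helper, not part of ED.
child_omp_threads() {
    sh -c 'printf "%s\n" "${OMP_NUM_THREADS:-unset}"'
}

unset OMP_NUM_THREADS
OMP_NUM_THREADS=8          # plain assignment: visible in this shell only
before=$(child_omp_threads)

export OMP_NUM_THREADS     # export: now inherited by child processes
after=$(child_omp_threads)

echo "child sees before export: $before"
echo "child sees after export:  $after"
```

If the first line reports "unset" on your system, use "export OMP_NUM_THREADS=8" rather than the bare assignment before running ED.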
On Apr 2, 2015, at 5:49 PM, Afshin Pourmokhtarian notifications@github.com wrote:
Just want to report an issue with SMP. I can compile with "-fopenmp", but when I try to run the model I get the error below. If I remove -fopenmp and recompile, the model runs. Any thoughts on this?
+--- Parallel info: -------------------------------------+
- Machnum = 0
- Machsize = 1
+--------------------------------------------------------+
Reading namelist information
Copying namelist
+------------------------------------------------------------+
| Ecosystem Demography Model, version 2.2
+------------------------------------------------------------+
| Input namelist filename is ED2IN
| Single process execution on INITIAL run.
+------------------------------------------------------------+
=> Generating the land/sea mask.
/projectnb/dietzelab/EDI/oge2OLD/OGE2_HEADER
-> Getting file: /projectnb/dietzelab/EDI/oge2OLD/OGE2_30N090W.h5...
Segmentation fault
These are my compilation flags:
CMACH=PC_LINUX1
F_COMP=mpif90
F_OPTS= -g -Wall -W -ffpe-trap=invalid,zero,overflow -Wconversion -fbounds-check -fbacktrace -fdump-core -fopenmp
C_COMP=mpicc
C_OPTS = -03 -DLITTLE
LOADER=mpif90
LOADER_OPTS=${F_OPTS}
C_LOADER=mpicc
LIBS=
MOD_EXT=mod
MPI_PATH=
PAR_INCS=
PAR_LIBS=
PAR_DEFS=-DRAMS_MPI
Did you set your stack limit to unlimited? On the run node:
ulimit -s unlimited
On Apr 2, 2015 3:04 PM, "Afshin Pourmokhtarian" notifications@github.com wrote:
These are my compilation flags:
Compile flags ------------------------------------------------
CMACH=PC_LINUX1
F_COMP=mpif90
F_COMP = gfortran
F_OPTS= -V -FR -O2 -recursive -static -Vaxlib -check all -g -fpe0 -ftz -debug extended \
-debug inline_debug_info -debug-parameters all -traceback -ftrapuv
F_Opts= -03
F_OPTS= -g -Wall -W -ffpe-trap=invalid,zero,overflow -Wconversion -fbounds-check -fbacktrace -fdump-core -fopenmp
C_COMP=mpicc
C_OPTS= -O2 -DLITTLE -g -static -traceback -debug extended
C_OPTS = -03 -DLITTLE
LOADER=mpif90
LOADER = gfortran
LOADER_OPTS=${F_OPTS}
C_LOADER=mpicc
C_LOADER_OPTS=-v -g -traceback -static
LIBS=
MOD_EXT=mod
MPI Flags ----------------------------------------------------
MPI_PATH=
PAR_INCS=
PAR_LIBS=
PAR_DEFS=-DRAMS_MPI
Thanks @crollinson that was the issue and now it works.
Hi All,
I put the Shared Memory Parallelism commits on master. This allows radiation scattering, photosynthesis and thermodynamics of different patches to be split across different CPU cores.
This has been tested using RK4 and Hybrid integration.
This has had limited testing on gridded runs.
This has had no testing on coupled runs (but I don't suspect any breakage).
If you don't want to use shared memory, just keep doing what you have done in the past and nothing should change.
If you do want to use it, follow these steps for a single polygon run:
1) Compile the code with shared memory directives. If you are using OpenMP, the flag is '-fopenmp'.
2) (Optional) Increase your stack size. On Linux: "ulimit -s unlimited"
3) Set run-time environment variables. If you are using OpenMP, the key variable is OMP_NUM_THREADS, which defines how many shared memory cores will be used. On Linux: "export OMP_NUM_THREADS=X", where X is the number of cores you wish to use. REMEMBER: these cores must share RAM, so you are limited by the number of cores on one node.
4) Execute the simulation as you would normally.
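Steps 2 to 4 can be wrapped in a small shell function so the stack limit and thread count apply to one run only. This is a rough sketch, not something shipped with ED: the name `run_smp` and its argument order are assumptions made for illustration, and the `sh -c` command at the end is only a stand-in for your ED executable.

```shell
#!/bin/sh
# Sketch of steps 2-4 as a wrapper. run_smp is a hypothetical name.
# Usage: run_smp <threads> <command> [args...]
run_smp() {
    threads=$1
    shift
    (
        # Step 2 (optional): raise the stack limit; warn if not permitted.
        ulimit -s unlimited 2>/dev/null ||
            echo "warning: could not raise stack limit (now: $(ulimit -s))" >&2

        # Step 3: export the thread count so the command inherits it.
        OMP_NUM_THREADS=$threads
        export OMP_NUM_THREADS

        # Step 4: run the command as usual.
        "$@"
    )
}

# Stand-in command that shows what the simulation would inherit.
run_smp 4 sh -c 'echo "threads seen by child: $OMP_NUM_THREADS"'
```

The subshell keeps both settings from leaking into the rest of your session. To use it for a real run, replace the `sh -c` stand-in with the executable and options you normally use.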
This release is experimental for the time being. If you have trouble, crashes, or poor reproducibility of previous work, revert to commit 2a5d68ebb291581c932a442e2701e553b24b1170.
i.e.:
git checkout 2a5d68ebb291581c932a442e2701e553b24b1170