geoschem / geos-chem

GEOS-Chem "Science Codebase" repository. Contains GEOS-Chem science routines, run directory generation scripts, and interface code. This repository is used as a submodule within the GCClassic and GCHP wrappers, as well as in other modeling contexts (external ESMs).
http://geos-chem.org

[BUG/ISSUE] GEOS-Chem simulation stops abruptly without any error #645

Closed: map06 closed this issue 3 years ago.

map06 commented 3 years ago

Describe the bug:

Expected behavior:

I wanted to run a full-chemistry simulation at 2 x 2.5 degree resolution in GEOS-Chem 12.9.2. I am using an AWS c5.4xlarge instance.

Actual behavior:

I was able to simulate one month, but I can't continue the simulation afterward using the restart file from the previous month. The simulation stops abruptly while running. I have already tried other versions too.

Steps to reproduce the bug:

  1. Created a run directory
  2. Edited the simulation time in the input.geos file
  3. Edited the HISTORY.rc file to include other output besides species concentrations

Run commands: I ran make realclean and then compiled using make -j16 build. I have also tried make build. Both give the same results.

Error messages

********************************************
* B e g i n   T i m e   S t e p p i n g !! *
********************************************

---> DATE: 2009/02/01  UTC: 00:00  X-HRS:      0.000000
 HEMCO already called for this timestep. Returning.
===============================================================================
TPCORE_FVDAS (based on GMI) Tracer Transport Module successfully initialized
===============================================================================
ubuntu@ip-172-31-72-236:~/tutorial/merra2_2x25_standard$ 

After this, the program stops without printing any error message.

Required information:

Your GEOS-Chem version and runtime environment:

Input and log files to attach

All the log files are included in logs.zip.

yantosca commented 3 years ago

Thanks for writing @map06. Usually when a run dies like that after the TPCORE message, it is because the OMP_STACKSIZE variable has not been set properly. See this wiki page: http://wiki.geos-chem.org/Specifying_settings_for_OpenMP_parallelization#Parallelization_settings_for_GEOS-Chem_.22Classic.22
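One way to confirm that these settings actually reach the process that runs ./geos is to print them from the same shell right before launching. The sketch below assumes bash; the function name check_omp_env is hypothetical, and it only reports what is set rather than choosing values for you.

```shell
#!/bin/bash
# Hypothetical helper: report the stack/OpenMP settings in effect in this shell.
# Returns 0 if all three look set, 1 if anything is missing.
check_omp_env() {
  local status=0
  if [ "$(ulimit -s)" != "unlimited" ]; then
    echo "WARNING: stack size limit is $(ulimit -s) KB, not unlimited"
    status=1
  fi
  if [ -z "${OMP_STACKSIZE:-}" ]; then
    echo "WARNING: OMP_STACKSIZE is not set"
    status=1
  fi
  if [ -z "${OMP_NUM_THREADS:-}" ]; then
    echo "WARNING: OMP_NUM_THREADS is not set"
    status=1
  fi
  # Always print a one-line summary for the log
  echo "ulimit -s=$(ulimit -s) OMP_STACKSIZE=${OMP_STACKSIZE:-unset} OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
  return "$status"
}
```

Calling check_omp_env in the run script just before ./geos records the effective values at the top of the log.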

map06 commented 3 years ago

@yantosca I added the following lines to the ~/.bashrc file:

ulimit -s unlimited
export OMP_STACKSIZE=500m
export OMP_NUM_THREADS=`nproc`

I ran the following shell script:

#!/bin/bash

# Apply your environment settings to the computational queue
source ~/.bashrc

# In an AWS cloud instance, you own the entire node, so there is no need
# for a scheduler.  Use nproc to specify the number of cores for OpenMP.
export OMP_NUM_THREADS=`nproc`
ulimit -s unlimited
export OMP_STACKSIZE=500m

# Run Geos chem simulation 
./geos 2>&1 | tee GC.log

Still, I am having the same issue.

yantosca commented 3 years ago

Thanks for your reply @map06. The c4.4xlarge instance should definitely have enough RAM to run the 2 x 2.5 simulation. If not, you might need to use a c4.8xlarge instance.

Can you also attach the GC.log file? That might have some information that would help.

map06 commented 3 years ago

@yantosca Thank you. I have attached the file here: GC.log. I tried to run the simulation again, so the date might be different, but the error and the way the run stops are the same. I realized that the error happens when I turn on Boundary Conditions in the HISTORY.rc file.
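To see which diagnostic collections a run will actually write, one can list the uncommented entries of the COLLECTIONS block in HISTORY.rc. This sketch assumes the layout used by GEOS-Chem 12.x run directories (quoted collection names, disabled entries prefixed with #, block terminated by a line starting with ::); the function name active_collections is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the enabled collection names from a HISTORY.rc file.
# Assumes entries look like  'Name',  and disabled ones like  #'Name',
active_collections() {
  awk -v q="'" '
    /^COLLECTIONS:/ { inblock = 1; sub(/^COLLECTIONS:/, "") }  # first entry may share this line
    inblock && /^[[:space:]]*::/ { exit }                      # end of the COLLECTIONS block
    inblock {
      line = $0
      sub(/^[[:space:]]+/, "", line)                           # strip indentation
      if (line == "" || substr(line, 1, 1) == "#") next        # skip blank/disabled entries
      gsub(q, "", line)                                        # drop single quotes
      gsub(/[,[:space:]]/, "", line)                           # drop commas and spaces
      if (line != "") print line
    }
  ' "${1:-HISTORY.rc}"
}
```

For example, active_collections HISTORY.rc | grep -qx BoundaryConditions succeeds only when the BoundaryConditions collection is turned on.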

map06 commented 3 years ago

Also, when I use the restart file from the previous month's simulation with Boundary Conditions turned off in HISTORY.rc, the simulation proceeds a little further:


********************************************
* B e g i n   T i m e   S t e p p i n g !! *
********************************************

---> DATE: 2009/02/01  UTC: 00:00  X-HRS:      0.000000
 HEMCO already called for this timestep. Returning.
===============================================================================
TPCORE_FVDAS (based on GMI) Tracer Transport Module successfully initialized
===============================================================================
HEMCO (VOLCANO): Opening /home/ubuntu/ExtData/HEMCO/VOLCANO/v2019-08/2009/02/so2_volcanic_emissions_Carns.20090201.rc
--- Initialize surface boundary conditions from input file ---
--> CCl4 will use prescribed surface boundary conditions from field SfcVMR_CCl4
--> CFC11 will use prescribed surface boundary conditions from field SfcVMR_CFC11
--> CFC113 will use prescribed surface boundary conditions from field SfcVMR_CFC113
--> CFC114 will use prescribed surface boundary conditions from field SfcVMR_CFC114
--> CFC115 will use prescribed surface boundary conditions from field SfcVMR_CFC115
--> CFC12 will use prescribed surface boundary conditions from field SfcVMR_CFC12
--> CH2Cl2 will use prescribed surface boundary conditions from field SfcVMR_CH2Cl2
--> CH3Br will use prescribed surface boundary conditions from field SfcVMR_CH3Br
--> CH3CCl3 will use prescribed surface boundary conditions from field SfcVMR_CH3CCl3
--> CH3Cl will use prescribed surface boundary conditions from field SfcVMR_CH3Cl
--> CHCl3 will use prescribed surface boundary conditions from field SfcVMR_CHCl3
--> H1211 will use prescribed surface boundary conditions from field SfcVMR_H1211
--> H1301 will use prescribed surface boundary conditions from field SfcVMR_H1301
--> H2402 will use prescribed surface boundary conditions from field SfcVMR_H2402
--> HCFC141b will use prescribed surface boundary conditions from field SfcVMR_HCFC141b
--> HCFC142b will use prescribed surface boundary conditions from field SfcVMR_HCFC142b
--> HCFC22 will use prescribed surface boundary conditions from field SfcVMR_HCFC22
--> N2O will use prescribed surface boundary conditions from field SfcVMR_N2O
--> OCS will use prescribed surface boundary conditions from field SfcVMR_OCS
--> H2 will use prescribed surface boundary conditions from field SfcVMR_H2
--- Finished initializing surface boundary conditions ---
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% USING O3 COLUMNS FROM THE MET FIELDS! %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     - RDAER: Using online SO4 NH4 NIT!
     - RDAER: Using online BCPI OCPI BCPO OCPO!
     - RDAER: Using online SALA SALC
     - DO_STRAT_CHEM: Linearized strat chemistry at 2009/02/01 00:00
###############################################################################
# Interpolating Linoz fields for feb
###############################################################################
     - LINOZ_CHEM3: Doing LINOZ
===============================================================================
Successfully initialized ISORROPIA code II
===============================================================================
---> DATE: 2009/02/01  UTC: 00:10  X-HRS:      0.166667
Killed

But it eventually gets killed again. However, if I run the simulation again without the previous month's restart file, I do not get any error apart from the one caused by Boundary Conditions. Here are the log files.
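The bare "Killed" line is printed by the shell, not by GEOS-Chem: it means the process received SIGKILL, which shells report as exit status 137 (128 + 9). Because ./geos is piped into tee, a script normally sees tee's exit status instead of the model's. The sketch below (bash; run_logged and describe_exit are hypothetical names) sets pipefail so the model's own status is kept and reported.

```shell
#!/bin/bash
# With pipefail, a pipeline's status is the last non-zero status in it,
# so a killed ./geos is not masked by a successful tee.
set -o pipefail

# Translate an exit status into a short human-readable description.
describe_exit() {
  local code=$1
  if [ "$code" -eq 0 ]; then
    echo "finished normally"
  elif [ "$code" -gt 128 ]; then
    # Death by signal N is reported as 128+N (SIGKILL = 9 -> 137)
    echo "killed by signal $(kill -l $((code - 128))) ($((code - 128)))"
  else
    echo "exited with status $code"
  fi
}

# Run a command, copy its output to a log file, then report how it ended.
run_logged() {
  local log=$1
  shift
  "$@" 2>&1 | tee "$log"
  local code=$?
  describe_exit "$code"
  return "$code"
}
```

In the run script above, the last line would become run_logged GC.log ./geos.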

yantosca commented 3 years ago

That looks like a memory error. Your simulation is taking up too much memory for the c5.4xlarge instance. You might try an instance with more memory (https://aws.amazon.com/ec2/instance-types/c5). Maybe a c5.12xlarge would do it.
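Since a SIGKILL on an otherwise quiet run usually points to the Linux out-of-memory killer, it can help to check available memory before a run and the kernel log after one. This is a sketch assuming a Linux host with /proc/meminfo; the helper names and the GiB threshold are hypothetical and should be tuned to your own run, and reading dmesg may require sudo.

```shell
#!/bin/bash
# Hypothetical helper: print MemAvailable (whole GiB) from a meminfo-format file.
# /proc/meminfo reports values in kB, so divide by 1024*1024.
mem_available_gib() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: fail if fewer than NEED GiB are available.
# The threshold is your own estimate, not a GEOS-Chem requirement.
check_headroom() {
  local need_gib=$1 have_gib
  have_gib=$(mem_available_gib "${2:-/proc/meminfo}")
  if [ "$have_gib" -lt "$need_gib" ]; then
    echo "only ${have_gib} GiB available (< ${need_gib} GiB)"
    return 1
  fi
  echo "${have_gib} GiB available"
}

# After a bare "Killed", the kernel log usually names the OOM victim:
#   sudo dmesg | grep -i -E 'out of memory|killed process'
```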

map06 commented 3 years ago

@yantosca Yeah, it seems so. I am currently using a c5.8x large instance and I am not getting any errors.