Spin-up simulation immediately stops after time stepping begins

alphasue12 commented 1 month ago

Your name

SU Yi

Your affiliation

Fudan University

What happened? What did you expect to happen?

The spin-up simulation stopped almost immediately after the log printed " B e g i n T i m e S t e p p i n g !! " and "TPCORE_FVDAS (based on GMI) Tracer Transport Module successfully initialized" (please refer to log.txt). No restart files were generated. I also enabled debug mode to investigate the source of the error. The error seems to occur in "tpcore_fvdas_mod.F90" (please refer to debug.txt in the attachments), but I do not know how to fix it. debug.txt

What are the steps to reproduce the bug?

I am running a global spin-up simulation at 2x2.5 resolution with GCClassic (13.3.4). The simulation is not in nested-grid mode.

Please attach any relevant configuration and log files.

build_summary.txt HEMCO.log.txt HEMCO_Config.txt HEMCO_Diagn.rc.txt HISTORY.rc.txt input.geos.txt debug.txt log.txt

What GEOS-Chem version were you using?

13.3.4

What environment were you running GEOS-Chem on?

Local cluster

What compiler and version were you using?

gcc 7.5.0 on Ubuntu 18.04

Will you be addressing this bug yourself?

Yes

In what configuration were you running GEOS-Chem?

GCClassic

What simulation were you running?

Full chemistry

At what resolution were you running GEOS-Chem?

2x2.5

What meteorology fields did you use?

MERRA-2

Additional information

I am using this old version because I want to reproduce the results from a research paper that used the same version of GCClassic. When I run the spin-up simulation with the same configuration except for the resolution (4x5 instead of 2x2.5), the simulation succeeds and produces the restart file.

yantosca commented 1 month ago

Thanks for writing @alphasue12. I think you may not have maxed out your stack memory limits in your ~/.bashrc file. Please see this entry on ReadTheDocs:

https://geos-chem.readthedocs.io/en/latest/geos-chem-shared-docs/supplemental-guides/error-guide.html#segmentation-fault-encountered-after-tpcore-initialization

alphasue12 commented 1 month ago

Thank you for your suggestion. I just tried it, but I still get the same "Program received signal SIGSEGV: Segmentation fault" after the log prints "NASA-GSFC Tracer Transport Module successfully initialized". I made sure that "ulimit -s unlimited" and "export OMP_STACKSIZE=500m" have been added to ~/.bashrc and were executed before running the ./gcclassic command. What else could cause the problem?
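
One way to confirm that both settings are really active in the shell that launches ./gcclassic is to print them right before the run. The snippet below is only an illustrative sketch: the function name report_stack_settings is hypothetical and is not part of GEOS-Chem.

```shell
#!/bin/bash
# Report the stack-related settings that this shell would pass on to ./gcclassic.
# (report_stack_settings is a hypothetical helper name, not a GEOS-Chem tool.)
report_stack_settings() {
    # Soft stack limit of the current shell: "unlimited" or a size in kB
    echo "stack limit (ulimit -s): $(ulimit -s)"
    # Per-thread stack size for OpenMP threads; a placeholder is shown if unset
    echo "OMP_STACKSIZE: ${OMP_STACKSIZE:-<not set>}"
}

report_stack_settings
```

If the first line does not say "unlimited", or the second shows "<not set>", the settings from ~/.bashrc did not reach the shell that starts the model.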

yantosca commented 1 month ago

Thanks for the feedback @alphasue12. If you are running the GEOS-Chem job in a scheduler like SLURM, you might want to add a source ~/.bashrc in the run script. That will make sure that the environment variables you define in ~/.bashrc also get defined in the environment where the job runs.
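
A run script along these lines might look like the sketch below. The job name, core count, memory request, and time limit are placeholder assumptions and must be adapted to the cluster. The script also sets the stack limits explicitly, as a precaution in case ~/.bashrc returns early in a non-interactive shell.

```shell
#!/bin/bash
#SBATCH --job-name=gc_spinup      # hypothetical job name
#SBATCH --nodes=1                 # GEOS-Chem Classic uses OpenMP on a single node
#SBATCH --cpus-per-task=8         # assumed core count; adjust to your allocation
#SBATCH --mem=32G                 # assumed memory request; adjust as needed
#SBATCH --time=24:00:00           # assumed wall-time limit

# Pick up the settings defined in ~/.bashrc, if the file exists.
if [ -f "$HOME/.bashrc" ]; then
    source "$HOME/.bashrc"
fi

# Set the stack limits explicitly as well, so they apply even if
# ~/.bashrc returns early in a non-interactive shell.
ulimit -s unlimited || echo "WARNING: could not raise the stack limit" >&2
export OMP_STACKSIZE=500m

# Use the cores granted by SLURM (falls back to 1 outside a SLURM job).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Run the model from the run directory and keep the log.
if [ -x ./gcclassic ]; then
    ./gcclassic > log.txt 2>&1
else
    echo "ERROR: ./gcclassic not found in $(pwd)" >&2
fi
```

Submitting it with sbatch from the run directory would start the job on a compute node rather than in the login shell.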

yantosca commented 1 month ago

@alphasue12: I'm not sure how your system is set up, but if you are trying to run GEOS-Chem Classic on a login node, you might not have enough memory there. On our system, when you log in, you are placed on a login node, and from there you can schedule interactive or batch jobs on computational nodes. Our login nodes only allow 4GB of memory, so if you have a similar setup on your cluster, this is what may be causing your jobs to die. You can ask your sysadmin for more info.
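
A quick, non-authoritative way to see whether the current shell is subject to such a memory cap is to print its limits, as sketched below. The function name report_memory_limits is hypothetical, and some schedulers enforce memory limits through other mechanisms that this check will not show.

```shell
#!/bin/bash
# Print the memory-related limits visible to the current shell.
# (report_memory_limits is a hypothetical helper name.)
report_memory_limits() {
    # Per-process virtual memory cap in kB ("unlimited" if none is imposed)
    echo "virtual memory limit (ulimit -v): $(ulimit -v)"
    # Total physical memory on this node, if /proc/meminfo is readable (Linux)
    if [ -r /proc/meminfo ]; then
        grep MemTotal /proc/meminfo
    fi
}

report_memory_limits
```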

alphasue12 commented 1 month ago

@yantosca thanks for your detailed and kind reply. I am currently using an Ubuntu server with 64 processors and 82 GB of memory. It seems that this problem may be related to the memory setup of my server. I plan to migrate my GCClassic setup to a computing platform that uses the SLURM scheduler and try again.

yantosca commented 1 week ago

@alphasue12: Were you able to fix your issue?

geoschem / geos-chem