drewpendergrass / CHEEREIO

This repository contains the code for CHEEREIO, which supports data assimilation and emissions inversions for arbitrary runs of the GEOS-Chem chemical transport model via an ensemble approach (i.e. without the model adjoint).
MIT License

Simulation Stops at ASSIM_START_DATE Despite No Errors in GC.log #13

Closed (HuiruZhong closed 2 weeks ago)

HuiruZhong commented 2 months ago

Hi Drew, I am encountering an issue when running CHEEREIO to simulate TROPOMI CH4. The GEOS-Chem model runs fine, but the simulation stops at the ASSIM_START_DATE. I have checked the GC.log files for each ensemble, and they all appear to run normally. A KILL file is generated in the scratch directory, indicating that the simulation for ensemble 22 did not succeed. However, there are no errors reported in the GC.log files. I am confused about what might be causing this issue. I have tested to ensure that it is not due to negative emissions. Could you please help me diagnose and resolve this problem?

Thank you!

drewpendergrass commented 2 months ago

Hi Huiru,

Does your KILL_ENS file say "Python assimilation script exited without code 0 in ensemble x and core y"? If so, the problem is with the LETKF workflow and not with GEOS-Chem. To debug in this case, look at the corresponding shell error file, which is stored in the ensemble_runs/logs folder under the name ensemble_slurm_JOBNUMBER.err. It will include the Python traceback that caused the problem (feel free to post it here). See the docs for more info.

If your KILL_ENS file says "GEOS-Chem in ensemble member x did not complete successfully" but the GC.log file indicates no problem (i.e. the last line of the log file reads "** E N D O F G E O S -- C H E M **"), you should still check those ensemble_slurm_JOBNUMBER.err files (all of them). Sometimes an odd OS or shell error pops up and is recorded as a GEOS-Chem error by the KILL_ENS script.

If no errors of any kind are recorded, then most likely CHEEREIO failed randomly. Every once in a while (1 in 30 runs or so in my experience), an assimilation cycle fails to start for a reason that I've not quite been able to diagnose. Usually this is fixed on rerun. Execute the cleanup after kill ens script following the instructions in the docs, resubmit your job, and see if it works next time.

Let me know what you find!

Drew
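For anyone triaging the same symptom, the two checks described above (does GC.log end with the GEOS-Chem end banner, and do any of the ensemble_slurm_JOBNUMBER.err files contain output) can be roughly automated. The following is only a minimal sketch and is not part of CHEEREIO: the function names are hypothetical, the paths you pass in depend on your own setup, and the banner comparison ignores whitespace because the exact spacing in the log line may differ from what is quoted here.

```python
from pathlib import Path

# End-of-run banner text as quoted in this thread. Spacing in the real log
# may differ, so the comparison below strips all whitespace first.
END_MARKER = "E N D O F G E O S -- C H E M"


def _squash(text):
    """Remove all whitespace so banner spacing does not matter."""
    return "".join(text.split())


def gc_log_finished(log_path):
    """Return True if the last non-empty line of a GC.log holds the end banner."""
    text = Path(log_path).read_text(errors="replace")
    lines = [line for line in text.splitlines() if line.strip()]
    return bool(lines) and _squash(END_MARKER) in _squash(lines[-1])


def nonempty_err_files(log_dir):
    """Map each non-empty ensemble_slurm_*.err file name to True if it
    contains a Python traceback, or False for any other kind of output."""
    report = {}
    for err in sorted(Path(log_dir).glob("ensemble_slurm_*.err")):
        text = err.read_text(errors="replace")
        if text.strip():
            report[err.name] = "Traceback (most recent call last)" in text
    return report
```

Calling `nonempty_err_files("ensemble_runs/logs")` from the directory that contains `ensemble_runs` would list the error files worth reading first. Entries mapped to `True` point at the LETKF workflow, while `False` entries suggest a shell or OS level problem. `gc_log_finished` takes the path to a single GC.log, wherever your ensemble member run directories keep it.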

HuiruZhong commented 1 month ago

Thank you very much for your help. I haven't replied sooner because I still haven't resolved the issue. The previous error might have been due to problems with the Linux environment.

drewpendergrass commented 1 month ago

Yes, that certainly can be an issue. The cheereio.env file that ships with the software is only an example and needs to be modified for your machine. Try running a regular GEOS-Chem simulation with the cheereio.env environment and see if it works.