geoschem / GCHP

The "superproject" wrapper repository for GCHP, the high-performance instance of the GEOS-Chem chemical-transport model.
https://gchp.readthedocs.io

GCHP simulation stopped after 7 month simulation (total set time for 1 yr) #405

Closed Hemrajbhattarai closed 3 months ago

Hemrajbhattarai commented 4 months ago

Name and Institution (Required)

Name: Hemraj Bhattarai Institution: The Chinese University of Hong Kong

Description of your issue or question

I am running GCHP version 14.3.0 with the following settings:

cap_restart: 20170101 000000
Run_Duration="00010000 000000"

CS_RES=48
STRETCH_GRID=ON
STRETCH_FACTOR=4.0
TARGET_LAT=28.0
TARGET_LON=80.0

The simulation runs without problems and produces output from 20170101 to 20170731, but then it stops. I did not find any useful information in the log files: 1514335_print_out.log.txt 1514335_error.log.txt

I also tried resubmitting the case with a new restart file that the model itself generated (I set Checkpoint_Freq=monthly), but the model still fails, now with a different error (1526524_error.log.txt). For this test I changed cap_restart to 20170801 (since I had already run 7 months) and changed the run duration to 00000500 000000. I am not sure where this problem is coming from.
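As a sanity check on the restart settings above, the end date implied by a cap_restart stamp and a Run_Duration stamp can be worked out by hand or with a few lines of Python. The sketch below is only an illustration: the helper names parse_stamp and end_of_run are hypothetical, and clamping the day to the month length is an assumption, not necessarily what MAPL does.

```python
import calendar
from datetime import datetime, timedelta


def parse_stamp(stamp):
    """Split a 'YYYYMMDD HHMMSS' string into six integers."""
    date_part, time_part = stamp.split()
    return (int(date_part[0:4]), int(date_part[4:6]), int(date_part[6:8]),
            int(time_part[0:2]), int(time_part[2:4]), int(time_part[4:6]))


def end_of_run(cap_restart, run_duration):
    """Return the end time, as 'YYYYMMDD HHMMSS', of a run that starts at
    cap_restart and lasts run_duration (both in 'YYYYMMDD HHMMSS' form)."""
    y, m, d, hh, mm, ss = parse_stamp(cap_restart)
    dy, dm, dd, dhh, dmm, dss = parse_stamp(run_duration)
    # Add years and months first, counting whole months since year 0.
    months = y * 12 + (m - 1) + dy * 12 + dm
    year, month = divmod(months, 12)
    month += 1
    # Assumption: clamp the day so the result is always a valid date.
    day = min(d, calendar.monthrange(year, month)[1])
    end = datetime(year, month, day, hh, mm, ss)
    # Days, hours, minutes and seconds are plain offsets.
    end += timedelta(days=dd, hours=dhh, minutes=dmm, seconds=dss)
    return end.strftime("%Y%m%d %H%M%S")


if __name__ == "__main__":
    print(end_of_run("20170101 000000", "00010000 000000"))
    print(end_of_run("20170801 000000", "00000500 000000"))
```

Both the original one-year run and the five-month continuation from 20170801 should end at the same date, which is a quick way to confirm that the edited duration still covers the intended period.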

I am looking for appropriate suggestions for debugging the issue. Thank you.

Hemrajbhattarai commented 4 months ago

Let me provide some additional information:

I wonder whether I can directly use the internal_checkpoint files that are stored in the Restart directory (e.g., gcchem_internal_checkpoint.20170801_0000z.nc4), or whether these files need some processing before they can be used. I presume they can be used directly, but I have not been able to do so.

I tested with another restart file (not one that the model generates during the simulation) and the model runs well, but with gcchem_internal_checkpoint.xxxx.nc4 the model does not run. I think this means the problem is in the restart file.

Again, let me clarify: I am running with a stretched grid, and the settings in my setCommonsetting.sh look like this:

CS_RES=48
STRETCH_GRID=ON
STRETCH_FACTOR=4.0
TARGET_LAT=28.0
TARGET_LON=80.0

The beginning and end of the header of gcchem_internal_checkpoint.xxxx.nc4 look like this:

dimensions:
    lat = 288 ;
    lev = 72 ;
    lon = 48 ;
    time = 1 ;
variables:

// global attributes:
    :STRETCH_FACTOR = 4.f ;
    :TARGET_LAT = 28.f ;
    :TARGET_LON = 80.f ;

I hope this additional information helps in understanding the problem.
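The comparison between the run settings and the restart file header can be written down as a small check. This is a minimal sketch, not part of GCHP or GCPy: check_restart_metadata is a hypothetical helper, the dimensions and attributes are typed in by hand as dictionaries (for example from a header dump), and the lon = CS_RES, lat = 6 x CS_RES relationship is taken from the header shown above (48 and 288 for CS_RES=48).

```python
def check_restart_metadata(dims, attrs, cs_res, stretch_factor,
                           target_lat, target_lon, tol=1e-4):
    """Return a list of mismatches between a restart header and the run settings.

    dims  : dict of dimension sizes copied from the file header
    attrs : dict of global attributes copied from the file header
    """
    problems = []
    # Cubed-sphere layout seen in the header above: lon = N, lat = 6 * N.
    if dims.get("lon") != cs_res:
        problems.append(f"lon is {dims.get('lon')}, expected {cs_res}")
    if dims.get("lat") != 6 * cs_res:
        problems.append(f"lat is {dims.get('lat')}, expected {6 * cs_res}")
    expected = {
        "STRETCH_FACTOR": stretch_factor,
        "TARGET_LAT": target_lat,
        "TARGET_LON": target_lon,
    }
    for name, want in expected.items():
        have = attrs.get(name)
        if have is None:
            problems.append(f"missing global attribute {name}")
        elif abs(float(have) - want) > tol:
            problems.append(f"{name} is {have}, expected {want}")
    return problems


if __name__ == "__main__":
    dims = {"lat": 288, "lev": 72, "lon": 48, "time": 1}
    attrs = {"STRETCH_FACTOR": 4.0, "TARGET_LAT": 28.0, "TARGET_LON": 80.0}
    print(check_restart_metadata(dims, attrs, 48, 4.0, 28.0, 80.0))
```

An empty list means the header agrees with the settings, which is the case for the values quoted in this comment; it does not rule out other problems inside the file.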

yantosca commented 4 months ago

Thanks for writing @Hemrajbhattarai. There was a similar issue geoschem/geos-chem#318 not long ago. In that case the problem was solved by using target longitude in 0..360 coordinates when regridding a regular GCHP grid to a GCHP stretched grid. I don't know if that will solve your issue but that would be the first thing to check.

Hemrajbhattarai commented 4 months ago

Thanks, Bob, for the prompt reply. My target point is around Delhi, India, so I think the latitude and longitude are the same in both the 0 to 360 and the -180 to 180 conventions (target lat = 28N, lon = 80E). More importantly, I am using the restart file obtained from the internal_checkpoint of my simulation, which was interrupted after running for 7 months. In other words, this is a continuation run for the 8th month, so I think the target latitude and longitude settings should not be the issue.
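The point about the two longitude conventions can be checked with a short conversion. The function names below are illustrative only; the arithmetic is the standard wrap between the 0..360 and -180..180 ranges.

```python
def to_0_360(lon_deg):
    """Map a longitude in degrees onto the range [0, 360)."""
    return lon_deg % 360.0


def to_pm180(lon_deg):
    """Map a longitude in degrees onto the range [-180, 180)."""
    return (lon_deg + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    # An eastern-hemisphere target is unchanged by either conversion.
    print(to_0_360(80.0), to_pm180(80.0))
    # A western-hemisphere target differs between the two conventions.
    print(to_0_360(-100.0), to_pm180(260.0))
```

For a target longitude of 80E both conventions give 80, so the convention matters only for targets west of the prime meridian.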

lizziel commented 4 months ago

Hi @Hemrajbhattarai, I think there are two separate issues. Your original post that the model ran for one month and then stopped has this error in the log file:

pe=00070 FAIL at line=02608 ExtDataGridCompMod.F90 <unknown error>

This indicates a problem with an input file. The regular log shows that it stopped running at hour 23 of Aug 1 2017. This makes me think there is an issue finding an hourly file for Aug 2 2017, possibly a meteorology file. Do you have the output log file allPEs.log for that run? It might have an error message from the ExtData component of MAPL, which handles input files.
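When the error log is long, lines of the kind quoted above can be pulled out automatically. This is a rough sketch: the regular expression is built only from the single line quoted in this comment, so other MAPL failure messages may be laid out differently, and find_failures is a hypothetical helper name.

```python
import re

# Pattern modeled on the one log line quoted in this thread (an assumption).
FAIL_PATTERN = re.compile(r"pe=(\d+)\s+FAIL at line=(\d+)\s+(\S+)")


def find_failures(log_text):
    """Return (pe, line_number, source_file) for every matching log line."""
    hits = []
    for line in log_text.splitlines():
        match = FAIL_PATTERN.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return hits


if __name__ == "__main__":
    sample = "pe=00070 FAIL at line=02608    ExtDataGridCompMod.F90    <unknown error>"
    for pe, line_number, source in find_failures(sample):
        print(f"PE {pe}: failure at {source}:{line_number}")
```

A failure reported from ExtDataGridCompMod.F90 points toward input handling, so the next step is to look for missing or unreadable input files around the time the run stopped.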

The second issue is starting up a stretched grid run using an output restart file from the model. We have a separate report of this in issue https://github.com/geoschem/GCHP/issues/402 and I am looking into it. You should be able to start up a stretched grid simulation with any of the checkpoint files. There may be a bug preventing this that somehow was not reported until now.

Hemrajbhattarai commented 4 months ago

Dear @lizziel, thank you for pointing this out. The OFFLINE_DUST files for Aug 2 2017 onwards were missing. I downloaded all of those OFFLINE_DUST files and stored them in their respective directories.

I reran the simulation and the problem is still the same. I double-checked that the files are in the right directory, and they seem to be fine, but the problem persists.
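One way to double-check that no daily input files are missing over the rest of the run is to generate the expected paths and test each one. The sketch below is generic: missing_daily_files is a hypothetical helper, and the directory layout and file name template in the example are placeholders, not the real OFFLINE_DUST naming, so they need to be replaced with the pattern used in the ExtData configuration.

```python
from datetime import date, timedelta
from pathlib import Path


def missing_daily_files(root, template, first_day, last_day):
    """List expected daily files that do not exist under root.

    template is a strftime pattern relative to root, one file per day.
    """
    missing = []
    day = first_day
    while day <= last_day:
        candidate = Path(root) / day.strftime(template)
        if not candidate.is_file():
            missing.append(str(candidate))
        day += timedelta(days=1)
    return missing


if __name__ == "__main__":
    # Placeholder root and template; substitute the real data path and pattern.
    gaps = missing_daily_files(
        "/path/to/data",
        "%Y/%m/example_emissions_%Y%m%d.nc",
        date(2017, 8, 1),
        date(2017, 12, 31),
    )
    print(f"{len(gaps)} files missing")
    for path in gaps[:10]:
        print(path)
```

An empty result only shows that the files exist at the expected paths; a file that is present but truncated or unreadable would still need to be checked separately.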

Attached are the log files and other information.

allPEs.log 1529154_print_out.log 1529154_error.log

Hemrajbhattarai commented 4 months ago

I did two test runs; the only difference between them is the restart file.

1) The run whose log files I attached a few hours ago uses the restart file written by the checkpoint (gcchem_internal_checkpoint.xxxxz.nc4) during the model simulation. With this restart file the model fails to run (error log files are in the comment above).

2) This simulation is exactly the same as 1) except for the restart file, which is now the one generated by GCPy from the default nc file. The model runs fine.

I suspect there is some problem with the restart file obtained from the checkpoint.

I made a quick comparison of the variables, and it seems that some variables are not in the checkpoint restart file. I am not sure if this is the reason; you can check! check_original_checkpoint_test
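The variable comparison described here comes down to a set difference between the two lists of variable names. The sketch below assumes the names have already been extracted from each file (for example from a header dump); compare_variable_lists is a hypothetical helper, and the names in the example are placeholders, not the actual contents of either restart file.

```python
def compare_variable_lists(reference_vars, checkpoint_vars):
    """Report names present in only one of two restart files."""
    reference = set(reference_vars)
    checkpoint = set(checkpoint_vars)
    return {
        "missing_from_checkpoint": sorted(reference - checkpoint),
        "extra_in_checkpoint": sorted(checkpoint - reference),
    }


if __name__ == "__main__":
    # Placeholder variable names for illustration only.
    reference = ["SPC_CO", "SPC_NO", "SPC_O3", "EXAMPLE_DIAG"]
    checkpoint = ["SPC_CO", "SPC_NO", "SPC_O3"]
    result = compare_variable_lists(reference, checkpoint)
    for key, names in result.items():
        print(key, names)
```

Listing both directions helps separate variables the checkpoint dropped from variables that only the checkpoint carries.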

lizziel commented 4 months ago

Thanks @Hemrajbhattarai for the update. I am going to try to reproduce the restart file issue for stretched grid. It does appear to be a bug.

lizziel commented 3 months ago

Hi @Hemrajbhattarai, this bug will be fixed in version 14.4.0. You can apply the fix manually using the update here: https://github.com/geoschem/MAPL/pull/34. This does not address missing variables in the restart. Try the fix and see if it solves your problem.

lizziel commented 3 months ago

I believe any remaining issues with stretched grid are summarized in https://github.com/geoschem/GCHP/issues/404. If there are additional problems, please create a new GitHub issue.