geoschem / gchp_legacy

Repository for GEOS-Chem High Performance: software that enables running GEOS-Chem on a cubed-sphere grid with MPI parallelization.
http://wiki.geos-chem.org/GEOS-Chem_HP

[BUG/ISSUE] Do not write checkpoint file at the very beginning #20

Closed JiaweiZhuang closed 5 years ago

JiaweiZhuang commented 5 years ago

I am aware of this config:

https://github.com/geoschem/gchp/blob/7a4589c276876b6674800f4e4137b575e4def4f5/Run/runConfig.sh_template#L55-L66

It effectively sets GCHP.rc to:

# Settings for production of restart files
#---------------------------------------------------------------
# Record frequency (HHMMSS) : Frequency of restart file write
#                             Can exceed 24 hours (e.g. 1680000 for 7 days)
# Record ref date (YYYYMMDD): Reference date; set to before sim start date
# Record ref time (HHMMSS)  : Reference time
RECORD_FREQUENCY: 100000000
RECORD_REF_DATE: 20160701
RECORD_REF_TIME: 000000
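
As a side note on the `RECORD_FREQUENCY` value above: the comment block says the field is HHMMSS and that the hour part may exceed 24. Under that reading (an assumption here, not verified against the MAPL source), the last four digits are minutes and seconds and everything before them is hours. A minimal Python sketch of that decoding, with a hypothetical helper name:

```python
from datetime import timedelta


def record_frequency_to_timedelta(value: str) -> timedelta:
    """Decode an HHMMSS string whose hour field may be longer than two digits.

    Assumption: the last two digits are seconds, the two before them are
    minutes, and all leading digits are hours, as the GCHP.rc comment implies.
    """
    digits = value.strip().zfill(6)  # pad short values such as "0" to HHMMSS
    if not digits.isdigit():
        raise ValueError(f"expected only digits, got {value!r}")
    hours = int(digits[:-4])
    minutes = int(digits[-4:-2])
    seconds = int(digits[-2:])
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes/seconds out of range in {value!r}")
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    for raw in ("1680000", "100000000"):
        print(raw, "->", record_frequency_to_timedelta(raw))
```

Read this way, `1680000` is 168 hours (7 days) and `100000000` is 10000 hours, a little over 416 days, so no periodic checkpoint would be expected during a typical run.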

However, GCHP still writes out a checkpoint file at the very beginning:

...
CFIO: Reading ./MainDataDir/MASKS/v2018-09/AF_LANDMASK.geos.05x0666.global.nc at 19850101 000000
 NOT using buffer I/O for file: TileFiles/DC0540xPC0361_CF0024x6C.bin
CFIO: Reading ./MainDataDir/MASKS/v2018-09/China_mask.generic.1x1.nc at 19850101 000000
CFIO: Reading ./MainDataDir/MASKS/v2018-09/India_mask.generic.1x1.nc at 19850101 000000
   Character Resource Parameter GIGCchem_INTERNAL_CHECKPOINT_TYPE: pnc4
 Using parallel NetCDF for file: 
 gcchem_internal_checkpoint_c24.nc.20160701_0000z.bin
 offline_tracer_advection
 Initialized species from INTERNAL state: NO
...

For a C180 run, this file is 27GB (!!) and takes a long time to write:

$ du -sh gcchem_internal_checkpoint_c180.nc.20160701_0000z.bin
27G gcchem_internal_checkpoint_c180.nc.20160701_0000z.bin

Is there an option to turn it off?
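
For a rough sense of how the file size depends on resolution: a CN cubed-sphere grid has 6 faces of N x N cells, so the number of columns grows with N squared. The sketch below scales the 27 GB figure quoted above to other resolutions. It assumes the same species, vertical levels, and numeric precision at every resolution, and it ignores file metadata; the function names are hypothetical and the results are estimates, not measurements from this thread.

```python
def cubed_sphere_columns(n: int) -> int:
    """Number of grid columns on a CN cubed sphere: 6 faces of n x n cells."""
    return 6 * n * n


def scaled_size_gb(known_size_gb: float, known_res: int, target_res: int) -> float:
    """Scale a known checkpoint size to another resolution by column count.

    Assumption: size is proportional to the number of columns, i.e. the
    same fields, levels, and precision are written at both resolutions.
    """
    ratio = cubed_sphere_columns(target_res) / cubed_sphere_columns(known_res)
    return known_size_gb * ratio


if __name__ == "__main__":
    # 27 GB at C180 is the figure reported in this issue.
    for res in (24, 48, 90, 180):
        print(f"C{res}: ~{scaled_size_gb(27.0, 180, res):.2f} GB")
```

Under these assumptions, C180 has about 56 times as many columns as C24, which is why a checkpoint that goes unnoticed in a C24 test becomes expensive at C180.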

lizziel commented 5 years ago

Unfortunately I have not spent any time trying to figure out how to turn that off, although I agree it needs to be addressed. For now, it is best to just comment out the RECORD lines in GCHP.rc to turn it off. This is something I can bring up with the MAPL developers.

-- Lizzie Lundgren, Scientific Programmer, GEOS-Chem Support Team (geos-chem-support@as.harvard.edu), http://wiki.geos-chem.org/GEOS-Chem_Support_Team

Please direct all GEOS-Chem support issues to the entire GEOS-Chem Support Team at geos-chem-support@as.harvard.edu. This will allow us to serve you better.
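
The workaround suggested above (commenting out the RECORD lines in GCHP.rc) can be done by hand, but a small script keeps it repeatable. The following is a minimal Python sketch, not part of GCHP: the helper name is hypothetical, and it assumes that a leading `#` marks a comment in GCHP.rc, as in the snippet quoted in this issue.

```python
import re

# The three keys shown in the GCHP.rc excerpt in this issue.
RECORD_KEYS = ("RECORD_FREQUENCY", "RECORD_REF_DATE", "RECORD_REF_TIME")

# Matches only active (uncommented) lines that start with one of the keys.
_ACTIVE_RECORD = re.compile(r"^\s*(?:" + "|".join(RECORD_KEYS) + r")\s*:")


def comment_out_record_lines(text: str) -> str:
    """Return the config text with every active RECORD_* line prefixed by '# '."""
    out = []
    for line in text.splitlines(keepends=True):
        if _ACTIVE_RECORD.match(line):
            line = "# " + line
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "# Record ref time (HHMMSS)  : Reference time\n"
        "RECORD_FREQUENCY: 100000000\n"
        "RECORD_REF_DATE: 20160701\n"
        "RECORD_REF_TIME: 000000\n"
    )
    print(comment_out_record_lines(sample), end="")
```

To apply it to a real run directory, read GCHP.rc, pass its contents through the function, and write the result back after making a backup copy. Lines that are already commented are left alone, so running it twice is harmless.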


lizziel commented 5 years ago

You may also need to comment out the sections of runConfig.sh that automatically set those fields.


JiaweiZhuang commented 5 years ago

Thanks for the tips! Just to double-check:

By commenting out

# RECORD_FREQUENCY: 100000000
# RECORD_REF_DATE: 20160701
# RECORD_REF_TIME: 000000

the checkpoint files won't be written out at all?

JiaweiZhuang commented 5 years ago

OK, commenting out the RECORD lines worked well.

~Can also disable the final checkpoint file by~:

GIGCchem_INTERNAL_CHECKPOINT_FILE:  -gcchem_internal_checkpoint_c24.nc

JiaweiZhuang commented 5 years ago

Prefixing the file name with a dash (-gcchem_internal_checkpoint_c24.nc) doesn't really work.

Commenting out the line works:

# GIGCchem_INTERNAL_CHECKPOINT_FILE:  gcchem_internal_checkpoint_c24.nc

lizziel commented 5 years ago

I recommend against commenting out this line since it means you won't get a restart file at all at the end of your run.

lizziel commented 5 years ago

Although if you see differently, i.e. it gives a default name, then do report that!
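
Since the two settings discussed in this thread have different consequences (the RECORD lines control the checkpoint at the start, the CHECKPOINT_FILE line controls the restart file at the end), it can help to check which of them are still active before launching a long run. Here is a minimal Python sketch with a hypothetical helper name. It assumes `KEY: value` lines and `#` comments as in the GCHP.rc excerpts above, and it only reports what is in the file, not what MAPL will actually do with it.

```python
from typing import Dict, Iterable


def active_settings(text: str, keys: Iterable[str]) -> Dict[str, str]:
    """Return the uncommented 'KEY: value' entries for the requested keys."""
    wanted = set(keys)
    found = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and commented-out settings
        key, sep, value = stripped.partition(":")
        if sep and key.strip() in wanted:
            found[key.strip()] = value.strip()
    return found


if __name__ == "__main__":
    sample = (
        "# RECORD_FREQUENCY: 100000000\n"
        "# RECORD_REF_DATE: 20160701\n"
        "# RECORD_REF_TIME: 000000\n"
        "GIGCchem_INTERNAL_CHECKPOINT_FILE:  gcchem_internal_checkpoint_c24.nc\n"
    )
    keys = (
        "RECORD_FREQUENCY",
        "RECORD_REF_DATE",
        "RECORD_REF_TIME",
        "GIGCchem_INTERNAL_CHECKPOINT_FILE",
    )
    for key, value in active_settings(sample, keys).items():
        print(f"{key} is active: {value}")
```

With the sample above, only the CHECKPOINT_FILE entry is reported as active, which matches the configuration recommended in this thread: RECORD lines commented out, end-of-run restart file kept.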