NOAA-EMC / AQM

GNU General Public License v3.0

Long running time of NEXUS for the Online-CMAQ (or UFS-AQM) over the large domain #12

JianpingHuang-NOAA closed this issue 1 year ago.

JianpingHuang-NOAA commented 1 year ago

It takes 45 minutes to generate a 72-hr NEXUS_Expt_ugly.nc file, 5 minutes for the NEXUS_Expt_pretty.nc file, and 10 minutes for the NEXUS_Expt.nc file. The total NEXUS runtime for a 3-day forecast over the large domain is therefore about an hour. We need to reduce it to less than 30 minutes.

Jianping

bbakernoaa commented 1 year ago

@JianpingHuang-NOAA I have been testing this to see what our options are. Right now I'm running tests in which the MEGAN run is split from the anthropogenic processing. We then create multiple MEGAN/NEXUS jobs that each simulate only 24 hours, which allows us to simulate multiple days simultaneously. This does, however, introduce some error in the emissions (typically less than a few percent), and I want to see its impact on ozone in particular.
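A back-of-envelope sketch of why splitting helps, using the timings from the opening comment. It treats the 45-minute step as the part that gets split and assumes (hypothetically) that this step scales linearly with simulated hours, that the split jobs run concurrently, and that the 5- and 10-minute steps are unchanged. The extra merge steps a split run needs are not counted.

```python
def estimate_wall_minutes(split_step_minutes, num_split, other_minutes):
    """Rough wall-clock estimate when one step is split into concurrent jobs.

    Hypothetical model: the split step scales linearly with simulated hours,
    all sub-jobs run at the same time, and the other steps are unchanged.
    """
    if num_split < 1:
        raise ValueError("num_split must be at least 1")
    return split_step_minutes / num_split + other_minutes


# Timings from the issue: ~45 min (ugly file), ~5 min (pretty), ~10 min (final).
serial = estimate_wall_minutes(45, 1, 5 + 10)
split3 = estimate_wall_minutes(45, 3, 5 + 10)
print(f"one job: {serial:.0f} min, three 24-hr jobs: {split3:.0f} min")
```

Under these assumptions a three-way split brings the estimate to about 30 minutes, right at the target, so the remaining steps still matter.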

@rmontuoro

bbakernoaa commented 1 year ago

I have been working on this. In the short term, there is no viable way to directly speed up a continuous 72-hour NEXUS run for biogenics (note that for anthropogenic emissions alone, NEXUS runs very fast and scales well with the number of CPUs).

Running multiple separate days is possible. Preliminary tests show that this does introduce some difference compared with a continuous run; however, the errors are in the tropical regions at the extreme southern edge of the domain. Over CONUS, for example, the mean absolute percent error is less than 3 percent relative to a continuous run. If EMC agrees, all that needs to be done is to create new tasks within the workflow to handle this. I have already split the anthropogenic and biogenic data into separate configuration files. You can see them in the NEXUS branch feature/megan_move: https://github.com/noaa-oar-arl/NEXUS/tree/feature/megan_move/config
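The mean absolute percent error (MAPE) quoted above is the average over grid cells of |split - continuous| / |continuous|, times 100. A minimal pure-Python sketch, assuming flattened lists of matching grid-cell values; skipping cells where the continuous-run value is zero is an assumption here, not necessarily what was done for the CONUS number.

```python
def mean_absolute_percent_error(reference, test, eps=1e-12):
    """MAPE (%) of `test` against `reference`, cell by cell.

    Cells whose reference value is (nearly) zero are skipped to avoid dividing
    by zero; that masking rule is an assumption of this sketch.
    """
    if len(reference) != len(test):
        raise ValueError("inputs must have the same length")
    errors = [abs(t - r) / abs(r) for r, t in zip(reference, test) if abs(r) > eps]
    if not errors:
        raise ValueError("reference has no nonzero values")
    return 100.0 * sum(errors) / len(errors)


# Hypothetical values for four grid cells (e.g. isoprene emission rates).
continuous = [2.0, 4.0, 0.0, 5.0]
split_days = [2.1, 3.9, 0.0, 5.0]
print(round(mean_absolute_percent_error(continuous, split_days), 2))
```

The zero-valued cell is excluded, so the example averages over three cells.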

To summarize, the needed workflow updates are:

1) Create a new job for each forecast day (24 hours) to run the MEGAN emissions.
2) Merge (concatenate) the three MEGAN days in a single job.
3) Merge the concatenated MEGAN days with the anthropogenic data.
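Steps 2 and 3 can be sketched as follows. Each day's MEGAN output is modeled as a hypothetical dictionary mapping species names to hourly values (real NEXUS output is NetCDF), and the species names are illustrative only. Step 1 is just three independent 24-hour jobs, so it is not shown. Summing species that appear in both the biogenic and anthropogenic sets is an assumption of this sketch.

```python
def concat_days(megan_days):
    """Step 2: join per-day MEGAN outputs (species -> hourly values) along time."""
    merged = {}
    for day in megan_days:  # days must be in forecast order
        for species, values in day.items():
            merged.setdefault(species, []).extend(values)
    return merged


def merge_with_anthropogenic(biogenic, anthropogenic):
    """Step 3: combine biogenic and anthropogenic fields into one set.

    Species present in both are summed hour by hour (an assumption here).
    """
    combined = {species: list(values) for species, values in anthropogenic.items()}
    for species, values in biogenic.items():
        if species in combined:
            if len(combined[species]) != len(values):
                raise ValueError(f"time axis mismatch for {species}")
            combined[species] = [a + b for a, b in zip(combined[species], values)]
        else:
            combined[species] = list(values)
    return combined


megan_days = [{"ISOP": [1.0] * 24}, {"ISOP": [2.0] * 24}, {"ISOP": [3.0] * 24}]
anthropogenic = {"ISOP": [0.5] * 72, "NO": [4.0] * 72}
merged = merge_with_anthropogenic(concat_days(megan_days), anthropogenic)
print(len(merged["ISOP"]), merged["ISOP"][0], merged["ISOP"][71])
```

Concatenating three days of 24 hourly values yields a 72-hour series that lines up with the anthropogenic time axis; a length mismatch is reported rather than silently truncated.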

We also need to convert GFS data from the previous cycle into NEXUS inputs (essentially just converting to the needed variables and data format).

@rmontuoro @JianpingHuang-NOAA

bbakernoaa commented 1 year ago

Here are some plots to explain the differences. There are two simulations here: 1) a 48-hour simulation from the original date and 2) a 24-hour simulation started at hour 24.

[Figure 1: All pixels]

[Figure 2: Maximum isoprene emission from day 2 of the 48-hour simulation]

[Figure 3: Maximum difference in isoprene]

JianpingHuang-NOAA commented 1 year ago

Does Fig. 2 represent the difference in isoprene emissions between the two runs? What is the percentage difference?

Thanks,

On Thu, Oct 13, 2022 at 2:20 PM Barry Baker @.***> wrote:

Here are some plots to explain the differences. There are two simulations here: 1) a 48 hours simulation from the original date 2) a 24 hours simulation started at 24 hours

[image: image] https://user-images.githubusercontent.com/22104759/195664839-54bcfdea-6671-4c87-9a47-ab74300a0578.png

[image: image] https://user-images.githubusercontent.com/22104759/195675419-3f12e43a-40ac-48cf-9423-72a91ff9b4b2.png

[image: image] https://user-images.githubusercontent.com/22104759/195675327-6534e29b-9429-469d-8cd1-e6a86f4f6866.png


bbakernoaa commented 1 year ago

Figure 3 represents the differences. Figure 2 is the maximum of the base result.
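The percentage difference asked about above can be computed per grid cell by dividing the maximum difference (Figure 3) by the maximum base value (Figure 2). A minimal sketch, assuming each run is a hypothetical mapping from grid cell to hourly values and that day 2 of the 48-hour run is hours 24-47; this normalization is one reasonable choice, not necessarily the one used for the plots.

```python
def max_over_time(series_by_cell):
    """Per-cell maximum over time; input maps cell -> list of hourly values."""
    return {cell: max(values) for cell, values in series_by_cell.items()}


def max_abs_difference(base, test):
    """Per-cell maximum of |test - base| over matching hours."""
    return {
        cell: max(abs(t - b) for b, t in zip(base[cell], test[cell]))
        for cell in base
    }


def max_percent_difference(base, test):
    """Per-cell maximum difference as a percentage of the per-cell peak base value."""
    peak = max_over_time(base)
    diff = max_abs_difference(base, test)
    return {cell: 100.0 * diff[cell] / peak[cell] for cell in base if peak[cell] > 0}


# Hypothetical single-cell example: hours 24-47 of the 48-hr run are "day 2".
continuous_48h = {"cell_a": [0.0] * 24 + [1.0, 4.0, 2.0] + [0.0] * 21}
restart_24h = {"cell_a": [1.0, 3.8, 2.0] + [0.0] * 21}
day2_base = {cell: values[24:48] for cell, values in continuous_48h.items()}
print(max_percent_difference(day2_base, restart_24h))
```

Cells with no emission in the base run are left out of the percentage, since dividing by a zero peak is undefined.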

Barry Baker

National Oceanic and Atmospheric Administration
Air Resources Laboratory
Physical Research Scientist
Chemical Modeling and Emissions Group Leader
NCWCP, R/ARL, Rm. 4204
5830 University Research Court
College Park, Maryland 20740
Phone: (301) 683-1395

On Thu, Oct 13, 2022 at 2:28 PM JianpingHuang-NOAA @.***> wrote:

Does Fig.2 represent the difference of isoprene emissions between two runs? What is the percentage difference ?


HaixiaLiu-NOAA commented 1 year ago

@bbakernoaa Thank you for the plots. @JianpingHuang-NOAA @rmontuoro, are we okay with the differences in emission rates that @bbakernoaa noted between running one 72-hr job and splitting it into three 24-hr jobs?

@bbakernoaa Have you made the workflow updates you listed above and added the data conversion? Thank you.

HaixiaLiu-NOAA commented 1 year ago

@bbakernoaa What period did your experiment cover for the plots you generated above? Will the differences seen in the tropical region also appear in summertime? Thank you.

bbakernoaa commented 1 year ago

This was a summer case, which is when we will see the largest differences in the domain. The tropical area will not change much from season to season.


On Fri, Oct 21, 2022 at 2:47 PM HaixiaLiu @.***> wrote:

@bbakernoaa https://github.com/bbakernoaa where period of your experiment covered for the plots you generated above? Will the differences seen in the tropical region be seen for the summer time as well? Thank you.


HaixiaLiu-NOAA commented 1 year ago

Thank you, @bbakernoaa. Jianping would like to test the changes from ARL for this issue. Could you please work with Jianping to test your changes in Online-CMAQ? Thank you very much.

bbakernoaa commented 1 year ago

@HaixiaLiu-NOAA As I mentioned above, the needed changes are in a branch. Workflow tasks need to be created to make this happen. @chan-hoo Could you help with this?

chan-hoo commented 1 year ago

@bbakernoaa, what can I do for you to resolve this issue?

bbakernoaa commented 1 year ago

@chan-hoo Can we set up a meeting to discuss this? It may be easier than typing everything here at the moment.

chan-hoo commented 1 year ago

Sure. I'll check your calendar and set up a meeting.

chan-hoo commented 1 year ago

@bbakernoaa, can you merge your 'feature/megan_move' branch into the 'develop' branch?

JianpingHuang-NOAA commented 1 year ago

I checked out the latest workflow (6990782) with NEXUS (6990782) and still saw very little improvement in the NEXUS runtime. The split job still takes 46-47 minutes and the NEXUS post job about 10-11 minutes, so the total NEXUS runtime is about 56-58 minutes. This issue has been open for more than two months. We need to resolve it as soon as possible; otherwise it will delay completion of the 10-month retro runs. Thanks!

bbakernoaa commented 1 year ago

@JianpingHuang-NOAA I wish @chan-hoo had not committed the update directly to the online-cmaq branch before it was complete. It has caused a lot of confusion.

What happened is that he added the capability to split the nexus_emission workflow task into multiple tasks, controlled by the NUM_SPLIT_NEXUS variable in config.yaml. This lets us split the NEXUS steps into N sub-tasks to meet the operational runtime requirement for NEXUS. However, this was not yet complete, because a new capability is needed to recombine the output of all the split NEXUS tasks. I've been working on this, and it should be ready shortly.
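Assuming the split is by forecast time, splitting into N sub-tasks amounts to dividing the forecast period into N contiguous windows, running each as its own task, and concatenating the results in order afterward. The sketch below shows one way to compute those windows; the function name and the rule of giving earlier windows any leftover hours are hypothetical, not taken from the workflow code.

```python
def split_forecast_hours(total_hours, num_split):
    """Partition hours [0, total_hours) into num_split contiguous windows.

    Earlier windows absorb any remainder, so lengths differ by at most one hour.
    """
    if not 1 <= num_split <= total_hours:
        raise ValueError("num_split must be between 1 and total_hours")
    base, extra = divmod(total_hours, num_split)
    windows, start = [], 0
    for i in range(num_split):
        length = base + (1 if i < extra else 0)
        windows.append((start, start + length))  # half-open [start, end)
        start += length
    return windows


print(split_forecast_hours(72, 3))
print(split_forecast_hours(30, 3))
```

A 72-hour run with N = 3 gives three 24-hour windows, and a 30-hour run with N = 3 gives three 10-hour windows; recombining means joining the windows' outputs in this order.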

Here is the draft PR https://github.com/ufs-community/ufs-srweather-app/pull/494

But this is not quite ready to test yet, as you will need to point to a new NEXUS hash, etc. My goal is to complete this tomorrow.

HaixiaLiu-NOAA commented 1 year ago

@bbakernoaa Would you please update the status of this issue? Thank you very much!

bbakernoaa commented 1 year ago

@HaixiaLiu-NOAA Please see https://github.com/ufs-community/ufs-srweather-app/pull/494

chan-hoo commented 1 year ago

@JianpingHuang-NOAA, @HaixiaLiu-NOAA, the above PR has been merged into the 'online-cmaq' branch. You can adjust the number of splits by the parameter NUM_SPLIT_NEXUS in config.yaml.

JianpingHuang-NOAA commented 1 year ago

Thanks !

On Wed, Nov 30, 2022 at 9:01 AM Chan-Hoo.Jeon-NOAA @.***> wrote:

@JianpingHuang-NOAA https://github.com/JianpingHuang-NOAA, @HaixiaLiu-NOAA https://github.com/HaixiaLiu-NOAA, the above PR has been merged into the 'online-cmaq' branch. You can adjust the number of splits by the parameter NUM_SPLIT_NEXUS in config.yaml.


JianpingHuang-NOAA commented 1 year ago

@chan-hoo I am testing the new LBCs with the latest workflow, and the workflow stopped while running the NEXUS post job. The log.launch_FV3LAM_wflow file (/lfs/h2/emc/physics/noscrub/jianping.huang/nwdev/packages/expt_dirs/ufs_na_rt_v20/201908) shows:

    201908061200 nexus_emission_00 20235357 SUCCEEDED 0 1 630.0
    201908061200 nexus_emission_01 20235358 SUCCEEDED 0 1 551.0
    201908061200 nexus_emission_02 20235365 SUCCEEDED 0 1 514.0
    201908061200 nexus_post_split 20236066 RUNNING - 0 0.0
    201908061200 fire_emission 20235359 SUCCEEDED 0 1 33.0
    201908061200 point_source 20235360 QUEUED - 0 0.0
    201908061200 get_extrn_ics 20235361 QUEUED - 0 0.0
    201908061200 get_extrn_lbcs 20235362 QUEUED - 0 0.0
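The task rows above can be parsed to see how long the split emission step took in wall-clock terms. This sketch assumes the columns are cycle, task, job ID, state, exit status, tries, and duration in seconds (the layout of Rocoto's rocotostat output), and that the nexus_emission_* tasks run concurrently, so the slowest one bounds the step.

```python
def parse_task_rows(text):
    """Parse rows shaped like: CYCLE TASK JOBID STATE EXIT TRIES DURATION."""
    rows = []
    for line in text.strip().splitlines():
        cycle, task, _jobid, state, _exit_status, _tries, duration = line.split()
        rows.append({"cycle": cycle, "task": task, "state": state,
                     "seconds": float(duration)})
    return rows


def split_step_wall_seconds(rows, prefix="nexus_emission_"):
    """Wall time of the split step if its sub-tasks ran concurrently (assumed)."""
    durations = [r["seconds"] for r in rows
                 if r["task"].startswith(prefix) and r["state"] == "SUCCEEDED"]
    return max(durations) if durations else 0.0


LOG = """\
201908061200 nexus_emission_00 20235357 SUCCEEDED 0 1 630.0
201908061200 nexus_emission_01 20235358 SUCCEEDED 0 1 551.0
201908061200 nexus_emission_02 20235365 SUCCEEDED 0 1 514.0
201908061200 nexus_post_split 20236066 RUNNING - 0 0.0
"""

rows = parse_task_rows(LOG)
print(split_step_wall_seconds(rows) / 60)  # 10.5
```

For these rows the slowest split task took 630 s (10.5 minutes). This is a 30-hour retro run rather than the 72-hour case from the opening comment, so the numbers are not directly comparable.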

In fact, the nexus_post_split job completed (see nexus_post_split_2019080112.id_1669954224.log in /lfs/h2/emc/ptmp/jianping.huang/emc.para/output/20190801).

I am running the retro as a 30-hr forecast for each day at 12Z, but I still use "NUM_SPLIT_NEXUS: 3" in the config.yaml file. Is this why the workflow or the NEXUS job cannot handle it well? @bbakernoaa

Thanks,

Jianping

chan-hoo commented 1 year ago

@JianpingHuang-NOAA, in your log file, 'nexus_post_split' didn't fail, but 'aqm_lbcs' failed:

    201908011200 nexus_post_split 20236918 SUCCEEDED 0 1 318.0
    201908011200 aqm_lbcs 20240061 DEAD 1 2 49.0

bbakernoaa commented 1 year ago

@JianpingHuang-NOAA It should handle that fine; I didn't have any issues in testing. I took a quick look, and it looks like you deleted the directory /lfs/h2/emc/ptmp/jianping.huang/emc.para/com/aqm/v7.0/aqm.v7.0.c19.20190801, which is where the final product gets put for input.

I also looked at the intermediate files, and they look OK.

HaixiaLiu-NOAA commented 1 year ago

@JianpingHuang-NOAA @bbakernoaa Given that PR #494 has been merged, should this issue be closed? A new issue can be opened if necessary.

bbakernoaa commented 1 year ago

@HaixiaLiu-NOAA As chan-hoo showed, nexus_post_split didn't fail. I think something happened to @JianpingHuang-NOAA's directory where the final inputs are stored. I suggest that he rerun the case from the start, as all the inputs seem to be missing.