GEOS-ESM / GMAO_Shared

Repository for GEOS Earth System Model shared infrastructure
Apache License 2.0

Update regrid.pl for GOCART2G #228

Open mathomp4 opened 2 years ago

mathomp4 commented 2 years ago

Per @pcolarco and @mmanyin in email, there is a desire to regrid the new GOCART2G restarts. Looking at a set of them, the (possible) new files seem to be:

Now, normally we could just add these to The List™ that's in regrid.pl but it's not that simple.

I looked at these restarts with @bena-nasa and we noticed that:

The main issue is the underlying regridding code was not set up for files like these.

So, I suppose the first question for @pcolarco or @amdasilva or @christophkeller is: Do we need to worry about hemco_internal? If not, can we just always bootstrap it and focus on the 4d restarts?

mathomp4 commented 2 years ago

CC @weiyuan-jiang, @tclune, @aoloso in re the regrid refactoring effort

bena-nasa commented 2 years ago

Adding the ability for interp_restarts.x to handle these new cases (an ungridded dimension + level) is straightforward from a scientific perspective: just treat each index of the ungridded dimension as a 3-D variable in its own right and do the usual horizontal and vertical regridding. However, the code has gotten to the point where adding new capabilities like this calls for a refactoring, at least splitting out the binary and NetCDF paths into separate codes so I can focus on adding these capabilities to the NetCDF restarts without breaking the binary path. Splitting this apart and adding new capabilities is a non-trivial exercise, so lead time and dedicated time to do this will be necessary.
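
The per-index approach described above can be sketched roughly as follows. This is only an illustration in Python, not the Fortran code in interp_restarts.x. The names `regrid_4d`, `regrid_3d`, and `halve_horizontal` are hypothetical, the field is assumed to be indexed as [ungridded][level][lat][lon], and the toy regridder only subsamples horizontally (no vertical regridding).

```python
from typing import Callable, List

Field3D = List[List[List[float]]]  # [level][lat][lon]
Field4D = List[Field3D]            # [ungridded][level][lat][lon]


def regrid_4d(field: Field4D, regrid_3d: Callable[[Field3D], Field3D]) -> Field4D:
    """Regrid each index of the ungridded dimension as an independent 3-D field."""
    return [regrid_3d(slab) for slab in field]


def halve_horizontal(slab: Field3D) -> Field3D:
    """Toy stand-in for a real regridder: keep every other point in lat and lon."""
    return [[row[::2] for row in level[::2]] for level in slab]


# 2 ungridded indices, 3 levels, 4x4 horizontal grid; values encode the ungridded index.
field = [
    [[[float(u) for _ in range(4)] for _ in range(4)] for _ in range(3)]
    for u in range(2)
]
out = regrid_4d(field, halve_horizontal)
```

The point of the sketch is that the existing 3-D regridding routine does not need to know about the extra dimension; only the outer loop does.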

bena-nasa commented 2 years ago

@mmanyin @pcolarco It had been a long time since I looked at the code; I split interp_restarts.x into a binary and NetCDF code. That was fairly straightforward. Adding support for these 4-D variables in the NetCDF version of the program is actually more straightforward than I thought. Although this will mean a multiple repo PR for regrid.pl and the underlying regridding ...

bena-nasa commented 2 years ago

@mmanyin @pcolarco @gmao-jstassi @mathomp4 I've updated interp_restarts.x to handle these new restarts (as well as refactoring and splitting interp_restarts.x into separate binary and NetCDF versions to make my life easier going forward). I've confirmed it works with the new GOCART2G restarts that are either 2d only or 4d with the unknown_dim + level. I've made contingent PRs for these in the FV3 and fvdycore repos.

The issue now is regrid.pl itself. It needs to know about the restart names, but I think we are getting to the point where it needs some more flexibility. Since each species in GOCART and each instance can have its own restart, like cabc_internal_rst, cabr_internal_rst, caoc_internal_rst, adding these explicitly in regrid.pl seems problematic. What if someone else has other instances of ca, for example? It seems like regrid.pl needs to be able to do some sort of wildcarding: instead of adding these explicitly, you would have a wildcard like ca*_internal_rst and it finds any restarts of that form. Pinging Joe on this, as my Perl is too shaky to do this.
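
To make the wildcard idea concrete, here is a minimal sketch in Python (not Perl, and not code from regrid.pl). The helper `find_restarts` and the directory listing are hypothetical. One thing the sketch suggests is that a pattern as broad as `ca*_internal_rst` would also pick up any unrelated restart whose name happens to start with "ca" (catch_internal_rst is used below purely as an illustration), so a tighter pattern may be needed. The `ca??` variant assumes two-character suffixes, which nothing in this thread confirms.

```python
from fnmatch import fnmatchcase


def find_restarts(filenames, patterns):
    """Return the names matching at least one shell-style pattern, in input order."""
    return [
        name for name in filenames
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical directory listing, for illustration only.
listing = [
    "cabc_internal_rst",
    "cabr_internal_rst",
    "caoc_internal_rst",
    "catch_internal_rst",
    "du_internal_rst",
    "fvcore_internal_rst",
]

broad = find_restarts(listing, ["ca*_internal_rst"])    # also matches catch_internal_rst
narrow = find_restarts(listing, ["ca??_internal_rst"])  # exactly two characters after "ca"
```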

weiyuan-jiang commented 2 years ago

Does this only affect the input name? Is there any special option? For the future, the Python regridder only cares about the names that are listed in the YAML file.

mmanyin commented 2 years ago

Sounds like good progress. Will it eventually be possible to regrid from the old GOCART 1G format to the GOCART 2G?

pcolarco commented 2 years ago

I would second @mmanyin. I think we would want to be able to regrid to GOCART2G from MERRA-2 for instance.

bena-nasa commented 2 years ago

> Sounds like good progress. Will it eventually be possible to regrid from the old GOCART 1G format to the GOCART 2G?

@mmanyin Can you elaborate what that would entail or what that even means? Is it just a matter of splitting an old gocart 1g restart into separate restarts?

That's beyond the scope of what the underlying regridding code (interp_restarts.x) would handle; it just regrids what is there already.

It would have to be some other script but someone who understands this would have to write it or give me a precise recipe for what that operation would entail.

bena-nasa commented 2 years ago

> I would second @mmanyin. I think we would want to be able to regrid to GOCART2G from MERRA-2 for instance.

@pcolarco what does this mean? Would every field in a gocart 1g restart (merra2 or otherwise) go to a specific field in a specific gocart2 split restart? If not, I'm not sure what this operation of going from GOCART1G to GOCART2G means.

As I said, I will not support this in interp_restarts.x; that's outside the scope of that program. If it is as simple as splitting the fields into separate restarts, that is another program's job; it would be a trivial Python script.

As for MERRA2, this is a can of worms. Regridding directly from MERRA2 to GOCART2G is complicated by the fact that MERRA2 was binary, so there is no metadata in the file. I painstakingly figured out the order of the fields in the MERRA2 binary restarts a long time ago and wrote a converter here to convert them from binary to NetCDF using descriptor files that document the variable order in each file:

https://github.com/bena-nasa/GEOS5_restart_converter

If you want to do something with MERRA2 in this hypothetical GOCART1G to GOCART2G operation, you would first need to use my tool to convert the restarts to NetCDF and then, if there is a solution for going from GOCART1G to GOCART2G, apply it to the NetCDF file.

pcolarco commented 2 years ago

I understand. A boy can dream. Splitting a netcdf legacy GOCART to GOCART2G should be straightforward with NCO or some other tool.
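
The bookkeeping for such a split might look like the sketch below. It only partitions variable names according to a user-supplied recipe; the actual copying of data would be done with NCO or a NetCDF library. The function `plan_split`, the variable names, and the prefix-to-restart recipe are all placeholders for illustration, not the real contents of a GOCART restart.

```python
from collections import defaultdict


def plan_split(variables, recipe):
    """Group variable names by target restart using a prefix -> restart recipe.

    Returns (plan, unassigned): plan maps each target restart to its variables,
    and unassigned lists variables that no prefix in the recipe covers.
    """
    plan = defaultdict(list)
    unassigned = []
    for var in variables:
        matches = [prefix for prefix in recipe if var.startswith(prefix)]
        if matches:
            # If several prefixes match, the longest (most specific) one wins.
            plan[recipe[max(matches, key=len)]].append(var)
        else:
            unassigned.append(var)
    return dict(plan), unassigned


# Placeholder names; a real recipe would have to come from someone who knows the fields.
legacy_vars = ["DU001", "DU002", "SS001", "CO"]
recipe = {"DU": "du_internal_rst", "SS": "ss_internal_rst"}
plan, leftover = plan_split(legacy_vars, recipe)
```

Reporting the unassigned variables explicitly seems useful here, since any field without a home in the new layout is exactly the case that needs a precise recipe.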

mmanyin commented 2 years ago

@bena-nasa I have been impressed in the past when needing to convert legacy restarts to a more recent version, that regrid.pl could identify what was missing and provide at least a bare bones set of restarts. Without really knowing what the limitations are, I was posing the general question -- can the program generate a set of G2G restarts that approximates an older G1G set? Sounds like this is not the tool for doing it.

bena-nasa commented 2 years ago

> @bena-nasa I have been impressed in the past when needing to convert legacy restarts to a more recent version, that regrid.pl could identify what was missing and provide at least a bare bones set of restarts. Without really knowing what the limitations are, I was posing the general question -- can the program generate a set of G2G restarts that approximates an older G1G set? Sounds like this is not the tool for doing it.

No, regrid.pl just calls other programs that regrid the restarts that are there, using the boundary conditions it thinks are appropriate based on your answers to the questions; no more, no less. If you have a gocart_internal_rst in, you get one out. After I fix up regrid.pl, if you have a set of restarts from gocart 2g in, you get a set out.

@mmanyin @pcolarco This does bring up a point: what are all the "base" component names it needs to be aware of? I'm going to try to implement a wildcard feature in regrid.pl. Looking in the restarts Pete provided, we have ca, ss, du, ni, su, and these could potentially have multiple instances (in the restarts I have, only ca actually does) that I will use the wildcard feature to find. What am I missing, if anything? I'll code it to the list above, so please let me know if I need to include others.

pcolarco commented 2 years ago

@bena-nasa Your list looks complete for GOCART2G, although note it is "cabr," "cabc," and "caoc". You might anticipate an eventual refactoring of the remainder of the legacy GOCART which would split what is still left in gocart_internal_rst into subsequent things like co, co2, ch4, ... _internal_rst.

bena-nasa commented 2 years ago

> @bena-nasa Your list looks complete for GOCART2G, although note it is "cabr," "cabc," and "caoc". You might anticipate an eventual refactoring of the remainder of the legacy GOCART which would split what is still left in gocart_internal_rst into subsequent things like co, co2, ch4, ... _internal_rst.

@pcolarco Oh, did I misread that? I thought cabr, cabc, caoc were separate instances of one species, but now that I'm reading it again, that is just brown carbon, black carbon, and organic carbon. So per species there is only one restart no matter how many instances? If so, then I can just hard code the names and I was making a problem out of nothing.

pcolarco commented 2 years ago

@bena-nasa Hmm... Each instance has its own restart. By default we are running three carbonaceous instances: brown, black, and organic carbon. We need to be able to regrid each such instance. Some guidance then for how to handle multiple instances for later (i.e., so far not tried out) cases will be helpful. Does that make sense?

bena-nasa commented 2 years ago

@pcolarco @jstassi I was only asking about the instances because currently the way regrid.pl works is that it has hard-coded restart names it looks for. So if the name is not in the list, it won't regrid it. This could be a problem if someone runs a new gocart case with multiple instances, for example, and wants to regrid those but the script is unaware. I see two possible solutions for this in regrid.pl:

  1. Allow the user to specify "extra" restarts that the program is unaware of on the command line or when answering the questions when running interactively
  2. Implement some wildcard type thing where the program is given something like this ca*_internal_rst and it tries to find all the restarts that match that pattern. This would require the instance pattern to have been added and for the instances to be consistently named (I've never run gocart2g so how it works is a total mystery to me).
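
The two options are not mutually exclusive. As a rough sketch of how they could sit side by side (in Python for illustration, not the Perl in regrid.pl), the selection below accepts names from a built-in list, names given explicitly by the user, and names matching user-supplied patterns. The flags `--extra` and `--pattern`, the function names, `KNOWN_RESTARTS`, and the instance name `myca_internal_rst` are all hypothetical and are not existing regrid.pl options.

```python
import argparse
from fnmatch import fnmatchcase

# Hypothetical stand-in for the hard-coded list of restart names.
KNOWN_RESTARTS = ["fvcore_internal_rst", "gocart_internal_rst"]


def select_restarts(available, extra=(), patterns=()):
    """Pick restarts that are known, explicitly requested, or match a pattern."""
    wanted = set(KNOWN_RESTARTS) | set(extra)
    return sorted(
        name for name in available
        if name in wanted or any(fnmatchcase(name, p) for p in patterns)
    )


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Sketch of restart selection")
    # Option 1: the user names extra restarts the program is unaware of.
    parser.add_argument("--extra", action="append", default=[],
                        help="additional restart name (repeatable)")
    # Option 2: the user (or the program) supplies wildcard patterns.
    parser.add_argument("--pattern", action="append", default=[],
                        help="shell-style pattern for restart names (repeatable)")
    return parser.parse_args(argv)


available = [
    "fvcore_internal_rst",
    "myca_internal_rst",
    "du_internal_rst",
    "du2_internal_rst",
    "ss_internal_rst",
]
args = parse_args(["--extra", "myca_internal_rst", "--pattern", "du*_internal_rst"])
selected = select_restarts(available, args.extra, args.pattern)
```

In this sketch `ss_internal_rst` is skipped because it is neither known, requested, nor matched, which mirrors the current behavior for names that are not in the list.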

It sounds like for now the PR I've made handles the current uses but still needs another extension. Any thoughts on which method sounds better as an end user?

pcolarco commented 2 years ago

@bena-nasa Hey, Ben, is this functionality available now in some more recent model version for me to try out? Thanks

mathomp4 commented 2 years ago

@pcolarco I think it should be in GEOSgcm v10.21.0 for sure.

pcolarco commented 2 years ago

Thanks, Matt. I did go check and see it in the CHANGELOG, so I’ll give it a try.

