PCMDI / obs4MIPs-cmor-tables

JSON Tables for CMOR3 to create obs4MIPs datasets

RC (Register Contents) for ARMBE #304

Open chengzhuzhang opened 11 months ago

chengzhuzhang commented 11 months ago

The following are required registered content (with example content for each item in bold). Please replace the example text below with your information to the right of the equal sign (DO NOT MAKE ANY CHANGES TO THE LEFT HAND SIDE OF THE EQUAL SIGN):
1) source_id['source_id'][key]['source_name'] = 'ARMBE'
2) source_id['source_id'][key]['release_year'] = '2023?'
3) source_id['source_id'][key]['source_description'] = 'DOE ARM Best Estimate Data Products for Atmosphere and Cloud properties'
4) source_id['source_id'][key]['source_version_number'] = '?'
5) source_id['source_id'][key]['institution_id'] = 'DOE-ARM'
6) source_id['source_id'][key]['region'] = 'point?' #option not available from obs4MIPs docs
7) source_id['source_id'][key]['source_type'] = 'point_insitu' #option not available from obs4MIPs docs
8) A list of CMIP variable_ids that the above information refers to. In most cases it will only be for one variable_id. If it is for more than one, please make sure your source_description is sufficiently general to apply to all relevant variable_ids.


See note 14 and Appendix II of the obs4MIPs data specifications (https://goo.gl/jVZsQl) for more information regarding registered content, and feel free to ask questions!

chengzhuzhang commented 11 months ago

@gleckler1 hello Peter, I'm trying to fill in the issue, and asked Shaocheng and Cheng @EmmyChengTao for a review. We don't have a clear idea for items 6 and 7, since point in-situ measurements were not described in the obs4MIPs docs. Please advise.

gleckler1 commented 11 months ago

@chengzhuzhang @EmmyChengTao Many thanks for the entry. Yes, #6 and #7 are going to take some thinking - that was expected. We should look at some metadata from the relevant point data from CFMIP. We'll also be talking with Karl T. about this. We'll have to come up with a version number as that is key for obs4MIPs. If it's not provided, we'll construct one out of the date the data was created. I don't know if the ARMBE gets periodically updated with newer data, but if it does we'll factor that into the construction of the "source_id" ... we need to come up with a version number before dealing with #6 and #7. I do see the following in the netCDF files:

dod_version = "armbeatm-c1-1.8... we could use this, to create something like: ARB-BE-c1-1-8 ('.' replaced with '-' because '.' can't be used in Source_id).
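A minimal sketch of that substitution, using only the visible part of the dod_version string above. The function name is hypothetical and not part of any obs4MIPs tooling:

```python
def version_from_dod(dod_version: str) -> str:
    """Replace '.' with '-' so the string can be used inside a source_id.

    Assumption: '.' is the only character that needs replacing.
    """
    return dod_version.replace(".", "-")


# Only the visible prefix of the attribute is used here.
print(version_from_dod("armbeatm-c1-1.8"))  # armbeatm-c1-1-8
```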

chengzhuzhang commented 10 months ago

@gleckler1 Hi Peter, thanks for the further guidance. Sorry for the delayed response. I did some research on the "dod_version"; ARM has full documentation on its naming conventions as well. For finalizing the version number in our case, since the source_name is already "ARMBE", we can perhaps just use "atm-c1-1_8" as 'source_version_number'. One complication I want to bring up is that there are subsets of ARMBE datasets: ATM (atmosphere), CLDRAD (cloud and radiation), and LND (land) groups. In this case can we use "atm-c1-1_8", "cldrad-c1-1_8", etc. to distinguish them? Or should we register more datasets?

In the meantime, @EmmyChengTao helped to prepare the variable list, Variables_ARMBE_CMIP_CT_ZY.xlsx, which can be referred to later.

durack1 commented 10 months ago

@chengzhuzhang thanks for pushing on this. Point source data is indeed a new category to consider. We have considered a time point (temporal index) before, but a single location is not something that has been captured. The region is also not well accounted for: obs4MIPs-cmor-tables/obs4MIPs_region.json deals with regions defined by CF, which doesn't allow for Great Plains ARM data. This will require some thinking.

taylor13 commented 10 months ago

Perhaps a single location can be treated similarly to the CFMIP "site" data (which was ~100 locations). The site dimension would have length 1 if there is only a single site.

chengzhuzhang commented 10 months ago

Thanks for chiming in @durack1 and @taylor13. Peter has suggested that modeling this after the CFMIP "site" data might be the way to go. And I'm pretty sure most of the ARM sites are defined in the CFsite list...

gleckler1 commented 10 months ago

@chengzhuzhang @taylor13 @durack1. Since the goal of obs4MIPs is to be technically aligned with CMIP, I still think following the CFMIP site example is defensible. However, if there is now (retrospectively) a much better way to describe site data (e.g., CFMIP will modify it for CMIP7) the path forward is less clear. But that (defining changes) could take years, so again a defensible way forward (for now) may be to latch on to what was done in CMIP6.

Regarding the version number, which we do need to resolve to get started, obs4MIPs does not strive to enforce a uniform template for version numbers, i.e., how the versions are defined can be dataset dependent. So "atm-c1-1_8" and "cldrad-c1-1_8" are acceptable choices. But the aspiration is that for a given version we will be able to point to the files from which they were constructed.

gleckler1 commented 10 months ago

@chengzhuzhang. When you have a chance, can you check to see if you have permissions to create a branch of this repo?

chengzhuzhang commented 10 months ago

thank you @gleckler1 , I cloned the repo and created a branch locally, but it doesn't seem that I have permission to push the branch to this repo. I assume that I may need to be added to write to this repo.

gleckler1 commented 10 months ago

@chengzhuzhang I thought that might be the case - thanks for trying. We are looking for the best path to relax pushing branches. I'll get back to you soon on this.
Let me know if you want to use "atm-c1-1_8" and "cldrad-c1-1_8" as ARMBE version numbers and, if yes, please specify which variables (appropriate for CMIP comparison) each one includes. Each source_id can provide multiple variables.

durack1 commented 10 months ago

I would suggest against using the "_" in the source_id definition, as this is the delimiter for the rest of the filename components per (ods2.1 link). So rather than "atm-c1-1_8" does "atm-c1-1-8" work?
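To illustrate the problem, here is a small sketch. The field order and the example file names are assumptions for illustration only, not the official template:

```python
# Assumed field order, for illustration only.
FIELDS = ["variable_id", "frequency", "source_id",
          "institution_id", "grid_label", "time_range"]


def parse_stem(stem: str) -> dict:
    """Split a file name stem on '_' and map the pieces onto FIELDS."""
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(
            f"expected {len(FIELDS)} fields, got {len(parts)}: {parts}"
        )
    return dict(zip(FIELDS, parts))


# Hypothetical file name stems.
good = "pr_1hr_ARMBE-atm-c1-1-8_DOE-ARM_gn_201801-201812"
bad = "pr_1hr_ARMBE-atm-c1-1_8_DOE-ARM_gn_201801-201812"

print(parse_stem(good)["source_id"])  # ARMBE-atm-c1-1-8
try:
    parse_stem(bad)
except ValueError as err:
    # The underscore inside the source_id produces an extra field.
    print("rejected:", err)
```

With the hyphenated form the stem splits cleanly into six fields; the underscore form yields seven and can no longer be parsed unambiguously.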

gleckler1 commented 10 months ago

@durack1 agreed with swap for "_"

I see ARMBE variable list now.

chengzhuzhang commented 10 months ago

Thanks! I will use "_" instead, and will extract the variables from the spreadsheet to include in my branch.

durack1 commented 10 months ago

> Thanks! I will use "_" instead, and will extract the variables from the spreadsheet to include in my branch.

Sorry, to be clear, I was suggesting using "-"/hyphen rather than "_"/underscore in the source_id.

chengzhuzhang commented 10 months ago

Oh, I misspoke. I will use a hyphen instead. Thank you for confirming.

chengzhuzhang commented 10 months ago

Updated equivalent CMIP variable names:
atm: uas, vas, ps, pr, hfss, hfls, tas, hfss, hfls, ta, ua, va, hur, wap
cldrad: cl, clt, rsds, rsus, rlds, rlus, prw, clwvi, clivi, rlut, rsdt, rsut, rsdt, rsdscs

It seems region is not one of the keywords listed in the file name, so we need to consider squeezing the region/ARM site name into one of the fields that are used to construct the file name: __[_].nc

taylor13 commented 10 months ago

It hasn't yet been decided how to include the sampling-Interval and data-Region in a CMIP7 filename, but I've proposed a template in section 6 (pg. 14) of this document. For CMIP7, "region" is almost always "global" (glb), so it seems less than ideal that this should be indicated in every file, but for easy construction and parsing of file names, I think we need to include it. Note in the referenced document the differences in filenames between CMIP6 and the CMIP7 proposal. A major change is that "outname" + "table_id" get replaced by the branded-variable name (or possibly by "outname" + "branding suffix"; again no final decision has been made).

Note also that CORDEX must also identify "region" in its file names.

taylor13 commented 10 months ago

Another option to consider: the grid_label needs rethinking, so it might be possible to specify the region of a single site by a special text string put there (e.g., instead of gn, gr1, gr2, etc., you might have s-sgp (site: Southern Great Plains)) I vaguely recall there was some other grid format being considered for obs4MIPs, so we need to think about this.

gleckler1 commented 10 months ago

@chengzhuzhang I'm trying to get over some technical setbacks and will get back to you soon regarding next steps for prototyping processing of ARMBE data. I think it will be helpful for us to have the processing set up so that we can try out different options and think about them, including perhaps getting Sasha's help to test ESGF publication at some point.

chengzhuzhang commented 10 months ago

@taylor13 Thank you. To use grid_label for including site information makes good sense to me... @gleckler1 Sounds good! I will revise the initial input as soon as I hear back. Thank you for working on this!

gleckler1 commented 9 months ago

@chengzhuzhang Sorry for the delay. Your permission to write is now pending, so soon you should be able to upload a test branch. I'll be available to talk about what to do with the branch once it's in the repo. ARMBE metadata is now in a PR. Soon we'll be able to move on to thinking about how to incorporate in-situ data. This will involve the , , the possibility of a spatial coordinate, etc.

chengzhuzhang commented 9 months ago

Hi @gleckler1 thank you for working on this PR! The initial entry for ARMBE looks great. Yes! I think next we will need to sort out how to incorporate region-related specs. Let me know if there is anything I can do for testing. P.S. I'm officially a collaborator now on this repo!

chengzhuzhang commented 7 months ago

hello Peter @gleckler1 I made some minor updates to the json file that describes ARMBE data, in the Pull Request here: https://github.com/PCMDI/obs4MIPs-cmor-tables/pull/321 There are still some parameters specific to in-situ data that we need to iron out, though. For instance, where should we list the ARM site name: in source_id or in another field? At this point, I'm sure I won't be able to test my run script and json file with CMOR. Not sure what the next step is here.

gleckler1 commented 7 months ago

Hi Jill,

I spoke with Karl and Paul Durack about this at length yesterday. I have a meeting at 11AM, but do you want to have a quick chat now? If not, I have some openings later today. P


chengzhuzhang commented 7 months ago

@gleckler1 thanks for the meeting! As discussed, I'm providing an initial list of the CFsites for which ARMBE also provides data.

31,  166.9  ,   -0.5       ,'166.9E 0.5S    Nauru ARM   (CPT)'
33,  147.4  ,   -2.1       ,'147.4E 2.1S     Manus ARM  (CPT)'
36,  130.9  ,  -12.4       ,'130.9E 12.4S    Darwin ARM   (CPT)'
37,  -97.5  ,   36.4       ,'97.5W 36.4N    Oklahoma ARM   (CPT)'
38, -156.6  ,   71.3       ,'156.6W 71.3N    Barrow ARM   (CPT)'
47,  -28    ,  39          ,'Graciosa in the Azores (28W 39N) 2009 AMF deployment (Chris Bretherton)'

Also attached is a copy of the CMIP6 cfsites location information. The complete list includes additional sites for which ARMBE provides data, but we can identify those later. cmip6-cfsites-locations-extended copy.txt

Below are the ARM site acronyms, which are embedded in the ARMBE data stream name, e.g. sgparmbeatmC1.c1.20200101.003000.nc.

    "twpc1": [-2.1, 147.4,  "147.4E 2.1S Manus ARM"],
    "twpc2": [-0.5, 166.9, "166.9E 0.5S Nauru ARM"],
    "twpc3": [-12.4, 130.9,  "130.9E 12.4S Darwin ARM"],
    "sgpc1": [36.4, -97.5, "97.5W 36.4N Oklahoma ARM"],
    "nsac1": [71.3, -156.6, "156.6W 71.3N Barrow ARM"],
    "enac1": [39.1, -28.0,  "28.0W 39.1N Graciosa Island ARM"],

gleckler1 commented 7 months ago

@chengzhuzhang ok that last bit should help us prototype! Thank you. I did not mention, we'll try using "gn" in the filename... I hope to try that out in the days ahead... See "grid label" in Table1 and filename template near the end of this document: https://pcmdi.github.io/obs4MIPs/docs/ODSv2.5-DRAFT.pdf
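As one possible starting point for that prototype, here is a sketch that derives the site key from an ARMBE data stream file name and looks up its coordinates. The regular expression encodes an assumed name layout (three-letter site, data stream class, facility, then data level), and the function names are hypothetical:

```python
import re

# Site key -> (lat, lon), copied from the table in the comment above.
ARM_SITES = {
    "twpc1": (-2.1, 147.4),
    "twpc2": (-0.5, 166.9),
    "twpc3": (-12.4, 130.9),
    "sgpc1": (36.4, -97.5),
    "nsac1": (71.3, -156.6),
    "enac1": (39.1, -28.0),
}

# Assumed layout: <site><class><facility>.<level>.<date>.<time>.nc
STREAM_RE = re.compile(
    r"^(?P<site>[a-z]{3})(?P<cls>[a-z]+)(?P<facility>[A-Z]\d+)\."
)


def site_key(filename: str) -> str:
    """Return the lower-case site key, e.g. 'sgpc1'."""
    match = STREAM_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized data stream name: {filename}")
    return match["site"] + match["facility"].lower()


def site_coords(filename: str) -> tuple:
    """Look up (lat, lon) for the site encoded in the file name."""
    return ARM_SITES[site_key(filename)]


name = "sgparmbeatmC1.c1.20200101.003000.nc"
print(site_key(name), site_coords(name))
```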

chengzhuzhang commented 7 months ago

@gleckler1 thanks! Please don't hesitate to let me know if there is anything I can do to help work out this prototype!

chengzhuzhang commented 1 month ago

@gleckler1 following up on our discussion yesterday, I made some code changes (based on instructions from @taylor13 at https://github.com/PCMDI/cmor/issues/728). The python script is now working with a single lat/lon included in the file (code change in https://github.com/PCMDI/obs4MIPs-cmor-tables/pull/321/commits/d6c7b9514b0cffe85c38c7492b5618349b7f5dab). The metadata is shown as follows:

netcdf pr_1hr_ARMBE-atm-c1-1-8_DOE-ARM_site_201801010030-201812312330 {
dimensions:
    time = UNLIMITED ; // (8760 currently)
    lat = 1 ;
    lon = 1 ;
    bnds = 2 ;
variables:
    double time(time) ;
        time:bounds = "time_bnds" ;
        time:units = "days since 2018-01-01 00:00:00 0:00" ;
        time:calendar = "gregorian" ;
        time:axis = "T" ;
        time:long_name = "time" ;
        time:standard_name = "time" ;
    double time_bnds(time, bnds) ;
    double lat(lat) ;
        lat:units = "degrees_north" ;
        lat:axis = "Y" ;
        lat:long_name = "Latitude" ;
        lat:standard_name = "latitude" ;
    double lon(lon) ;
        lon:units = "degrees_east" ;
        lon:axis = "X" ;
        lon:long_name = "Longitude" ;
        lon:standard_name = "longitude" ;
    float pr(time, lat, lon) ;
        pr:standard_name = "precipitation_flux" ;
        pr:long_name = "Precipitation" ;
        pr:comment = "includes both liquid and solid phases" ;
        pr:units = "kg m-2 s-1" ;
        pr:cell_methods = "area: time: mean" ;
        pr:cell_measures = "area: areacella" ;
        pr:missing_value = 1.e+20f ;
        pr:_FillValue = 1.e+20f ;
        pr:valid_min = 2.f ;
        pr:valid_max = 3.f ;

chengzhuzhang commented 1 month ago

The next step is to work on "region". If we choose to include the lat/lon value, I think region is less important. Yesterday we also realized that it is not a search facet currently supported by ESGF MetaGrid, and it seems to just serve as a global attribute. My proposal is to have the site name as part of the source_id of ARM data. The rationale is that for these site data, the available time periods and the update frequencies are site specific. With each site having one data stream/source_id, it also becomes easier to maintain the datasets.
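A rough sketch of what one source_id per site and subset could look like. The naming pattern, the function name, and the example site and version values are all assumptions for illustration, not a registered convention:

```python
from itertools import product


def site_source_ids(sites, subsets, version, source_name="ARMBE"):
    """Build one hyphen-delimited source_id per (site, subset) pair.

    Assumed pattern: <source_name>-<SITE>-<subset>-<version>
    """
    return [
        f"{source_name}-{site.upper()}-{subset}-{version}"
        for site, subset in product(sites, subsets)
    ]


# Hypothetical inputs: two site acronyms, two ARMBE subsets, one version.
for source_id in site_source_ids(["sgp", "nsa"], ["atm", "cldrad"], "c1-1-8"):
    print(source_id)
```

Each generated name can then carry its own time coverage and update schedule.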

gleckler1 commented 1 month ago

@chengzhuzhang Great progress! We definitely want to consider inclusion of coordinates in source_id as an option but there are lots of moving parts that need to be considered. "region" will be an important option for obs4MIPs moving forward. As a next step, let's consider a few test-case site-specific source_ids. Here is a first guess:

'ARMBE-SGP-atm-c1-1-8' and 'ARMBE-SGP(36°36′26"N, 97°29′15″W)-atm-c1-1-8' # looks messy!

Maybe you can improve on these names, or do you think these are ok? It's just a test run so we don't have to get it perfect yet. I think it's reasonable to consider two options: 1) identify the location only by the acronym SGP, and 2) explicitly include coordinates. Once we have chosen a few test-case source_ids I'll run the script to add them so that CMOR recognizes them.

gleckler1 commented 1 month ago

@chengzhuzhang btw I've confirmed your option #2 runs.

chengzhuzhang commented 1 month ago

@gleckler1 Thank you for testing option 2! The proposed names look good to me. My preference would be the first, ARMBE-SGP-atm-c1-1-8, nice and clean! Since source_id is baked into the file_name, I doubt the special characters in the coordinates can be supported as is in the file_name.
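A quick way to check the two candidates, assuming (this is my assumption, not a confirmed obs4MIPs rule) that a source_id should contain only ASCII letters, digits and hyphens:

```python
import re

# Assumption: only ASCII letters, digits and hyphens are allowed.
SOURCE_ID_RE = re.compile(r"[A-Za-z0-9-]+")


def is_filename_safe(source_id: str) -> bool:
    """True if the whole source_id matches the assumed character set."""
    return SOURCE_ID_RE.fullmatch(source_id) is not None


candidates = [
    "ARMBE-SGP-atm-c1-1-8",
    'ARMBE-SGP(36°36′26"N, 97°29′15″W)-atm-c1-1-8',
]
for candidate in candidates:
    print(candidate, is_filename_safe(candidate))
```

Under that assumption the acronym-only name passes and the name with embedded coordinates is rejected.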