NCPP / ocgis

OpenClimateGIS is a set of geoprocessing and calculation tools for CF-compliant climate datasets.

Add more metadata to the output file... #506

Open ekluzek opened 4 years ago

ekluzek commented 4 years ago

The mapping files from OCGIS are pretty bare bones and need more meta-data added to them.

I'd like to see the same sort of metadata that are on the ESMF RegridWeights mapping files. Such as...

// global attributes:
        :title = "ESMF Offline Regridding Weight Generator" ;
        :normalization = "destarea" ;
        :map_method = "Conservative remapping" ;
        :ESMF_regrid_method = "First-order Conservative" ;
        :conventions = "NCAR-CSM" ;
        :domain_a = "/glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/SCRIPgrid_0.25x0.25_MODIS_c170321.nc" ;
        :domain_b = "/glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/0.9x1.25_c110307.nc" ;
        :grid_file_src = "/glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/SCRIPgrid_0.25x0.25_MODIS_c170321.nc" ;
        :grid_file_dst = "/glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/0.9x1.25_c110307.nc" ;
        :CVS_revision = "6.3.0r" ;

We also add the hostname run on, the user-name of the user doing it, and the "history", so the date done and the exact command that was launched.

bekozi commented 4 years ago

Some of this functionality may need to go into ESMF's concurrent weight file write routine. Regardless, this should be a straightforward improvement.

bekozi commented 4 years ago

@rokuingh Added filemode option to ESMPy (https://github.com/esmf-org/esmf/tree/ESMPy-filemode). I've started integrating this into the chunked regridding.

Ping @slevisconsulting

bekozi commented 4 years ago

Feature branch: https://github.com/NCPP/ocgis/tree/i506-esmpy-filemode

bekozi commented 4 years ago

This is implemented but will require a beta snapshot of ESMF to work. The weight file output is equivalent to the standard ESMF weight file with auxiliary variables and attributes.

@ekluzek I wanted to follow-up on:

> We also add the hostname run on, the user-name of the user doing it, and the "history", so the date done and the exact command that was launched.

We could add arbitrary attributes to the output weight file using a JSON string as an argument to ocli. Is this something that sounds appealing?

ekluzek commented 4 years ago

@bekozi hmmm. I'm not sure there's much reason to add an arbitrary string as a global attribute to the file, since I can easily add it afterwards with NCO. But what about adding those specific things: hostname, user-name, and date? The CF convention has "history" as a standard global attribute; it typically holds the date/time and the command line of the program or script that created the file. Username and hostname could be tacked on as additional attributes. I think all of these are pretty standard, and they are useful for documenting the file and how it was created. I've found this kind of documentation to be extremely helpful when you go back later and try to figure out how a file was created. That is something I have to do over and over again; some documentation in the global attributes makes it easy, but without it the task can be difficult to impossible.

bekozi commented 4 years ago

@ekluzek Got it. Let me cook something up and get back to you with an example.

bekozi commented 4 years ago

@ekluzek I added the three attributes. They look like:

created_by_user     :: 'benkoziol'
created_on_hostname :: 'system76-laptop'
created_at_datetime :: '2020-03-30 09:42:24.216163'

The names/values can be adjusted fairly easily. I think the user and hostname retrieval are pretty portable, but it may take some fine tuning on some platforms.
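For illustration, the three attributes could be collected with the Python standard library roughly as follows. This is a minimal sketch, not the ocgis implementation: the function name `creation_attributes` is hypothetical, and the fallback to "unknown" is an assumption about how to handle platforms where no login name can be determined.

```python
import getpass
import socket
from datetime import datetime


def creation_attributes(now=None):
    """Return provenance attributes as a plain dict of strings.

    Hypothetical helper for illustration only.
    """
    now = now or datetime.now()
    try:
        user = getpass.getuser()
    except Exception:
        # Assumption: fall back to a placeholder when no login name is available.
        user = "unknown"
    return {
        "created_by_user": user,
        "created_on_hostname": socket.gethostname(),
        "created_at_datetime": str(now),
    }


if __name__ == "__main__":
    for name, value in creation_attributes().items():
        print(f"{name:<20}:: {value!r}")
```

The resulting dict can then be written as global attributes with whatever netCDF library is in use.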

ekluzek commented 4 years ago

@bekozi that's great, that gives me the kind of metadata that I've found to be really useful. One other thing I've found useful is the version of the program or script that created the file. For something checked out under git, I store the output of "git describe".
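A small sketch of capturing the "git describe" output from Python, assuming the script runs inside a git checkout. The helper name `git_describe` is hypothetical, and the `--tags --always --dirty` options are an illustrative choice rather than something specified in this thread.

```python
import subprocess


def git_describe(repo_dir="."):
    """Return the `git describe` string for repo_dir, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--always", "--dirty"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, the directory is missing, or it is not a repository.
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(git_describe())
```

Returning None instead of raising lets the caller skip the version attribute when the code is not running from a checkout.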

And just to point you to the CF conventions for attributes. I don't know if you are trying to follow any specific conventions -- but that's a good one to follow. The history attribute in it is useful as it records both the creation date and the program that produced the file. And then if someone manipulates the file again, that manipulation will be added to the history. So history is a good attribute to follow the convention for.

Here's the CF conventions...

http://cfconventions.org/cf-conventions/cf-conventions.html#attribute-appendix

bekozi commented 4 years ago

@ekluzek In general, these weight files do not follow a convention (I guess it's a SCRIP weight file but no real convention around that). I can add the CF history attribute to the output weight files no problem. Is this where you'd prefer to have the creation information as well? I guess I'm asking if you'd prefer to have the "created" attributes in addition to the "history" attribute.

ekluzek commented 4 years ago

The creation date is best off in the history attribute, because you can then figure out any follow-on history. If you have creation_date as a separate attribute, it's not clear which operation it applies to when there is a string of manipulations on the file. But the user and hostname don't necessarily lend themselves to going into "history". So I've put them as separate attributes, and you then just need to know that they go with the original operation on the file, rather than any subsequent ones.
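The way a history attribute accumulates across a string of manipulations can be sketched as below. The helper `append_history` is hypothetical; it adds each timestamped entry on a new line at the end, which is an assumption about ordering (some tools, NCO for example, put the newest entry first instead).

```python
from datetime import datetime


def append_history(existing, command, now=None):
    """Return the history text with a new timestamped entry added at the end."""
    now = now or datetime.now()
    entry = f"{now}: {command}"
    # An empty or missing history means this is the creating operation.
    return f"{existing}\n{entry}" if existing else entry


if __name__ == "__main__":
    history = append_history("", "create_weights --method conserve")
    history = append_history(history, "rename_variable --old a --new b")
    print(history)
```

Because every entry carries its own timestamp, the first line identifies the creation date and later lines show what was done to the file afterwards. The command strings in the demo are made up.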

bekozi commented 4 years ago

Makes sense to me. I'll take this opportunity to format the ocli command line arguments into the history string. Will be back with an example for review.
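One way to turn a list of command-line arguments into a single history entry is sketched here. The function `history_entry` and the exact wording of the entry are illustrative assumptions, not the ocgis code; `shlex.join` (Python 3.8+) is used so that arguments containing spaces stay unambiguous.

```python
import shlex
import sys
from datetime import datetime


def history_entry(argv, creator, now=None):
    """Format a timestamped history entry from a list of CLI arguments."""
    now = now or datetime.now()
    # shlex.join quotes any argument that a shell would otherwise split.
    command = shlex.join(argv)
    return f"{now}: Created by {creator} with CLI command: {command}"


if __name__ == "__main__":
    # "example-tool (v0.0)" is a placeholder creator string.
    print(history_entry(sys.argv, "example-tool (v0.0)"))
```

Quoting matters when the recorded command is meant to be copied back into a shell to reproduce the file.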

bekozi commented 4 years ago

@ekluzek How does this look?

// global attributes:
        :created_by_user = "benkoziol" ;
        :created_on_hostname = "system76-laptop" ;
        :history = "2020-04-01 10:02:49.028146: Created by ocgis (v2.1.1) and ESMF (v8.1.0 beta snapshot) with CLI command: ocli chunked-rwg --weightfilemode BASIC --loglvl INFO --no_verbose False --spatial_subset_path /tmp/ocgis_test_p5i8p9n3/spatial_subset.nc --no_ignore_degenerate False --wd /tmp/ocgis_test_p5i8p9n3/chunks --esmf_regrid_method BILINEAR --esmf_dst_type GRIDSPEC --esmf_src_type GRIDSPEC --weight /tmp/ocgis_test_p5i8p9n3/weights.nc --destination /tmp/ocgis_test_p5i8p9n3/destination.nc --source /tmp/ocgis_test_p5i8p9n3/source.nc" ;

ekluzek commented 4 years ago

Perfect. Works for me.

bekozi commented 4 years ago

Great! I'll work on getting this and the esmf branch merged.

bekozi commented 4 years ago

For reference, the associated esmpy PR is: https://github.com/esmf-org/esmf/pull/4

bekozi commented 3 years ago

@slevisconsulting - I'm reopening this to address the issue related to writing auxiliary coordinate variables for high resolution grids. I'm planning to enable the appropriate flags in an ESMF branch to confirm this will fix the problem. I'll then add the appropriate parameters to ESMPy and ocgis.

slevis-lmwg commented 3 years ago

Thank you @bekozi

For my benefit, I'm linking this issue to my PR here.

bekozi commented 3 years ago

@rokuingh is adding the 64-bit offset flag to ESMPy. He also identified an issue where the file types were not passed to ESMF routines correctly. I'll bring the offset flag into ocli once it's ready in ESMPy. I tested statically setting the flags for the higher resolution UGRID->SCRIP case using a reproducer from @slevisconsulting, and the operation works with auxiliary coordinates.

slevis-lmwg commented 3 years ago

New concern relating to auxiliary data in the context of CTSM's surface data generation (with a piece of very good news):

Running ./mksurfdata_map to generate a surface dataset appears to work now! However, the corresponding log file shows zeros for all variable areas at both the input (raw data) resolutions and the output (surface data) resolution. This is because the auxiliary variables area_a and area_b contain all zeros. This makes CTSM's error-checking unusable.
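A quick sanity check for this symptom can be sketched as follows. ESMF weight files store the cell areas in the variables area_a and area_b; the helper `all_zero_variables` is hypothetical and works on any mapping from variable name to a flat sequence of values, so it does not depend on a particular netCDF library.

```python
def all_zero_variables(variables):
    """Return the names of non-empty variables whose values are all zero."""
    return [
        name
        for name, values in variables.items()
        if len(values) > 0 and all(v == 0 for v in values)
    ]


if __name__ == "__main__":
    # Assumption: with netCDF4-python installed, the dict could be built as
    #   {n: ds.variables[n][:].ravel() for n in ("area_a", "area_b")}
    # from an open netCDF4.Dataset `ds`. Made-up values are used here instead.
    sample = {"area_a": [0.0, 0.0, 0.0], "area_b": [0.0, 1.2e-4, 3.4e-4]}
    print(all_zero_variables(sample))
```

Running a check like this right after weight generation would flag the problem before the file reaches mksurfdata_map.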

bekozi commented 3 years ago

@rokuingh ESMPy's auxiliary variable support will need to be modified to include areas when writing weight files. Is this possible within the current implementation of WITHAUX?

rokuingh commented 3 years ago

I am no expert on ESMF IO, but it looks like the routine that is responsible for writing the weight files does indeed handle the areas (and fractions). The routine consists of a couple thousand lines of Fortran. A quick pass through the code seems to imply that areas are only written when using the conservative method.

slevis-lmwg commented 3 years ago

> it looks like the routine that is responsible for writing the weight files does indeed handle the areas (and fractions). The routine consists of a couple thousand lines of Fortran. A quick pass through the code seems to imply that areas are only written when using the conservative method.

Thank you, @rokuingh @bekozi. If by "conservative method" we mean the option --esmf_regrid_method CONSERVE, then that is what we're doing. So the problem remains that the area variables area_a and area_b are all zeros in all the weight files that I've looked at.

rokuingh commented 3 years ago

I will debug this further later this week. Could one of you please send me the aforementioned reproducer?

bekozi commented 3 years ago

I think the trouble is that the areas are difficult to connect to ESMF_OutputScripWeightFile the way esmpy is calling it. Another solution here is to put a Python wrapper on ESMF_RegridWeightGenFile. It does not necessarily look difficult to wrap, but it does look time consuming. Another option would be to call the CLI RWG to create the weights for each chunk combination and merge them afterwards. What do you think @rokuingh?

slevis-lmwg commented 3 years ago

> I will debug this further later this week. Could one of you please send me the aforementioned reproducer?

qsub /glade/work/slevis/ocgis_work/no_subset_20200825_reproducer.sh

slevis-lmwg commented 3 years ago

@rokuingh cc: @bekozi is there an update regarding the aforementioned debugging? This issue blocks the use of ocgis in CTSM's mkmapdata tool.

rokuingh commented 3 years ago

@slevisconsulting Sorry for the long wait, but I do have a good idea of how to proceed with this. I am working on the upcoming ESMF 8.1.0 release right now, but I have just been approved to work on this next. I will plan to have a snapshot for you before the end of the month.

slevis-lmwg commented 3 years ago

@rokuingh thank you for prioritizing this issue, I appreciate your help.

rokuingh commented 3 years ago

@slevisconsulting I have been experimenting with this reproducer on Cheyenne, but I have not yet had a successful run, even with a walltime of 1 hour. Would you mind running this again on your end to make sure nothing has changed with the machine or environment that could explain the issues I am having? In the meantime I will move forward with adding the area variables to the weight files.

slevis-lmwg commented 3 years ago

@rokuingh I have not run this script in a while (likely since Oct 2020). Thank you for the heads-up about it failing. I will look into it soon.

Meanwhile, thank you for moving fwd with adding the area variables to the weight files.

rokuingh commented 3 years ago

@slevisconsulting I've added the ability to write areas to the weight files generated by ESMPy using FileMode.WITHAUX. It is currently available on the develop branch of ESMF, but I could create a tag if that is more easily accessible, or something else? Also, I just realized that you will probably also need fractions since you are using conservative regridding. Please let me know if that is the case.

rsdunlapiv commented 3 years ago

@slevisconsulting have you been able to test the new weights files from ESMPy with areas added? @rokuingh

slevis-lmwg commented 3 years ago

@rsdunlapiv @rokuingh thank you for checking in, and I apologize for not communicating since 4/23.

I'm afraid I haven't tested this. I had hoped to get the CTSM surface-data tool-chain fully working with ocgis while @bekozi was available. At this point I have set that work aside until I hear otherwise from @ekluzek @billsacks @dlawrenncar.

billsacks commented 1 year ago

We no longer plan to use OCGIS in the CTSM surface data tool chain so, from the CTSM perspective, it's fine for this issue to be closed.