Scalar field data is not written to the output file

menzel-gfdl commented 3 years ago

Describe the bug

I have a set of history files that include two "scalar" (in a diag_manager sense) variables, solar_constant and earth_sun_distance_fraction. When I run fregrid the command completes, but the data for these scalar fields is not written to the output file. All the other 2d and 3d fields in the file appear to be working correctly.

To Reproduce

The files I have been using are here:

/lustre/f2/scratch/gfdl/Raymond.Menzel/work/c96L33_am4p0_2000_2014_1x0m2d_216x1a.o205628311/20000101.new_offline_input.tile*.nc

and the command I ran is:

srun --nodes=1 --ntasks-per-node=6 /ncrc/home2/fms/local/opt/fre-nctools/bronx-19/ncrc/bin/fregrid_parallel --input_mosaic INPUT/C96_mosaic.nc --input_file 20000101.new_offline_input --nlon 144 --nlat 90 --scalar_field cosine_zenith,daylight_fraction,earth_sun_distance_fraction,infrared_diffuse_albedo,infrared_direct_albedo,land_fraction,layer_pressure,layer_temperature,layer_thickness,level_pressure,level_temperature,ozone,shallow_cloud_fraction,shallow_droplet_number,shallow_ice_content,shallow_liquid_content,shallow_size_drop,solar_constant,strat_size_drop,stratiform_cloud_fraction,stratiform_droplet_number,stratiform_ice_content,stratiform_liquid_content,surface_temperature,visible_diffuse_albedo,visible_direct_albedo,water_vapor

Expected behavior

The scalar data should be copied to the output file.

System Environment

I have tried this using fre/bronx-19 on the gaea log-in nodes.

ceblanton commented 3 years ago

The odd thing is that the scalar fields appear to be in the output files (netcdf headers), but they are filled with missing values.

@ngs333 support for passing these "scalar" variables through fregrid was added for CMIP6, I believe, but it does not seem to work in this case. The motivation was passing bk and pk (atmosphere grid pressure info, 1D vertical) through to the regridded files, so those at least should work properly.

menzel-gfdl commented 3 years ago

@ceblanton @ngs333, I think this case should be added to the fregrid unit tests too.

ngs333 commented 3 years ago

@menzel-gfdl, @ceblanton It appears that if a scalar field's composite index does not have the two spatial dimensions, then it's just skipped. The spatial dimensions are grid_xt and grid_yt. As examples from the input tile nc files, a field that has the spatial dimensions is this three-dimensional field:

       float land_fraction(time, grid_yt, grid_xt) ;

and one that does not is this two-dimensional field:

       float earth_sun_distance_fraction(time, scalar_axis) ;

In the fregrid.c code, approximately after line 1000, there is a section for re-gridding scalar variables which starts like this:

       if( !scalar_in->var[l].has_taxis && m>0) continue;
       if( !scalar_in->var[l].do_regrid ) continue;

So if do_regrid is not set for a field, the field is not re-gridded. In fact it is skipped entirely, so nothing ever determines what its output values should be.

The field do_regrid is initialized to zero, and it is only set to one in get_input_metadata, like this:

       if(xcart == 'X' && ycart == 'Y') field[n].var[ll].do_regrid = 1;

i.e. only when the variable has both spatial axes.
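To make the control flow above concrete, here is a minimal, self-contained sketch. The struct and function names (var_sketch, mark_regriddable, count_processed) are hypothetical and are not the actual fregrid types; only the two `continue` conditions and the `xcart`/`ycart` test mirror the snippets quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a fregrid variable record; NOT the real struct. */
typedef struct {
  const char *name;
  char xcart;     /* 'X' if the variable has an x axis, anything else otherwise */
  char ycart;     /* 'Y' if the variable has a y axis, anything else otherwise */
  int has_taxis;  /* nonzero if the variable has a time axis */
  int do_regrid;  /* set below; 0 means "skip this variable" */
} var_sketch;

/* Mirrors the quoted condition: do_regrid starts at 0 and becomes 1
   only when both spatial axes are present. */
static void mark_regriddable(var_sketch *vars, size_t nvar) {
  assert(vars != NULL || nvar == 0);
  for (size_t i = 0; i < nvar; i++) {
    vars[i].do_regrid = 0;
    if (vars[i].xcart == 'X' && vars[i].ycart == 'Y') vars[i].do_regrid = 1;
  }
}

/* Counts the variables that survive both `continue` checks at time level m.
   Anything not counted here never has its output values computed. */
static size_t count_processed(const var_sketch *vars, size_t nvar, int m) {
  size_t n = 0;
  assert(vars != NULL || nvar == 0);
  for (size_t l = 0; l < nvar; l++) {
    if (!vars[l].has_taxis && m > 0) continue;
    if (!vars[l].do_regrid) continue;
    n++;
  }
  return n;
}
```

With land_fraction (x and y axes) and earth_sun_distance_fraction (neither) as inputs, only the first is marked and counted, which matches the behavior reported in this issue.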

For these fields (like earth_sun_distance_fraction) that are not really on the grid, what values should be written into the output file? The values are present in each of the six input tile files, but they appear to be identical from tile to tile. If that is guaranteed to hold in all cases, then we can just copy the values from one of the input tile files into the output file. Hopefully someone can verify! :-)
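A rough sketch of the "verify, then copy from one tile" idea, assuming the per-tile values have already been read into memory. The function names (tiles_agree, copy_gridless) are hypothetical, and no netCDF I/O is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if every tile holds the same n values as tile 0, else 0.
   Exact comparison is used on purpose: the values are expected to be copies. */
static int tiles_agree(const double *const *tiles, size_t ntiles, size_t n) {
  assert(tiles != NULL || ntiles == 0);
  for (size_t t = 1; t < ntiles; t++)
    for (size_t i = 0; i < n; i++)
      if (tiles[t][i] != tiles[0][i]) return 0;
  return 1;
}

/* Copies tile 0 into out when all tiles agree.
   Returns 0 on success, -1 if there are no tiles or the tiles differ. */
static int copy_gridless(const double *const *tiles, size_t ntiles, size_t n,
                         double *out) {
  if (ntiles == 0 || !tiles_agree(tiles, ntiles, n)) return -1;
  memcpy(out, tiles[0], n * sizeof *out);
  return 0;
}
```

Returning an error when the tiles differ, rather than silently picking tile 0, keeps the assumption visible if it ever turns out not to hold.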

I am also wondering if this gridless scalar data is a new data type as the code has been like this for a long time.

Finally, not that it's related to this issue, but some of the input tile data is missing. See the field strat_size_drop in the input tile files.

ceblanton commented 3 years ago

Hi @menzel-gfdl and @ngs333, I suggest that this use case (passing non-regriddable variables unchanged through fregrid) not be supported, as it's essentially ncks-type functionality that is already available through other tools (ncks, split_ncvars.pl).

Ray, I think re-arranging your diag_table to have one history file per grid type is the right solution, so that then these scalar fields would not be passed through fregrid. Would this workaround solution work for you?

I think that fregrid should detect this situation better though. Creating an output file with the non-regriddable scalar variable with missing values naturally leads to confusion. fregrid has some logic to somewhat gracefully detect non-regriddable fields that are requested to be regridded in the input variable list (--scalar_field). These non-regriddable fields have the variable attribute interp_method=NONE, which fregrid looks for. Miguel, do you think that these other nonregriddable fields (earth_sun_distance_fraction) could be handled similarly to the fields flagged with interp_method=NONE variable attribute?

So maybe if fregrid sees this type of non-regriddable variable (a variable whose dimensions aren't regriddable) it should:

- Print a warning that this variable has no regriddable dimensions and will be ignored
- Not include it in the output file
- If there are no variables to be regridded, not write an output file and exit 0 (as is the case for the interp_method=NONE variables)
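A minimal sketch of that proposed behavior, not the actual fregrid code. The struct field_sketch and the function drop_non_regriddable are hypothetical names, and the warning text is only illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record for a requested field; NOT the real fregrid struct. */
typedef struct {
  const char *name;
  int do_regrid;  /* nonzero if the field has both spatial axes */
} field_sketch;

/* Compacts the list in place so it holds only regriddable fields, printing
   one warning per dropped field. Returns the number of fields kept. */
static size_t drop_non_regriddable(field_sketch *fields, size_t n, FILE *log) {
  size_t kept = 0;
  assert(fields != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (fields[i].do_regrid) {
      fields[kept++] = fields[i];
    } else {
      fprintf(log,
              "WARNING: %s has no regriddable dimensions and will be ignored\n",
              fields[i].name);
    }
  }
  return kept;
}
```

A caller would then skip creating the output file and return 0 when the function returns 0, which covers the third bullet.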

menzel-gfdl commented 3 years ago

@ceblanton I would prefer if the variables were just copied, but if it is too difficult to implement I can change my workflow. Basically I am trying to get the models to produce a single history file that can then be read by an offline model. It would be nice to have all the variables in the same file instead of having to deal with two separate files. I guess the most important thing would be some clear, centralized documentation, as I had no idea that mixing gridded and non-gridded variables would even be a problem to begin with.

ngs333 commented 3 years ago

@ceblanton, @menzel-gfdl It's easy enough to change the code to either write a warning or copy the scalars. We just need to decide which option (or options) is best for everyone.

ceblanton commented 3 years ago

Hello, we discussed this during last Thursday's meeting, and the consensus was not to add new fregrid features. Instead, fregrid should print a suitable warning and stop writing empty fields to the output file, as it does today.

fregrid has two options for specifying input datasets: scalar fields (--scalar_field var1,var2) and vector fields (--u_field and --v_field). I think it could be confusing to accept both non-regriddable 1D scalars and the more usual 2D/3D scalars in the same --scalar_field option.

Out of curiosity, I checked what the behavior was for two other main NetCDF regridders (Climate Data Operators CDO and NCO remap) and they both copied over the 1D solar_constant variable by default. However, the regular usage of both of those regridders is to not specify a variable list, but regrid all, whereas the fregrid usage is to always specify a list of variables to regrid.

Sorry for the inconvenience to your processing workflow, @menzel-gfdl. Could you include an ncks call after the fregrid call to append the extra fields? (I realize it's not as convenient!)

ncks -A -v solar_constant,earth_sun_distance_fraction 20000101.new_offline_input.tile1.nc 20000101.new_offline_input.144x90.nc

NOAA-GFDL / FRE-NCtools

Scalar field data is not written to the output file #114