Open ajelenak opened 4 years ago
Prima facie this makes sense to me but of course these would somehow need to be linked to the container variable so that it's clear how to bring the subsampled coordinates into full resolution.
It is simpler to just keep using the `coordinates` attribute. The presence of an interpolation container attribute could serve as a hint that some of the variables listed in `coordinates` might be subsampled domains. Those variables will have a new attribute (`subsample_dimension`, `interpolation_dimension`; name TBD) that declares the domain axis their subsampling data applies to. We need to verify whether this approach would be acceptable for the CF.
I agree that reusing the `coordinates` attribute is probably the right approach (if it is acceptable for the CF), otherwise it would become difficult to handle variables with a mix of full and subsampled coordinates.
Adding a `subsampled_dimension` attribute on subsampled coordinate variables to indicate the dimensions they should expand to would be in line with the issue regarding reusability of interpolation containers: https://github.com/erget/subsampled-coordinates/issues/5
Just to complete with information discussed a few minutes ago:
As explained by @davidhassell during the meeting, reusing `coordinates` would break backwards compatibility for software that only supports older versions of the CF convention.
So in order to keep this compatibility and still be able to define subsampled variables as coordinates, maybe a solution would be to have both:
- a `subsampled_coordinates` attribute listing all coordinate variables, whether they are subsampled or not
- a `coordinates` attribute containing only full coordinate variables, so older software would still be able to read the data variables without error but would only be able to locate them with coordinates provided on full dimensions.

For example (simplistic, there would obviously be better ways to describe this kind of data):
dimensions :
time = UNLIMITED;
lat = 720;
lon = 1440;
sub_lat = 10;
sub_lon = 20;
aux = 15;
variables :
float grid_data(time, lat, lon, aux);
grid_data : coordinates = "time";
grid_data : subsampled_coordinates = "time lat lon";
float lat(sub_lat);
lat : standard_name = "latitude";
lat : units = "degrees_north";
lat : interpolation = "interpolation_doesnotmatter";
float lon(sub_lon);
lon : standard_name = "longitude";
lon : units = "degrees_east";
lon : interpolation = "interpolation_doesnotmatter";
double time(time);
time : standard_name = "time";
time : units = "<units> since <datetime string>";
time : calendar = "gregorian";
char interpolation_doesnotmatter;
interpolation_doesnotmatter : description = "not the subject of this issue";
Software that does not support new CF versions would:
- read the `grid_data` variable without error, as the CDL remains valid for previous versions of the CF convention
- find only the `coordinates` standard attribute
- use the variables listed in the `coordinates` attribute (so, exclusively non-subsampled variables)
- locate the data on the `time` axis (but not on `lat` or `lon`)

Software that implements new CF conventions would:
- read the `grid_data` variable without error
- find both a `coordinates` and a `subsampled_coordinates` standard attribute
- prefer the `subsampled_coordinates` attribute, since it provides at least as much information as the `coordinates` attribute, but potentially more
- use all the variables listed in `subsampled_coordinates`, whether they are subsampled or not,
- locate the data on the `time`, `lat` and `lon` axes.

This method adds an overhead (one additional attribute for each data variable linked to a set of coordinates) that could disappear once/if backward compatibility is discarded in later versions of the CF convention.
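The reader behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attributes are held in a plain dictionary rather than read from a netCDF file, and the function name `coordinate_names` and its `supports_subsampling` flag are hypothetical, not part of any CF library.

```python
def coordinate_names(attrs, supports_subsampling):
    """Return the coordinate variable names a reader would use for one data variable."""
    # Assumed rule: a reader that knows the new attribute prefers it,
    # because it lists every coordinate, subsampled or not.
    if supports_subsampling and "subsampled_coordinates" in attrs:
        return attrs["subsampled_coordinates"].split()
    # Older readers only know the standard attribute.
    return attrs.get("coordinates", "").split()


# Attributes of grid_data, copied from the CDL example in this comment.
grid_data_attrs = {
    "coordinates": "time",
    "subsampled_coordinates": "time lat lon",
}

print(coordinate_names(grid_data_attrs, supports_subsampling=False))  # ['time']
print(coordinate_names(grid_data_attrs, supports_subsampling=True))   # ['time', 'lat', 'lon']
```

An older reader ignores the unknown attribute and still gets a valid, if incomplete, set of coordinates, which is the backward-compatibility property being argued for.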
Hi, I think that there will be resistance to referencing the interpolation container from the subsampled coordinates, rather than from the data variable. I think that this is preferable:
dimensions :
time = UNLIMITED;
lat = 720;
lon = 1440;
sub_lat = 10;
sub_lon = 20;
aux = 15;
variables :
float grid_data(time, lat, lon, aux);
grid_data : coordinates = "time";
grid_data : subsampled_coordinates = "lat lon";
grid_data : interpolation = "interpolation_doesnotmatter";
float lat(sub_lat);
lat : standard_name = "latitude";
lat : units = "degrees_north";
float lon(sub_lon);
lon : standard_name = "longitude";
lon : units = "degrees_east";
double time(time);
time : standard_name = "time";
time : units = "<units> since <datetime string>";
time : calendar = "gregorian";
int tie_points_lon(sub_lon) ;
tie_points_lon:interpolation_dimension = "lon" ;
int tie_points_lat(sub_lat) ;
tie_points_lat:interpolation_dimension = "lat";
char interpolation_doesnotmatter;
interpolation_doesnotmatter : description = "not the subject of this issue";
The reasons for this are that
- `lon` is only a subsampled coordinate in the context of the data variable;
- `lon` cannot apply the interpolation independently because it doesn't know about the other, linked subsampled coordinates, `lat` in this case.

Numbering my comments so it is easier to reply:
1. `lon` and `lat` are independent in your example, otherwise their dimensions would be `(sub_lat, sub_lon)`.
2. Supposing `lat` and `lon` depend on each other and we want to keep the interpolation container variable as reusable as possible (like a function), then we need an attribute materializing this dependency (function arguments). For example:
float lat(sub_lat, sub_lon);
lat : standard_name = "latitude";
lat : units = "degrees_north";
lat : interpolation = "interp_bilinear_container";
lat : interpolation_terms = "v1 : lat v2 : lon";
float lon(sub_lat, sub_lon);
lon : standard_name = "longitude";
lon : units = "degrees_east";
lon : interpolation = "interp_bilinear_container";
lon : interpolation_terms = "v1 : lat v2 : lon";
char interp_bilinear_container;
interp_bilinear_container : standard_name = "bilinear";
3. Let's say the `time` variable is also subsampled; it does not depend on `lat` or `lon`, so there are two independent interpolations to perform (one for `lat`/`lon` and one for `time`).
My understanding is that with your approach it would either mean that:
- the `interpolation` attribute accepts several values, but in that case you need to define which coordinate variables are targeted by each interpolation method (keeping in mind that the container variable cannot refer to other variables in order to remain generic/reusable), so you need more interpolation-related attributes on each data variable using these coordinates.
- or there is a single value in the `interpolation` attribute, but in that case the method described in the container variable has to handle the interpolation of all subsampled coordinates (i.e. `time`, `lat` and `lon`) altogether, which adds complexity in the definition of the interpolation container variable.
4. I understand that mimicking the behavior of existing constructs can facilitate acceptance of the proposal, but for me it makes much more sense to keep the interpolation-related attributes in the subsampled coordinate variables: they are the ones we "compressed" and need to be reconstructed, not the data variables that reference the coordinate variables.
> Let's say the `time` variable is also subsampled; it does not depend on `lat` or `lon`, so there are two independent interpolations to perform (one for `lat`/`lon` and one for `time`).
Good point!
This is easily dealt with in the same way that different coordinate variables can have different grid_mappings (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#grid-mappings-and-projections):
dimensions :
time = UNLIMITED;
lat = 720;
lon = 1440;
sub_lat = 10;
sub_lon = 20;
aux = 15;
variables :
float grid_data(time, lat, lon, aux);
grid_data : coordinates = "time";
grid_data : subsampled_coordinates = "lat lon time";
grid_data : interpolation = "interpolation_XY: lat lon interpolation_T: time";
float lat(sub_lat);
lat : standard_name = "latitude";
lat : units = "degrees_north";
float lon(sub_lon);
lon : standard_name = "longitude";
lon : units = "degrees_east";
double time(time);
time : standard_name = "time";
time : units = "<units> since <datetime string>";
time : calendar = "gregorian";
int tie_points_lon(sub_lon) ;
tie_points_lon:interpolation_dimension = "lon" ;
int tie_points_lat(sub_lat) ;
tie_points_lat:interpolation_dimension = "lat";
char interpolation_XY;
interpolation_XY : description = "not the subject of this issue";
char interpolation_T;
interpolation_T : description = "not the subject of this issue";
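The extended `interpolation` value in the example above follows the same pattern as the extended `grid_mapping` syntax: a container name ending in a colon, followed by the coordinate variables it applies to. A minimal Python sketch of how a reader might split such a string is given below; the function name `parse_interpolation` is hypothetical, and treating a bare name without a colon as a container with no explicit coordinate list is an assumption, not something agreed in this thread.

```python
def parse_interpolation(value):
    """Map each container variable name to the coordinate names listed after it."""
    mapping = {}
    current = None
    for token in value.split():
        if token.endswith(":"):
            # A token ending in ':' starts a new container entry.
            current = token[:-1]
            mapping[current] = []
        elif current is None:
            # Assumed short form: a bare container name with no coordinate list.
            mapping[token] = []
        else:
            mapping[current].append(token)
    return mapping


print(parse_interpolation("interpolation_XY: lat lon interpolation_T: time"))
# {'interpolation_XY': ['lat', 'lon'], 'interpolation_T': ['time']}
```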
Often the lat/lon or scan/track interpolation has to be done first, and time and viewing angles depend on this first interpolation. So, for efficiency, it would be good to also have a way to bundle these in a single interpolation container.
I would support adding references to the index mapping variables in the data variable to improve re-usability of the interpolation container.
Here is a VIIRS example with two different data variable resolutions, M-Band at 750m and I-Band at 375m. I left out all the viewing angles and interpolation coefficients for clarity:
dimensions :
// VIIRS M-Band
m_track = 768 ;
m_scan = 3200 ;
m_channel = 16 ;
// VIIRS I-Band
i_track = 1536 ;
i_scan = 6400 ;
i_channel = 5 ;
// Tie points
tp_track = 96 ;
tp_scan = 205 ;
// Time, stored at scan-start and scan-end of each scan
time_scan = 2;
variables:
// VIIRS M-Band
float m_radiance(m_track, m_scan, m_channel) ;
m_radiance : interpolation = "interpolation_all" ;
m_radiance : interpolation_indices = "m_track_indices m_scan_indices" ;
int m_track_indices(tp_track) ;
m_track_indices:interpolation_dimension = "m_track" ;
int m_scan_indices(tp_scan) ;
m_scan_indices:interpolation_dimension = "m_scan" ;
// VIIRS I-Band
float i_radiance(i_track, i_scan, i_channel) ;
i_radiance : interpolation = "interpolation_all" ;
i_radiance : interpolation_indices = "i_track_indices i_scan_indices" ;
int i_track_indices(tp_track) ;
i_track_indices:interpolation_dimension = "i_track" ;
int i_scan_indices(tp_scan) ;
i_scan_indices:interpolation_dimension = "i_scan" ;
// Reusable interpolation container, shared by VIIRS M-Band and I-Band
char interpolation_all ;
interpolation_all:tie_point_interpolation_name = "bi_quadratic_method1" ;
interpolation_all:location_tie_points = "lat lon" ;
interpolation_all:time_interpolation_name = "bi_linear" ;
interpolation_all:time = "t" ;
// Tie points
float lat(tp_track, tp_scan) ;
lat:standard_name = "latitude" ;
lat:units = "degrees_north" ;
float lon(tp_track, tp_scan) ;
lon:standard_name = "longitude" ;
lon:units = "degrees_east" ;
double t(tp_track, time_scan) ;
t:long_name = "time" ;
t:units = "days since 1990-1-1 0:0:0" ;
I think that either the `interpolation_indices` attribute (new name: _compactindices?) or something like `compacted_coordinates = "lat lon time"` (new name!) will convey the same information.
The `m_radiance` data variable depends on the `m_track` and `m_scan` dimensions, which are mentioned in the `interpolation_dimension` (new name: `compacted_dimension`?) attributes of the `m_track_indices` and `m_scan_indices` variables. They share the same `tp_track` and `tp_scan` dimensions as the `lat`, `lon`, and `t` compacted coordinates.
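To make the role of the index variables concrete, here is a rough one-dimensional Python sketch. It rests on assumptions the thread does not spell out: that an index variable such as `m_scan_indices` holds the zero-based positions of the tie points along the full dimension named by its `interpolation_dimension`, that the first and last tie points sit on the ends of that dimension, and that plain linear interpolation stands in for whatever method the container actually names (it is not the `bi_quadratic_method1` of the example). The function name `expand_linear` is hypothetical.

```python
def expand_linear(tie_indices, tie_values, full_size):
    """Linearly interpolate tie point values onto every index of the full dimension."""
    if len(tie_indices) != len(tie_values):
        raise ValueError("one value is required per tie point index")
    if tie_indices[0] != 0 or tie_indices[-1] != full_size - 1:
        raise ValueError("this sketch assumes tie points at both ends of the dimension")
    full = [None] * full_size
    # Walk over consecutive pairs of tie points and fill the indices between them.
    for k in range(len(tie_indices) - 1):
        i0, i1 = tie_indices[k], tie_indices[k + 1]
        v0, v1 = tie_values[k], tie_values[k + 1]
        for i in range(i0, i1 + 1):
            weight = (i - i0) / (i1 - i0)
            full[i] = v0 + weight * (v1 - v0)
    return full


# Three tie points expanded onto a full dimension of size 5.
print(expand_linear([0, 2, 4], [10.0, 20.0, 40.0], 5))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```

Because the index variable carries the target dimension, the same container can be reused for the M-Band and I-Band variables, which differ only in their index variables.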
I would agree that `compacted_dimension` is nice and descriptive.
However, by comparison to the grid mapping terminology
grid_mapping = ...
grid_mapping_name = ...
I think I prefer that all names start with `interpolation_` for the attributes in the data set:
interpolation = ....
interpolation_indices = ...
and the attributes of the indices variable:
interpolation_dimension = ....
This makes them appear as part of the same concept, which they are.
What do you think?
The purpose of this issue is to determine what new attributes for data variables are needed. These variables in CF hold scientific data discretized within a domain and are represented by the Field construct in the CF data model.
There seems to be enough agreement for a new attribute, similar to the `grid_mapping` attribute. One proposed name for it is `tie_point_interpolation`. Its value is the name of a container variable which describes the interpolation method for computing coordinate data at the same domain resolution as the field construct to which this attribute is assigned.

Are any additional new attributes needed?
Whenever a field construct depends on multidimensional (rank > 1) coordinates, or a dimension (rank = 1) coordinate is named differently than its dimension, such variables must be listed in a `coordinates` attribute. This means that every subsampled coordinate will have to be included in this attribute, or in a new one with the same role as the `coordinates` attribute.

The following short example illustrates using just the `coordinates` attribute assigned to the `swath_data` variable (a field construct):

If using a new attribute, the above would become:
The new attribute, here named `subsampled_coordinates`, is to be used only for subsampled coordinates that otherwise qualify for inclusion in the `coordinates` attribute. One reason for the new attribute is that none of the `time`, `lat`, or `lon` coordinates depends on any of `swath_data`'s dimensions. So far in CF, variables listed in the `coordinates` attribute always shared at least one common dimension.
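The shared-dimension property mentioned above is easy to state as a check. The Python sketch below is illustrative only: the helper name `shares_dimension` and all dimension names (`track`, `scan`, `tp_track`, `tp_scan`) are hypothetical stand-ins for a swath variable and its coordinates.

```python
def shares_dimension(data_dims, coord_dims):
    """True if the coordinate variable spans at least one dimension of the data variable."""
    return not set(data_dims).isdisjoint(coord_dims)


swath_dims = ("track", "scan")                  # hypothetical full-resolution dimensions
full_lat_dims = ("track", "scan")               # ordinary auxiliary coordinate
subsampled_lat_dims = ("tp_track", "tp_scan")   # tie point dimensions only

print(shares_dimension(swath_dims, full_lat_dims))        # True
print(shares_dimension(swath_dims, subsampled_lat_dims))  # False
```

A subsampled coordinate fails this check, which is the situation a separate `subsampled_coordinates` attribute would make explicit.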