erget / subsampled-coordinates

Repository for storing CDL demonstrating subsampled coordinates in CF-netCDF
Apache License 2.0

What subsampled type are netCDF variables with sensor/scan angles? #2

Closed: ajelenak closed this issue 4 years ago

ajelenak commented 4 years ago

The VIIRS M band example has solar and sensor angle variables in the tie_points group. These variables seem to represent a new type of subsampled data because they are not subsampled coordinates but are only given at the tie-point geolocations. Why can't sensor, solar, and any other viewing-geometry-related data be treated the same way as radiance in that example file?

cc @AndersMS

AndersMS commented 4 years ago

In the original VIIRS data, the viewing geometry data is provided at the same resolution/dimensions as the radiance data and the latitude/longitude. The overall goal is to minimize the product file size, which is the argument for compacting both the latitude/longitude and the viewing geometry data. Location and viewing geometry are closely related: the main reason the viewing angles change across the swath is that the location changes. For that reason, it is efficient to compress and expand location and viewing geometry as part of the same algorithm.
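As a rough illustration of why one algorithm can serve both location and viewing geometry, the Python sketch below expands any 2-D field given at tie points (latitude, longitude, or a viewing angle alike) back to full resolution by bilinear interpolation in index space. It is a minimal sketch under stated assumptions: the function name `expand_bilinear` and the tie-point index lists are hypothetical, and plain bilinear interpolation is a stand-in, not the VIIRS method. A real scheme would also have to handle longitude and azimuth wrap-around at 360°.

```python
def expand_bilinear(tie, track_idx, scan_idx, n_track, n_scan):
    """Expand a 2-D tie-point field to full resolution (hypothetical helper).

    tie[i][j] is the value at full-resolution position (track_idx[i], scan_idx[j]).
    Both index lists are strictly increasing, hold at least two entries, start
    at 0 and end at n_track - 1 / n_scan - 1 respectively.
    NOTE: plain bilinear interpolation in index space, used only for illustration;
    it ignores angle wrap-around (e.g. longitude crossing 180°).
    """
    def locate(idx, k):
        # Find the tie-point interval [idx[m], idx[m + 1]] containing k and the
        # fractional position of k inside it.
        for m in range(len(idx) - 1):
            if idx[m] <= k <= idx[m + 1]:
                return m, (k - idx[m]) / (idx[m + 1] - idx[m])
        raise ValueError(f"index {k} outside tie-point range")

    full = []
    for k in range(n_track):
        m, s = locate(track_idx, k)
        row = []
        for l in range(n_scan):
            n, t = locate(scan_idx, l)
            # Weighted mean of the four surrounding tie points.
            row.append((1 - s) * (1 - t) * tie[m][n]
                       + (1 - s) * t * tie[m][n + 1]
                       + s * (1 - t) * tie[m + 1][n]
                       + s * t * tie[m + 1][n + 1])
        full.append(row)
    return full
```

The same call works unchanged for latitude, longitude, or solar/sensor angles, which is the point being made: only the tie-point values differ.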

TomLav commented 4 years ago

I see the point of having sub-sampled view/sun geometry interpolated across the swath. But I also do not think they can be added to the interpolation group without some work in the root group to indicate that they exist and to state explicitly what they are.

AndersMS commented 4 years ago

The idea was that tie_point_interpolation in the main dataset points to the interpolation container variable:

variables:
      float radiance(track, scan, channel) ;
            radiance:tie_point_interpolation = "tie_points/interpolation" ;

The container variable in turn points to each of the variables for location, sensor direction (if present) and solar direction (if present):

variables:
      int interpolation ;
            ....
            interpolation:location_tie_points = "latitude longitude" ;
            interpolation:sensor_direction_tie_points = "sensor_azimuth_angle sensor_zenith_angle" ;
            interpolation:solar_direction_tie_points = "solar_azimuth_angle solar_zenith_angle" ;

The VIIRS Day-Night-Band even requires a lunar direction.

We would need to explain this structure in the documentation. As solar_azimuth_angle, solar_zenith_angle, sensor_azimuth_angle and sensor_zenith_angle are standard names, they have a well-defined meaning.
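To show how a reader might follow the pointers in this layout, here is a small Python sketch that splits a reference such as "tie_points/interpolation" into group and variable name, and groups the tie-point variables named by the container by role. Attributes are modelled as a plain dict, as one might obtain them from a netCDF library. The function names and the rule "every attribute ending in `_tie_points` names a role" are assumptions drawn from the CDL above, not an agreed convention; optional roles such as sensor direction simply do not appear when the attribute is absent.

```python
def split_path(ref):
    """Split 'group/name' into (group, name); a bare name gives group ''."""
    group, _, name = ref.rpartition("/")
    return group, name


def tie_point_variables(container_attrs):
    """Map each role (location, sensor_direction, ...) to its variable names.

    container_attrs: attributes of the interpolation container variable,
    as a dict of attribute name -> string value (hypothetical model).
    """
    suffix = "_tie_points"
    roles = {}
    for name, value in container_attrs.items():
        if name.endswith(suffix):
            role = name[: -len(suffix)]      # e.g. "solar_direction"
            roles[role] = value.split()      # space-separated variable names
    return roles
```

If a lunar direction were added for the Day-Night-Band under the same naming pattern, it would be picked up without code changes.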

oceandatalab commented 4 years ago

(Sylvain speaking)

I may be missing something but since radiance is already at full resolution, does it really need an interpolation-related attribute?

I agree with @ajelenak and @TomLav regarding the declaration of view/sun geometry variables because I don't think there is a concept of derived/virtual variable in the convention, so it would add another layer of complexity to our proposal.

Can we simply keep subsampled variables in the global scope and indicate the interpolation information (container variable + index variables + [optional] auxiliary variables) required to reconstruct the full resolution using an attribute?

For example, if we say the attribute is called "interpolation":

Global scope:

dimensions:
      track
      scan
      channel
      subsampled_track
      subsampled_scan

variables:
      double time(track, scan) ;
      float radiance(track, scan, channel) ;
      float lon(subsampled_track, subsampled_scan) ;
            lon:interpolation = "somegroupname/method1" ;
      float lat(subsampled_track, subsampled_scan) ;
            lat:interpolation = "somegroupname/method1" ;
      float solar_azimuth_angle(subsampled_track, subsampled_scan) ;
            solar_azimuth_angle:interpolation = "somegroupname/method1" ;
      ...
      float other_subsampled_var(subsampled_track, subsampled_scan) ;
            other_subsampled_var:interpolation = "somegroupname/method2" ;

Then we have the "somegroupname" group, which contains only the information required to interpolate from subsampled to full resolution:

variables:
      int tiepoint_track_index(subsampled_track, subsampled_scan) ;
      int tiepoint_scan_index(subsampled_track, subsampled_scan) ;
      char method1 ;
            method1:standard_name = "some_method_name" ;
            method1:formula_terms = "subtrack: somegroupname/tiepoint_track_index subscan: somegroupname/tiepoint_scan_index" ;
            method1:fullres_dimensions = "track scan" ;

      float aux_lookup_table(subsampled_track, subsampled_scan) ;
      char method2 ;
            method2:standard_name = "other_method_name" ;
            method2:formula_terms = "auxiliary: time lut: somegroupname/aux_lookup_table subtrack: somegroupname/tiepoint_track_index subscan: somegroupname/tiepoint_scan_index" ;
            method2:fullres_dimensions = "track scan" ;

This way all the main variables are clearly declared in the global scope with minor overhead (one attribute) for subsampled variables and the only new mechanism we introduce is the description of the transformation from subsampled to fullres dimensions using container variables.
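Under this proposal, resolving a subsampled variable means following its `interpolation` attribute to the container and splitting the container's `formula_terms` into term/variable pairs, in the same `term: variable` form CF already uses for `formula_terms`. The sketch below models attributes as plain dicts; `parse_formula_terms`, `interpolation_recipe` and the dict layout are hypothetical, and `interpolation` and `fullres_dimensions` are the attribute names proposed in the CDL sketch above, not existing CF attributes.

```python
def parse_formula_terms(text):
    """Parse a 'term: variable term: variable ...' string into a dict."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError(f"odd number of tokens in formula_terms: {text!r}")
    terms = {}
    for key, var in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"expected 'term:' but found {key!r}")
        terms[key[:-1]] = var
    return terms


def interpolation_recipe(var_attrs, containers):
    """Collect what is needed to expand one subsampled variable (hypothetical).

    var_attrs:  attributes of the subsampled variable (e.g. lon)
    containers: maps 'group/name' container paths to their attributes
    """
    method = containers[var_attrs["interpolation"]]  # e.g. "somegroupname/method1"
    return {
        "method": method["standard_name"],
        "terms": parse_formula_terms(method["formula_terms"]),
        "fullres_dimensions": method["fullres_dimensions"].split(),
    }
```

Keeping the recipe lookup separate from the interpolation itself means a decoder can check that every referenced variable exists before doing any numerical work.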

ajelenak commented 4 years ago

It seems we have identified a new use case: parameters (variables) that are not coordinates but are given at a subsampled coordinate (tie point) resolution and must be interpolated to a higher coordinate resolution of a data variable. I would delay considering this until we complete the work on subsampled/interpolated coordinates.

AndersMS commented 4 years ago

In viirs_compact.cdl, I modified the variable names that were identical to the corresponding standard_names, to make it clearer what is what.

@Sylvain: Referencing the interpolation container variable from the radiance variable is equivalent to having the grid_mapping reference in the main data set. I agree you could argue that both of them are properties of the coordinates and not of the data set, but by convention the grid_mapping is referenced from the data set, so it would not be wrong to do the same with the interpolation.

In the example viirs_compact_interpolation_in_root_group.cdl, the interpolation container variable has been moved out to the root group. Note that all the variable names referenced from the container variable now carry the relative path of the tie-point group (tp/). Possibly it could work such that if the interpolation container variable is in the root group, then it is implicitly linked to the data sets matching its expanded dimensions. If it is not, it would need to be referenced via an attribute of the data set. What do you think?

If you have the lat and lon on tie-point dimensions in the root group, then the question is what to call the lat and lon variables expanded onto full dimensions. The convenience of having lat and lon on tie-point dimensions inside the tie-point group is that you could then expand them directly into the root group without a name conflict. Everything in the root group would be on full dimensions, and everything in the tie-point group on tie-point dimensions.
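The two linking options can be contrasted in a few lines. In the Python sketch below, an explicit `tie_point_interpolation` attribute on the data variable wins; otherwise a root-group container is linked implicitly when all of its expanded dimensions appear among the data variable's dimensions. The container attribute name `expanded_dimensions`, the "no slash in the path means root group" rule, and the function name are hypothetical choices for illustration only.

```python
def find_container(dims, attrs, containers):
    """Return the path of the interpolation container for a data variable.

    dims:       the data variable's dimension names, e.g. ("track", "scan", "channel")
    attrs:      the data variable's attributes (dict)
    containers: maps container paths to their attributes; a path without "/"
                is treated as a root-group container (hypothetical rule)
    """
    # Explicit link: the data variable names its container directly.
    explicit = attrs.get("tie_point_interpolation")
    if explicit is not None:
        return explicit
    # Implicit link: a root-group container whose expanded dimensions
    # (hypothetical attribute) are all among the data variable's dimensions.
    matches = [
        path
        for path, c in containers.items()
        if "/" not in path and set(c["expanded_dimensions"].split()) <= set(dims)
    ]
    if len(matches) > 1:
        raise ValueError(f"ambiguous implicit link: {matches}")
    return matches[0] if matches else None
```

The ambiguity check hints at the cost of the implicit rule: with several root-group containers sharing dimensions, a reader cannot decide without an explicit reference.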

erget commented 4 years ago

For what it's worth, I am strongly in favour of explicitly linking interconnected variables via attribute references: explicit is better than implicit, in my view.

Also, I concur with @ajelenak: subsampled view angles are an important feature, but they don't necessarily need to be ready by next week. I would nevertheless incorporate them into the first draft of the proposal, as they are likely to be needed to make use of many data products.

AndersMS commented 4 years ago

Thank you for all the good contributions, it is a valuable discussion.

I will leave it to those of you who are more experienced with CF-convention work to judge when to present what, but I would like to add a couple of comments in support of the viewing directions.

The viewing angles are part of many level-1 products, including several of the Sentinel, Metop-SG and JPSS level-1 products. They are needed at full resolution; in some cases they are provided at full resolution and in others they are given at tie points only.

If you add a range to the zenith and azimuth angles of the sensor, then you have the position of the sensor relative to the cell, which shows how closely they are related to latitude/longitude/height. As I read section 4 of the CF-convention, they do qualify as coordinate variables, and they additionally have standard names, for example sensor_azimuth_angle and sensor_zenith_angle.
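The geometric point can be made concrete: given a range r plus the sensor zenith angle θ and azimuth angle φ at a cell, the sensor position in a local east-north-up frame centred on the cell is east = r sin θ sin φ, north = r sin θ cos φ, up = r cos θ. The sketch assumes the angles describe the line of sight from the cell to the sensor, with zenith measured from the local vertical and azimuth clockwise from north, and it treats the local frame as flat; `sensor_offset_enu` is a hypothetical name.

```python
import math


def sensor_offset_enu(range_m, zenith_deg, azimuth_deg):
    """Sensor position relative to a cell in a local east-north-up frame.

    Assumes zenith is measured from the local vertical and azimuth clockwise
    from north, both describing the line of sight from the cell to the sensor.
    """
    z = math.radians(zenith_deg)
    a = math.radians(azimuth_deg)
    horizontal = range_m * math.sin(z)   # distance projected onto the ground plane
    east = horizontal * math.sin(a)
    north = horizontal * math.cos(a)
    up = range_m * math.cos(z)
    return east, north, up
```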

I suggest that we think of what we are now introducing into the CF-convention as a framework for coordinate variables that are well suited to being stored at tie points and interpolated to full resolution using a well-defined method. Beyond what we have discussed already, I can think of range, lunar_azimuth_angle, lunar_zenith_angle and geoid_height_above_reference_ellipsoid, but there are probably more candidates.

The more use cases we validate the framework against now, at least conceptually, the more confident we will be that the framework will be suited for later additions.

oceandatalab commented 4 years ago

@AndersMS

If you have the lat and lon on tie-point dimensions in the root group, then the question is what to call the lat and lon variables expanded onto full dimensions.

From my point of view, using full dimensions or subsampled dimensions plus an interpolation algorithm are just two methods to store data, the only difference being the memory/computing tradeoff. The storage method should only matter for software that decodes the file, not to people who will use its content, so it should not have any effect on how you name variables.

@ajelenak

It seems we have identified a new use case: parameters (variables) that are not coordinates but are given at a subsampled coordinate (tie point) resolution and must be interpolated to a higher coordinate resolution of a data variable. I would delay considering this until we complete the work on subsampled/interpolated coordinates.

I think the interpolation part of our work on subsampled/interpolated coordinates will cover this use-case anyway, so I agree with @erget and @AndersMS about mentioning it in the proposal.

ajelenak commented 4 years ago

I, too, am in favor of including viewing angles. I expect that to be easier after we finalize the approach for subsampled coordinates.

TomLav commented 4 years ago

Hi all,

Sorry for not interacting more, but here are two thoughts towards the meeting tomorrow:

(Sylvain speaking) ... I don't think there is a concept of derived/virtual variable in the convention, so it would add another layer of complexity to our proposal.

I suppose one could say that CF>=1.8 introduces "virtual/derived" lat/lon variables in the case of grid_mapping when the 2D lat/lon fields are not included. An application that wants lat/lon has to do all the work, guessing from grid_mapping and the x and y coordinate variables.

Also, I wanted to raise again the example of EPS-SG MWI L1B data products, where the subsampling is only performed in the along-scan dimension, while the along-track direction is not subsampled. So we could imagine fields with a mix of subsampled and full-resolution coordinate variables being used to compute the full lat/lon.

TomLav commented 4 years ago

I suggest that we think of what we are now introducing into the CF-convention as a framework for coordinate variables that are well suited to being stored at tie points and interpolated to full resolution using a well-defined method. Beyond what we have discussed already, I can think of range, lunar_azimuth_angle, lunar_zenith_angle and geoid_height_above_reference_ellipsoid, but there are probably more candidates.

We can at least add "time" when time represents the sensing time for each FoV. It is a full 2D field, but is very often stored as e.g. 1 time per scanline.
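For the simplest storage mentioned here, one time per scanline, expansion to the full 2-D field just repeats each line's time across all samples on that line. A minimal sketch with the hypothetical name `expand_scanline_time`; a real product might also need per-sample time offsets, which this ignores.

```python
def expand_scanline_time(line_times, n_samples):
    """Expand one sensing time per scanline into a full (track, scan) field.

    Every field of view on a scanline is assigned that line's time, so the
    result has len(line_times) rows, each holding n_samples identical values.
    """
    return [[t] * n_samples for t in line_times]
```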

AndersMS commented 4 years ago

@oceandatalab

From my point of view, using full dimensions or subsampled dimensions plus an interpolation algorithm are just two methods to store data, the only difference being the memory/computing tradeoff. The storage method should only matter for software that decodes the file, not to people who will use its content, so it should not have any effect on how you name variables.

I agree that for people with software that takes care of decoding the file, this will not matter.

But could there be a use case where a user receives a compact file and, as a first step, would like to expand it to a full-resolution file using a dedicated expansion tool? That could be the case if the user would like to use it for several applications, or if one of the user's applications is not (yet) capable of decoding the compact coordinates.

If this expansion were done inside the original compact NetCDF file, there would be a name conflict if the compact data is in the root group. This internal expansion would be comparable to the HDF5 internal compression scheme, where a file can be compressed/uncompressed internally without generating a new HDF5 file.

However, if the expansion is done into a new separate NetCDF file, then there would be no name conflict.

I hope my point is clear: is the use case relevant?

AndersMS commented 4 years ago

@TomLav

We can at least add "time" when time represents the sensing time for each FoV. It is a full 2D field, but is very often stored as e.g. 1 time per scanline.

Good point, thank you for bringing that in.

AndersMS commented 4 years ago

@TomLav

Also, I wanted to raise again the example of EPS-SG MWI L1B data products, where the subsampling is only performed in the along-scan dimension, while the along-track direction is not subsampled. So we could imagine fields with a mix of subsampled and full-resolution coordinate variables being used to compute the full lat/lon.

We discussed one way to address that case in one of the past teleconferences. It is illustrated on page 18 in the GDoc, the upper left of the four examples: https://docs.google.com/document/d/1y7ucbfi2GviJv-kAzeBxdHh0NGH4fjoWV0g9Y1dpch8/edit#

Effectively, in this scheme, the tie-points would be included for each scan in the along-track direction.
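Under that scheme, expansion reduces to one-dimensional interpolation along each scan line, while the along-track dimension passes through unchanged. The sketch below uses piecewise-linear interpolation purely as a placeholder for whatever method a product actually specifies; `expand_along_scan` and its arguments are hypothetical.

```python
def expand_along_scan(tie_rows, scan_idx, n_scan):
    """Interpolate each full-resolution track row along scan only.

    tie_rows: one list of tie-point values per track row (track not subsampled)
    scan_idx: strictly increasing full-resolution scan indices of the tie
              points, at least two, starting at 0 and ending at n_scan - 1
    Piecewise-linear interpolation is an illustrative assumption.
    """
    full = []
    for row in tie_rows:
        out = []
        m = 0  # current tie-point interval [scan_idx[m], scan_idx[m + 1]]
        for l in range(n_scan):
            # Advance to the interval containing sample l.
            while m < len(scan_idx) - 2 and l > scan_idx[m + 1]:
                m += 1
            t = (l - scan_idx[m]) / (scan_idx[m + 1] - scan_idx[m])
            out.append((1 - t) * row[m] + t * row[m + 1])
        full.append(out)
    return full
```

The number of output rows always equals the number of input rows, which is what distinguishes this case from the fully 2-D subsampling in the VIIRS examples.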

AndersMS commented 4 years ago

@TomLav

We can at least add "time" when time represents the sensing time for each FoV. It is a full 2D field, but is very often stored as e.g. 1 time per scanline.

I have added a new VIIRS example with two time stamps per scan, one at the beginning of the scan and one at the end, but there are other options including the one you mention.

Example is here: https://github.com/erget/subsampled-coordinates/tree/master/VIIRS_M_Band_Example_with_time
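A sketch of how a reader could expand a start/end-of-scan time layout, assuming (an assumption for illustration, not something stated for the linked example) that time advances linearly from the scan-start stamp at the first sample to the scan-end stamp at the last sample, and that all detector rows of a scan share those times. The function name and the `rows_per_scan` parameter are hypothetical.

```python
def expand_scan_times(start_times, end_times, rows_per_scan, n_samples):
    """Expand (scan start, scan end) time pairs to a full (track, sample) field.

    Assumes time is linear in sample index within a scan and identical for
    every detector row of that scan.
    """
    if n_samples < 2:
        raise ValueError("need at least two samples per scan")
    full = []
    for t0, t1 in zip(start_times, end_times):
        row = [t0 + (t1 - t0) * j / (n_samples - 1) for j in range(n_samples)]
        # One independent copy of the row per detector row in this scan.
        full.extend(list(row) for _ in range(rows_per_scan))
    return full
```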

erget commented 4 years ago

As discussed in today's meeting,

Conclusion: In our paper, we will note that we have considered this and decided to treat viewing angles as coordinates. This will be further scrutinised at the upcoming Community Meeting, where problems we may have overlooked should come to light.