erget / subsampled-coordinates

Repository for storing CDL demonstrating subsampled coordinates in CF-netCDF
Apache License 2.0

Choose base term for naming new attributes #8

Open ajelenak opened 4 years ago

ajelenak commented 4 years ago

There are at least two different naming schemes for new attributes currently in use. Since the number of these new attributes seems to be stabilizing, now would be a good time to pick a base term from which to derive new attribute names.

These two terms have been used so far: subsample and interpolate. Additional options: reduce, restore.

A related issue is the term tie point. It is somewhat specific to remote sensing. Perhaps anchor would be a more generic, but equally applicable, term?

Other naming suggestions are welcome!

erget commented 4 years ago

Here are my opinions on the terms:

davidhassell commented 4 years ago

"Subsampled coordinate" implies for me that we're "thinning" coordinates, which can be the case but is not always the case, as we are sometimes interpolating pixel centres from provided corner coordinates.

I get the message here: a "subsampled" coordinate does not need to correspond to the exact location of one of the full (post-interpolation) coordinates. I don't follow the example, though: I'm not sure what "corner coordinates" are. Is the conical scanner case an example of this?

Thanks!

davidhassell commented 4 years ago

Restored vs. full vs. complete vs. upsampled: I advocate using the term "full coordinates" to discuss the coordinates that result from interpolating as we advise the user, for much the same reason that I advocate using the term "compaction" to describe the reduction of the number of coordinates we provide.

Perhaps the result from interpolating should be "un" the name of the provided values, which is analogous to (un)packed and (un)compressed - terms which are already in use. I.e. if we were to call the values in the file "compacted", then result of interpolation could be "uncompacted".
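To make the compacted/uncompacted pairing concrete, here is a minimal 1D sketch. The function names `compact` and `uncompact` are hypothetical, and linear interpolation is only an assumed stand-in for whatever interpolation method the file would name; for coordinates that really are linear, the round trip is exact.

```python
def compact(coords, step):
    """Keep every `step`-th coordinate, plus the last one, as tie points.

    Returns (tie_point_indices, tie_point_values). Assumes coords is non-empty.
    """
    indices = list(range(0, len(coords), step))
    if indices[-1] != len(coords) - 1:
        indices.append(len(coords) - 1)
    return indices, [coords[i] for i in indices]


def uncompact(indices, values):
    """Linearly interpolate between consecutive tie points to restore a
    coordinate at every index from indices[0] to indices[-1]."""
    full = []
    for (i0, v0), (i1, v1) in zip(zip(indices, values), zip(indices[1:], values[1:])):
        for i in range(i0, i1):
            w = (i - i0) / (i1 - i0)  # fractional position between the two tie points
            full.append(v0 + w * (v1 - v0))
    full.append(values[-1])  # the final tie point closes the last segment
    return full


lons = [10.0 + 0.5 * i for i in range(11)]
indices, tie_points = compact(lons, 4)     # indices == [0, 4, 8, 10]
restored = uncompact(indices, tie_points)  # 11 values, matching lons
```

Under this naming, the values stored in the file are the "compacted" ones, and `uncompact` is what a user does to get the "uncompacted" coordinates back.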

erget commented 4 years ago

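The specific example I had in mind was the VIIRS case: there, the tie points correspond to the corners of interpolation groups. This example demonstrates many of the advantages:

[Figure: VIIRS interpolation zone with tie points A, B, C and D at its corners and pixel centres shown as dots]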

A, B, C, and D are the corners of the interpolation zone. The grid spanned between them is split into individual pixels, whose centres are shown as dots. By specifying how the tie points are offset from the closest pixel corners, and how to bend the lines AB and CD, and AD and BC, respectively, you can reconstruct all of those dots from A, B, C, D and the parameterization.

Thus A, B, C, and D don't belong to the original, "full" set of coordinates.

By specifying different offsets we could, of course, use tie points whose positions coincide with the centres of their corresponding pixels. Using the corners is nice because it allows us to store one set of tie points that can be applied to observations from multiple instruments that observe on similar grids but at different resolutions (in the case of VIIRS, these are the M- and I-bands).
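A minimal sketch of why corner tie points can serve several resolutions, assuming planar coordinates, tie points exactly at the zone corners (zero offset) and straight edges, i.e. none of the line bending described above. Plain bilinear interpolation stands in for the actual VIIRS parameterization, and the function names, corner values and grid sizes are hypothetical; the "fine" grid simply has twice the resolution of the "coarse" one, in the spirit of the I-bands versus the M-bands.

```python
def corner_blend(a, b, c, d, s, t):
    """Bilinearly blend corner tie points a, b, c, d (taken in order around
    the zone) at fractional position s along AB and t along AD."""
    return tuple(
        (1 - s) * (1 - t) * pa + s * (1 - t) * pb + s * t * pc + (1 - s) * t * pd
        for pa, pb, pc, pd in zip(a, b, c, d)
    )


def pixel_centres(corners, n_cols, n_rows):
    """Reconstruct the n_rows x n_cols pixel centres of one zone from its four
    corner tie points (zero offset, straight edges assumed)."""
    a, b, c, d = corners
    return [
        [corner_blend(a, b, c, d, (j + 0.5) / n_cols, (i + 0.5) / n_rows)
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# One set of hypothetical corner tie points A, B, C, D ...
corners = ((0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0))
# ... serves both a coarse grid and a grid with twice the resolution.
coarse = pixel_centres(corners, 2, 2)
fine = pixel_centres(corners, 4, 4)
```

In this sketch each coarse centre is also the mean of the four fine centres it covers; that is a property of bilinear blending, not a claim about the VIIRS geometry.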

erget commented 4 years ago

From @davidhassell:

Perhaps the result from interpolating should be "un" the name of the provided values, which is analogous to (un)packed and (un)compressed - terms which are already in use. I.e. if we were to call the values in the file "compacted", then result of interpolation could be "uncompacted".

Lots of wisdom there. I've edited my comment to say:

  • Uncompacted vs. restored vs. full vs. complete vs. upsampled: Originally, I advocated using the term "full coordinates" to discuss the coordinates that result from interpolating as we advise the user, for much the same reason that I advocate using the term "compaction" to describe the reduction of the number of coordinates we provide. However, @davidhassell notes that "un"-compacting coordinates would correspond with current vocabulary. The term feels a bit clunky in the mouth but I prefer precision to aesthetics.
erget commented 4 years ago

We're considering this agreed and are adopting it into the proposal's terminology.

davidhassell commented 4 years ago

@erget - could you re-e-mail the zoom link? Thanks!

ajelenak commented 4 years ago

It wasn't clear to me towards the end of today's meeting: should this issue be reopened to discuss the specific attribute names using the adopted "compact" base term, or should there be a separate issue?

erget commented 4 years ago

I had understood this issue to refer to general terminology issues - are you referring to namespacing or a similar concern?

ajelenak commented 4 years ago

I am thinking of the next step, which would be to apply the adopted base term "compact" to actual new attribute names. Is this what you call "namespacing"?

erget commented 4 years ago

I think so. Let's see with some examples:

Is this what you mean?

AndersMS commented 4 years ago

I was thinking of the "compaction" more as a description of the overall process, not necessarily as a word that would appear in variable or attribute names.

See also my comment here: https://github.com/erget/subsampled-coordinates/issues/6#issuecomment-637571722

ajelenak commented 4 years ago

No, this issue was not just about the term describing the overall process, but about the term to use in new attribute names.

erget commented 4 years ago

OK. I'd be fine with using interpolation_ as a prefix, if you think it makes sense in light of our describing the overarching process as "compaction".

AndersMS commented 4 years ago

@ajelenak I agree the issue is about the actual attribute names, but we will also have some terms describing the overall process, without these becoming part of actual attribute names.

I think we converged earlier on using interpolation as the equivalent of grid_mapping.

The value of interpolation is probably that it describes what a user has to do to a compact product to get an uncompacted product. If we replaced interpolation altogether with compaction or compact in the attribute names, they would be less descriptive for a user who receives the product: the compaction has already taken place, generating the compact file.

Don't know if that makes sense.

erget commented 4 years ago

Another thought occurs to me - strictly speaking we're not always interpolating.

In the HDF-EOS case, you might have a full set of coordinates that extends beyond the "corners" of the "compacted" coordinates, as hinted at here:

[Figure: HDF-EOS example]

The same applies to microwave imagers, where a single set of coordinates is provided and used to extrapolate the coordinates of the other channels, as here:

[Figure: microwave imager example]

Therefore I think we'll need to think about this some more. I don't want to change the terminology while we're converging on the final blueprint for next week's presentation, but in my opinion it's an open issue: interpolation is simply the more common case here, and we want to accommodate extrapolation as well.
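In the simplest, linear case, interpolation and extrapolation differ only in where the evaluation point lies relative to the tie points: the same formula interpolates between them and extrapolates beyond them. A minimal 1D sketch with hypothetical tie point positions and function names, not the HDF-EOS or microwave imager methods themselves:

```python
def linear_from_tie_points(i0, v0, i1, v1, i):
    """Evaluate the straight line through tie points (i0, v0) and (i1, v1) at
    index i. For i0 <= i <= i1 this interpolates; outside that range the same
    formula extrapolates, because the weight w falls outside [0, 1]."""
    w = (i - i0) / (i1 - i0)
    return v0 + w * (v1 - v0)


def reconstruct(i0, v0, i1, v1, n):
    """Restore coordinates at indices 0 .. n-1 from two tie points that may
    lie strictly inside that range."""
    return [linear_from_tie_points(i0, v0, i1, v1, i) for i in range(n)]


# Tie points at indices 2 and 6; indices 0-1 and 7-8 are extrapolated.
coords = reconstruct(2, 1.0, 6, 3.0, 9)
```

A name covering both cases would therefore not need to commit to "interpolation" in particular.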

davidhassell commented 4 years ago

Off the top of my head, this smells of "regridding".

AndersMS commented 4 years ago

What about "tie_point_gridding" or just "gridding"?

Gridding is a good word, as it would cover both the compacting and uncompacting processes, as well as interpolation and extrapolation.

Tie points are a key element of the method, so including them would make the name more accurate and descriptive. Gridding alone would be conveniently short.

AndersMS commented 4 years ago

The full draft vocabulary currently used in the examples NDVI_lat_lon_Example, NDVI_grid_mapping_Example and VIIRS_M_and_I_Band_Example includes the following attribute names, based on the word interpolation:

In the data variable:
    interpolation
    interpolation_indices
    interpolation_offsets 
In the *_indices variables
    interpolation_dimension
In the container variable
    interpolation_name 
    interpolation_coefficients 
    interpolation_flags 
    location_tie_points 
    sensor_direction_tie_points 
    solar_direction_tie_points
    lunar_direction_tie_points 
    time_tie_points 

If we choose tie_point_gridding the full vocabulary could be:

In the data variable:
    tie_point_gridding
    tie_point_gridding_indices
    tie_point_gridding_offsets 
In the *_indices variables
    tie_point_gridding_dimension
In the container variable
    tie_point_gridding_name 
    tie_point_gridding_coefficients 
    tie_point_gridding_flags 
    location_tie_points 
    sensor_direction_tie_points 
    solar_direction_tie_points 
    lunar_direction_tie_points 
    time_tie_points 

If we choose gridding the full vocabulary could be:

In the data variable:
    gridding
    gridding_indices
    gridding_offsets 
In the *_indices variables
    gridding_dimension
In the container variable
    gridding_name 
    gridding_coefficients 
    gridding_flags 
    location_tie_points 
    sensor_direction_tie_points 
    solar_direction_tie_points 
    lunar_direction_tie_points 
    time_tie_points 
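Since the three candidate lists differ only in the base term, the options could also be compared mechanically. A small, hypothetical helper (not part of the draft proposal) that derives each list from a base term, grouped by the variable the attribute belongs to and keeping the *_tie_points attributes unchanged:

```python
# Suffixes appended to the base term, grouped by the variable that carries
# the attribute (following the draft vocabulary above).
DERIVED = {
    "data variable": ["", "_indices", "_offsets"],
    "*_indices variables": ["_dimension"],
    "container variable": ["_name", "_coefficients", "_flags"],
}

# Attributes that name tie point variables; these do not use the base term.
TIE_POINT_ATTRIBUTES = [
    "location_tie_points",
    "sensor_direction_tie_points",
    "solar_direction_tie_points",
    "lunar_direction_tie_points",
    "time_tie_points",
]


def vocabulary(base_term):
    """Return {variable kind: [attribute names]} derived from base_term."""
    vocab = {kind: [base_term + suffix for suffix in suffixes]
             for kind, suffixes in DERIVED.items()}
    # Extends the freshly built list only; TIE_POINT_ATTRIBUTES is not modified.
    vocab["container variable"] += TIE_POINT_ATTRIBUTES
    return vocab


for term in ("interpolation", "tie_point_gridding", "gridding"):
    print(term, vocabulary(term))
```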
erget commented 4 years ago

I lean toward tie_point_gridding to be absolutely explicit. Is that too clunky?