cf-convention / cf-conventions


Lossy Compression by Coordinate Sampling #327

Closed AndersMS closed 3 years ago

AndersMS commented 3 years ago

Title

Lossy Compression by Coordinate Sampling

Moderator

@JonathanGregory

Moderator Status Review [last updated: YYYY-MM-DD]

Brief comment on current status, update periodically

Requirement Summary

The spatiotemporal, spectral, and thematic resolution of Earth science data is increasing rapidly. This presents a challenge for all types of Earth science data, whether derived from models, in-situ measurements, or remote sensing observations.

In particular, when coordinate information varies with time, the domain definition can be many times larger than the (potentially already very large) data which it describes. This is often the case for remote sensing products, such as swath measurements from a polar orbiting satellite (e.g. slide 4 in https://cfconventions.org/Meetings/2020-workshop/Subsampled-coordinates-in-CF-netCDF.pdf).

Such datasets are often prohibitively expensive to store, and so some form of compression is required. However, native compression, such as is available in the HDF5 library, does not generally provide enough of a saving, due to the nature of the values being compressed (e.g. few missing or repeated values).

An alternative form of compression-by-convention amounts to storing only a small subsample of the coordinate values, alongside an interpolation algorithm that describes how the subsample can be used to generate the original, unsampled set of coordinates. This form of compression has been shown to out-perform native compression by "orders of magnitude" (e.g. slide 6 in https://cfconventions.org/Meetings/2020-workshop/Subsampled-coordinates-in-CF-netCDF.pdf).
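To illustrate the idea, here is a minimal 1-D sketch (my own, using numpy; the stride, the synthetic coordinate values, and the use of simple linear interpolation are illustrative only, not the proposal's full machinery):

```python
import numpy as np

# Synthetic full-resolution coordinate values (e.g. one row of a swath).
full = np.linspace(-60.0, 60.0, 3201) + 0.01 * np.sin(np.linspace(0.0, 40.0, 3201))

# Compression: store only every 16th value (the "tie points").
stride = 16
tie_indices = np.arange(0, full.size, stride)
tie_points = full[tie_indices]  # 201 stored values instead of 3201

# Decompression: reconstitute the full set by interpolating between tie points.
reconstituted = np.interp(np.arange(full.size), tie_indices, tie_points)

print(f"compression ratio ~{full.size / tie_points.size:.0f}x, "
      f"max error = {np.abs(reconstituted - full).max():.1e}")
```

For a 2-D coordinate array, storing every 16th value in each dimension reduces the stored volume by a factor of roughly 256.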

Various implementations following this broad methodology are currently in use (see https://github.com/cf-convention/discuss/issues/37#issuecomment-608459133 for examples); however, the steps required to reconstitute the full-resolution coordinates are not necessarily well defined within a dataset.

This proposal offers a standardized approach covering the complete end-to-end process, including a detailed description of the required steps. At the same time it is a framework where new methods can be added or existing methods can be extended.

Unlike compression by gathering, this form of compression is lossy due to rounding and approximation errors in the required interpolation calculations. However, the loss in accuracy is a function of the degree to which the coordinates are subsampled and of the choice of interpolation algorithm (of which there are configurable standardized and non-standardized options), and so may be determined by the data creator to be within acceptable limits. For example, in one application with cell sizes of approximately 750 metres by 750 metres, interpolation of a stored subsample comprising every 16th value in each dimension was able to recreate the original coordinate values to a mean accuracy of ~1 metre. (Details of this test are available.)

Whilst remote sensing applications are the motivating concern for this proposal, the approach presented has been designed to be fully general, and so can be applied to structured coordinates describing any domain, such as one describing model outputs.

Technical Proposal Summary

See PR #326 for details. In summary:

The approach and encoding are fully described in the new Section 8.3, "Lossy Compression by Coordinate Sampling", of Chapter 8: Reduction of Dataset Size.

A new appendix J describes the standardized interpolation algorithms, and includes guidance for data creators.

Appendix A has been updated for a new data and domain variable attribute.

The conformance document has new checks for all of the new content.

The new "interpolation variable" has been included in the Terminology in Chapter 1.

The list of examples in toc-extra.adoc has been updated for the new examples in section 8.3.

Benefits

Anyone who has prohibitively large domain descriptions, and for whom absolute accuracy of cell locations is not an issue, may benefit.

Status Quo

The storage of large, structured domain descriptions is either prohibitively expensive or is handled in non-standardized ways.

Associated pull request

PR #326

Detailed Proposal

PR #326

Authors

This proposal has been put together by (in alphabetic order)

Aleksandar Jelenak, Anders Meier Soerensen, Daniel Lee, David Hassell, Lucile Gaultier, Sylvain Herlédan, Thomas Lavergne

AndersMS commented 3 years ago

Dear All,

Regarding the wording of the section on the computational precision attribute, I have reservations about the direction it has taken, and I suggest we discuss the matter during our meeting this afternoon.

It is essential to the value and usability of the Lossy Compression by Coordinate Sampling that we reach a common understanding on this and get the wording of the new section right.

Here are a couple of thoughts and comments for the further discussion.

@oceandatalab : You wrote:

The "{...] using 64-bit floating-point arithmetic will reconstitute [...]" in the shorter version is misleading from my point of view because it eludes the software/hardware factor (though I agree it will not be an issue in most cases).

The full sentence here was:

As an example, a computational_precision = "64" would provide the guidance to the data user that using 64-bit floating-point arithmetic will reconstitute the coordinates with an accuracy comparable to the accuracy intended by the data creator.

and I think that, with the wording "an accuracy comparable", the sentence is reasonable as I wrote it.

@davidhassell: You wrote:

The accuracy will also depend, however, on how the interpolation method is implemented

and

There are no restrictions on the choice of interpolation method implementation, for neither the data creator nor the data user,

I am uncertain about the meaning of this. As I see it, most of what we have written is aimed at accurately and completely describing both the process of compressing coordinates and that of uncompressing them, so what scope do you see for variations in implementation that would affect the numerical results of the uncompression process?

If we look at one of the interpolation methods, say the Biquadratic Interpolation of geographic coordinates method, it will never fully reproduce the coordinates unless the original coordinates were located in a perfect biquadratic manner, which is typically not the case. This is true even if we disregard the limitations of floating-point arithmetic and differences between computing platforms. One can think of this as the mathematical performance of the method, excluding any effects of floating-point arithmetic and computing platform differences.
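As a generic 1-D illustration of this "mathematical performance" (my own sketch with numpy, not the appendix's actual biquadratic formulation): fit a quadratic through three tie points of a coordinate that is not itself quadratic, and a residual remains that would persist even with exact arithmetic.

```python
import numpy as np

# A coordinate that is not exactly quadratic in the index (here: sinusoidal).
u = np.linspace(0.0, 16.0, 17)
coord = np.sin(u / 10.0)

# Quadratic interpolation through the first, middle and last points only.
tie_u, tie_coord = u[[0, 8, 16]], coord[[0, 8, 16]]
quadratic = np.polyval(np.polyfit(tie_u, tie_coord, 2), u)

# This residual is the method's mathematical performance; no choice of
# floating-point precision can remove it.
print(f"max residual = {np.abs(quadratic - coord).max():.1e}")
```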

In the VIIRS example we have looked at, this mathematical performance is on the order of 0.5 m when using an interpolation subarea size of 16x16 points for the VIIRS M-Band.

If we look at the capability of floating-point numbers to represent a position on a global scale (Earth radius of 6371 km), then 32-bit floats, with a relative precision of about 1e-7, can resolve such a position to roughly a metre, whereas 64-bit floats, with a relative precision of about 2e-16, can resolve it to the nanometre scale.

The floating-point arithmetic operations that constitute the interpolation method will further degrade the precision of the reconstituted coordinates, compared to the precision of the floating-point representation itself. If we say that this degradation of precision is one to two orders of magnitude, then in this example applying 32-bit floating-point arithmetic may noticeably impact results in a negative way, whereas applying 64-bit floating-point arithmetic will not noticeably impact results, when compared to the overall mathematical performance of the method.
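A back-of-the-envelope check of these magnitudes (a sketch using numpy machine epsilons; translating relative precision into metres at the Earth's radius is the only step added here):

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6

# Relative precision (machine epsilon) of each floating-point type,
# expressed as an absolute position resolution at the Earth's radius.
for dtype in (np.float32, np.float64):
    eps = np.finfo(dtype).eps
    print(f"{dtype.__name__}: eps = {eps:.1e} -> ~{eps * EARTH_RADIUS_M:.1e} m")
# float32: eps = 1.2e-07 -> ~7.6e-01 m
# float64: eps = 2.2e-16 -> ~1.4e-09 m
```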

As I understand it, computing platform variations in floating-point arithmetic implementations will mainly have the effect of introducing errors/deviations in the last significant bits of the floating-point numbers. So if the computational precision is chosen with sufficient margin with respect to the mathematical performance (in the example above, this would be 64-bit floating-point arithmetic), the effect of computing platform variations is unlikely to be noticeable for a well-implemented interpolation method.

Cheers Anders

erget commented 3 years ago

We may be solving a problem here before it arises. The danger is that we'll solve a problem that never materialises, or that we'll solve it in a way that's not as useful as it could be!

It seems that computational precision is neither sufficient to describe the actual target, which is the positional error, nor necessary when considered in light of the rest of the Conventions, which also do not give such low-level details about numerical reconstruction - although this might be relevant for geophysical variables and the like!

I propose therefore leaving it off. Data producers do have this field: comment: "Miscellaneous information about the data or methods used to produce it."

@AndersMS et al. FYI

oceandatalab commented 3 years ago

Given the discussion we had during the meeting, I would also be in favor of encouraging data producers to use the comment attribute of the subsampled coordinate variables to describe the process they used to uncompress the data, but also the positional error they got during the validation/commissioning phase.

Maybe we could append a new item at the end of "Coordinate Compression Steps" in Appendix J recommending that data producers check the positional error by comparing the reconstructed coordinates against the original data, and then provide as many details as possible regarding the reconstruction process and results (computational precision, positional error, etc.) in the comment attribute of the subsampled coordinate variables.
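Such a producer-side check might look like the following sketch (my own, assuming numpy arrays of original and reconstituted latitudes/longitudes in degrees; the haversine distance is one reasonable choice of positional error metric):

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6

def positional_error_m(lat0, lon0, lat1, lon1):
    """Great-circle (haversine) distance in metres between original and
    reconstituted points, given as arrays of latitude/longitude in degrees."""
    lat0, lon0, lat1, lon1 = map(np.radians, (lat0, lon0, lat1, lon1))
    a = (np.sin((lat1 - lat0) / 2.0) ** 2
         + np.cos(lat0) * np.cos(lat1) * np.sin((lon1 - lon0) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# err = positional_error_m(lat_orig, lon_orig, lat_rec, lon_rec)
# print(f"max = {err.max():.2f} m, mean = {err.mean():.2f} m")  # -> comment attribute
```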

davidhassell commented 3 years ago

Maybe we could append a new item at the end of "Coordinate Compression Steps" in Appendix J recommending that data producers check the positional error by comparing the reconstructed coordinates against the original data, and then provide as many details as possible regarding the reconstruction process and results (computational precision, positional error, etc.) in the comment attribute of the subsampled coordinate variables.

Thanks, Sylvain, I support this suggestion

AndersMS commented 3 years ago

I too like the idea to recommend that data producers report positional errors (and I guess other coordinate value errors) between the original data coordinates and the reconstituted coordinates in the comment attribute of the subsampled coordinate variables.

Regarding the specification of the computational precision, which is required as input for the method to achieve an accuracy within the errors reported in the comment of the coordinate variable, my preference would still be the computational_precision attribute of the interpolation variable. I believe that for our Lossy Compression by Coordinate Subsampling to become popular, it should be easy and straightforward to use, in particular the data uncompression process performed by the data user. There should be no need for the user to look into the data variable comments in order to be able to uncompress the data set. The computational_precision attribute makes the information readable by software in a safe and automated manner.

The reason for this preference is that I have tried out different selections of interpolation method, degree of subsampling (4x4, 8x8, 16x16, 64x16) and computational precision (64-bit, 32-bit floating-point arithmetic) on a test data set. All three components can have a comparable effect on the positional error between the original and the uncompressed file, which I think justifies specifying the computational precision in the same way as we specify the interpolation method and the degree of subsampling.

@erget: It is true that the Conventions do not address computational precision, but I guess there are a number of undocumented and implicit assumptions. Say, if you have specified a grid mapping for coordinates represented as 64-bit floating-point, one would assume that the conversion between the two reference frames has been performed using 64-bit floating-point arithmetic; otherwise significant errors would be introduced. Considering the complexity of what we are doing, I think that stating the computational precision explicitly would be the safest.
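To make the machine-readability argument concrete, here is a hypothetical user-side sketch with the netCDF4-python library (the file name and the variable name "interpolation" are made up for illustration; computational_precision is the attribute proposed here):

```python
import netCDF4

ds = netCDF4.Dataset("swath_compressed.nc")  # hypothetical file
interp = ds["interpolation"]                 # hypothetical interpolation variable

# An attribute can be read and acted on by software automatically...
precision = interp.computational_precision   # e.g. "64"
dtype = {"32": "float32", "64": "float64"}[precision]
print(f"uncompressing with {dtype} arithmetic")

# ...whereas free text in a comment attribute can only inform a human reader.
```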

Best regards, Anders

AndersMS commented 3 years ago

Dear team,

Following our meeting this afternoon, I propose the following new paragraph at the end of the section "Tie Points and Interpolation Subareas":

Tie point coordinate variables for both coordinate and auxiliary coordinate variables must be defined as numeric data types and are not allowed to have missing values.

Please let me know if you have comments.

Anders

Done: 0c5b732

AndersMS commented 3 years ago

Dear @JonathanGregory,

Just an update regarding the Lossy Compression by Coordinate Subsampling.

We have completed the implementation of the 16 changes in response to your comments on chapter 8. I have edited the comment above to include a link to the related commit(s) for each of the changes.

Generally we are very happy with the outcome and in particular the renaming of terms and attributes that you proposed has made the text easier to read.

You might wish to take a look at the rewritten section "Interpolation of Cell Boundaries". In response to your proposed change 15, we have had several discussions and meetings, resulting in a new concept for bounds interpolation. You will find the new section as the last in f3de508.

We will still do one more iteration on the section on Computational Precision; we will publish it here within the next few days.

Regarding Appendix J, we have nearly completed the changes required to reflect the changes in Chapter 8. We expect to complete the update tomorrow or Thursday, and I think it would make sense for you to wait for that before reading Appendix J.

Best regards, Anders

JonathanGregory commented 3 years ago

Dear @AndersMS

Thanks for the update and your hard work on this. I will read the section again in conjunction with Appendix J, once you announce that the latter is ready.

Best wishes

Jonathan

AndersMS commented 3 years ago

Dear All,

Just to let you know that, as agreed during the discussion of the new "Interpolation of Cell Boundaries" section (f3de508), I have added the following sentence in the "Interpolation Parameters" section (2ce5d66):

The interpolation parameters are not permitted to contain absolute coordinate information, such as additional tie points, but may contain relative coordinate information, for example an offset with respect to a tie point or with respect to a combination of tie points. This is to ensure that interpolation methods are equally applicable to both coordinate and bounds interpolation.

Anders

AndersMS commented 3 years ago

Dear @JonathanGregory,

Appendix J is now ready for your review.

The only remaining open issue is that we will do one more iteration on the section on Computational Precision for Chapter 8 - we will publish it here within the next few days.

Best regards, Anders

JonathanGregory commented 3 years ago

Dear @AndersMS et al.

Thanks for the new version. Can you tell me where to find versions of Ch 8 and App J with the figures in place? That would make it easier to follow.

I've just read the text of Ch 8, which I found much clearer than before. I don't recall reading about bounds last time. Is that new, or was I asleep?

Best wishes

Jonathan

AndersMS commented 3 years ago

Dear @JonathanGregory ,

I am still a bit new to documents on GitHub, but these two links do the job in my browser:

I got these links by going to #326, then selecting the Files changed tab, scrolling down to ch08.adoc or appj.adoc, and then selecting View File in the "..." pull-down menu on the right-hand side, opposite the file name.

Hope this will work at your end.

We had a section on boundary interpolation in the first version you read, but it was short and didn't do the job we would like it to do. For example, it did not guarantee to reconstitute contiguous bounds as contiguous bounds. The new section is our consolidated version, which does everything we wanted it to do.

Best regards, Anders

JonathanGregory commented 3 years ago

Great, thanks, @AndersMS. I am still learning about GitHub. I was using the Diff, which doesn't show the diagrams, rather than Viewing the file, which works fine. Jonathan

JonathanGregory commented 3 years ago

Dear @AndersMS and colleagues

Thanks again for the new version. I find it very clear and comprehensive. I have a few comments.

Chapter 8

"Tie point mapping attribute" mentions "target dimension", which is not a phrase used elsewhere. Should this be "interpolated dimension"?

You say, "For the purpose of bounds interpolation, a single bounds tie point is created for each coordinate tie point, and is selected as the vertex of the tie point cell that is the closest to the boundary of the interpolation subarea with respect to each interpolated dimension." I don't understand why there is a choice of bounds tie points, because there's no index variable for them. Doesn't the tie point index variable dictate the choice of tie points for bounds?

Appendix J

The title says Appendix A. Presumably that's something to do with automatic numbering.

All of the subsections listed at the start (Common Definitions and Notation, Common conversions and formulas, Interpolation Methods, Coordinate Compression Steps, Coordinate Uncompression Steps) should have subsection headings, I think. They will be Sections J.1 etc. At the moment the last two are labelled as Tables J.1 and J.2 rather than subsections, but they're never referenced as tables.

Fig 1ff. s is explained beneath the fig, but it would be useful to explain it at the side of the fig as well, as you do for tp, i and u. Also, it would be useful to put the paragraph explaining notation before Fig 1, because Fig 1 uses the notation.

You say, "When an interpolation method is referred to as linear or quadratic, it means that the method is linear or quadratic in the indices of the interpolated dimensions." Linear also means that the coordinates of the interpolated points are evenly spaced, doesn't it; if so, that would be helpful to state.

You say, "In the case of two dimensional interpolation, the two variables are equivalently computed as ...". I would say "similarly", not "equivalently", which I would understand to mean that s1 and s2 are equivalent.

quadratic. It would be better not to use c for the coefficient, because it can be confused with the point c.

Please put the "Common conversion and formulae" table before the interpolation methods, or at least refer to it. Otherwise the reader encounters fdot fcross fplus fminus fmultiply etc. without having seen their definitions. Actually you list it before the interpolation methods in the preamble.

[bi_]quadratic_remote_sensing. Why not call it [bi_]quadratic_latitude_longitude, which describes the method, rather than its typical application? What does it mean to treat them as Cartesian or not? I would describe bilinear interpolation in lat,lon as treating them as Cartesian coordinates, but you must mean something different. Is there a projection plane involved?

Where is latitude_limit defined?

A couple of times, you write, "For each of the interpolated dimension". There should be an -s.

Conformance

For "Each tie_point_variable token specifies a tie point variable that must exist in the file, and each interpolation_variable token specifies an interpolation variable that must exist in the file," I think all you can say is that there are variables of these names in the file, since a checker can't tell they are definitely the "kind" of variable you intend.

Regarding, "The legal values for the interpolation_name attribute are contained in Appendix J," it would be helpful for the author of the checker to say where they can be found in the appendix.

Best wishes

Jonathan

JonathanGregory commented 3 years ago

Dear all

@AndersMS and colleagues have proposed a large addition to Chapter 8 and an accompanying new appendix to the CF convention, defining methods for storing subsampled coordinate variables and descriptions of the interpolation methods that should be used to reconstruct the entire (uncompressed) coordinate variables. I've reviewed this in detail and it makes sense and seems clear to me, as someone who has never used these methods. Those who wrote this proposal are the experts. Enough support has been expressed for this proposal to be adopted, after allowing the time prescribed by the rules for further comments, and no objections have been expressed.

Therefore this proposal is on course for adoption in the next release of the CF convention as things stand. If anyone else who wasn't involved in preparing it has the time and interest to review it, that would no doubt be helpful, and now is the time to do that, in order not to delay its approval. It definitely requires careful reading and thinking, but it's logical and well-illustrated.

Best wishes

Jonathan

AndersMS commented 3 years ago

Dear @JonathanGregory ,

Thank you for your rich set of comments and suggestions. I have provided replies below, in the same format we used for the first set of comments. Several of the replies I have already implemented in the document, indicating the corresponding commit. For others, the reply is not conclusive, and if you find time, your feedback would be valuable.

As for the comments on the conformance chapter, I would prefer that @davidhassell look at them when he is available again.

Best regards, Anders

Comment/Proposed Change 17

Chapter 8: "Tie point mapping attribute" mentions "target dimension", which is not a phrase used elsewhere. Should this be "interpolated dimension"?

Reply to Comment/Proposed Change 17

You are right, it should be "interpolated dimension" in that section. I have updated the text.

Commit(s) related to Comment/Proposed Change 17: ca81618

Comment/Proposed Change 18

Chapter 8: You say, "For the purpose of bounds interpolation, a single bounds tie point is created for each coordinate tie point, and is selected as the vertex of the tie point cell that is the closest to the boundary of the interpolation subarea with respect to each interpolated dimension." I don't understand why there is a choice of bounds tie points, because there's no index variable for them. Doesn't the tie point index variable dictate the choice of tie points for bounds?

Reply to Comment/Proposed Change 18

In the compressed data set we store only one bounds tie point per coordinate tie point. However, the existing boundary variable defined in Section 7.1, Cell Boundaries, requires you to store four bounds in the case of 2D bounds, for example. The selection is between those four bounds, of which only one is the correct choice. Text updated based on your feedback.

Commit(s) related to Comment/Proposed Change 18: a30c58f

Comment/Proposed Change 19

Appendix J: The title says Appendix A. Presumably that's something to do with automatic numbering.

Reply to Comment/Proposed Change 19

Correct, that will be updated as part of the publishing magic.

Commit(s) related to Comment/Proposed Change 19: None.

Comment/Proposed Change 20

Appendix J: All of the subsections listed at the start (Common Definitions and Notation, Common conversions and formulas, Interpolation Methods, Coordinate Compression Steps, Coordinate Uncompression Steps) should have subsection headings, I think. They will be Sections J.1 etc. At the moment the last two are labelled as Tables J.1 and J.2 rather than subsections, but they're never referenced as tables.

Reply to Comment/Proposed Change 20

I agree and have introduced section numbering and removed table captions in Appendix J.

Commit(s) related to Comment/Proposed Change 20: f6f48fb

Comment/Proposed Change 21

Appendix J: Fig 1ff. s is explained beneath the fig, but it would be useful to explain it at the side of the fig as well, as you do for tp, i and u.

Reply to Comment/Proposed Change 21

Done. Also, I have named s the "interpolation argument", which I think is what it is.

Commit(s) related to Comment/Proposed Change 21: 1002806

Comment/Proposed Change 22

Appendix J: Also, it would be useful to put the paragraph explaining notation before Fig 1, because Fig 1 uses the notation.

Reply to Comment/Proposed Change 22

Agree. Done.

Commit(s) related to Comment/Proposed Change 22: 0fdc7e4

Comment/Proposed Change 23

You say, "When an interpolation method is referred to as linear or quadratic, it means that the method is linear or quadratic in the indices of the interpolated dimensions." Linear also means that the coordinates of the interpolated points are evenly spaced, doesn't it; if so, that would be helpful to state. Appendix J:

Reply to Comment/Proposed Change 23

The answer is a bit tricky. If the coordinates are latitude and longitude, then the steps in each of these coordinates on its own will be evenly spaced. However, the points that the combined latitude/longitude describe on the reference ellipsoid will in general not be evenly spaced, only in some special cases, like along a meridian. The easiest place to visualize the non-evenly spaced points is around one of the poles, but it applies globally.

Actually, the best of our current methods to generate evenly spaced coordinate points is the "quadratic_remote_sensing" method. It can utilize its quadratic terms to counteract the distorting effect of the latitude/longitude coordinates.

Commit(s) related to Comment/Proposed Change 23: None (5f9ad9a, 06dac3a (reverts 5f9ad9a))

Comment/Proposed Change 24

Appendix J: You say, "In the case of two dimensional interpolation, the two variables are equivalently computed as ...". I would say "similarly", not "equivalently", which I would understand to mean that s1 and s2 are equivalent.

Reply to Comment/Proposed Change 24

Done.

Commit(s) related to Comment/Proposed Change 24: 0116283

Comment/Proposed Change 25

Appendix J: It would be better not to use c for the coefficient, because it can be confused with the point c.

Reply to Comment/Proposed Change 25

Agreed; renamed "c" to "w", and renamed the related function fc() to fw().

Commit(s) related to Comment/Proposed Change 25: ea474a5

Comment/Proposed Change 26

Appendix J: Please put the "Common conversion and formulae" table before the interpolation methods, or at least refer to it. Otherwise the reader encounters fdot fcross fplus fminus fmultiply etc. without having seen their definitions. Actually you list it before the interpolation methods in the preamble.

Reply to Comment/Proposed Change 26

Good point, section moved.

Commit(s) related to Comment/Proposed Change 26: 4efef82

Comment/Proposed Change 27

Appendix J: [bi_]quadratic_remote_sensing. Why not call it [bi_]quadratic_latitude_longitude, which describes the method, rather than its typical application? What does it mean to treat them as Cartesian or not? I would describe bilinear interpolation in lat,lon as treating them as Cartesian coordinates, but you must mean something different. Is there a projection plane involved?

Reply to Comment/Proposed Change 27

We hope to find time after this review to prepare a small paper or a presentation that provides more insight into the interpolation methods and their performance.

You are right, it is like a projection plane, but we are using 3D cartesian coordinates. The problem we are addressing is that interpolating directly in latitude/longitude is inadequate when we are close to the poles. So, we temporarily convert the four tie points from lat/lon to xyz, do the interpolation and then convert the result back from xyz to lat/lon. Another common way to address this problem is to project the lat/lon point on the xy plane, do the interpolation and project the point back to lat/lon. However, by using xyz, we can also solve the problem that arises when our interpolation subarea crosses +/-180 deg longitude.

Let me try to support the above with a simple example (hoping that I am not upsetting anybody with such a simple example...)

Think of a hypothetical remote sensing instrument that scans the Earth in a way that can be approximated as arcs of a great circle on the Earth surface. So, if the instrument scans from point A to point B, then the points it scanned between A and B will be on the great circle between A and B. It will follow this simple principle for any location on Earth.

If you are near Equator and A = (0W, 0N) and B= (4W, 4N), then you can generate three points between A and B by interpolating in longitude and latitude separately and will get (1W, 1N), (2W, 2N) and (3W, 3N), which are approximately aligned with the great circle arc between A and B.

If you are near the North Pole and A = (0W, 88N) and B= (180W, 88N) and do the interpolation in longitude and latitude separately, you will get (45W, 88N), (90W, 88N) and (135W, 88N), which are on an arc of a small circle and is the wrong result. By first converting to cartesian coordinates, then interpolating and then converting back to longitude latitude, you will get the correct result: (0W, 89N), (0W, 90N) and (180W, 89N), which are on a great circle.
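This round trip can be checked numerically; the following sketch (my own, interpolating on a unit sphere with numpy) reproduces the numbers above:

```python
import numpy as np

def ll_to_xyz(lat, lon):
    lat, lon = np.radians(lat), np.radians(lon)
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def xyz_to_ll(v):
    v = v / np.linalg.norm(v)
    return np.degrees(np.arcsin(v[2])), np.degrees(np.arctan2(v[1], v[0]))

a, b = ll_to_xyz(88.0, 0.0), ll_to_xyz(88.0, -180.0)
for s in (0.25, 0.5, 0.75):
    lat, lon = xyz_to_ll((1.0 - s) * a + s * b)  # interpolate in xyz, convert back
    print(f"s={s:.2f}: lat={lat:.1f} lon={lon:.1f}")
# -> 89.0N, 90.0N (longitude is arbitrary at the pole), 89.0N at lon ±180:
#    points on the great circle, unlike direct interpolation in lat/lon.
```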

That was also why we suggested the name [bi_]quadratic_remote_sensing. We agree to change the name to [bi_]quadratic_latitude_longitude (done). I have also changed "cartesian" to "three-dimensional cartesian" as you suggested.

Commit(s) related to Comment/Proposed Change 27: d9436e4, af2a2ea, 4cc00cb

Comment/Proposed Change 28

Appendix J: Where is latitude_limit defined?

Reply to Comment/Proposed Change 28

It is a value to be decided by the creator of the data set, as a trade-off between speed and accuracy, considering that the conversions to and from cartesian coordinates take longer. Practically, it will be on the order of 85 degrees latitude. I could add that, but I suggest that we also discuss it in the group during our meeting on Tuesday. Text improved.

Commit(s) related to Comment/Proposed Change 28: 546d288

Comment/Proposed Change 29

Appendix J: A couple of times, you write, "For each of the interpolated dimension". There should be an -s.

Reply to Comment/Proposed Change 29

Corrected.

Commit(s) related to Comment/Proposed Change 29: 303aaa4

AndersMS commented 3 years ago

Dear All,

Here are the links to the easy-to-read versions including all the above changes:

Anders

AndersMS commented 3 years ago

Dear @JonathanGregory,

Just to let you know that I have just updated my Reply to Comment/Proposed Change 23 above.

Anders

JonathanGregory commented 3 years ago

Dear @AndersMS

Thanks for your detailed replies. I think there are only two outstanding points in those you have answered.

18: Now I understand what you mean, thanks. To make this clearer to myself, I would say something like this: Bounds interpolation uses the same tie point index variables, and therefore the same tie point cells, as coordinate interpolation. One of the vertices of each coordinate tie point cell is chosen as the bounds tie point for the cell. For 1D bounds, the vertex chosen is the one on the side closer to the boundary of the interpolation subarea. For 2D bounds, the vertex chosen is the one closest to the boundary of the interpolation subarea, considering all the interpolated coordinates together - in other words, the one closest to the corner of the interpolation subarea.

Are you restricting the consideration of 2D bounds to rectangular cells, or are polygons of n vertices allowed?

27: I think the key point is that you mean three-dimensional Cartesian interpolation. I didn't think of that. If you could clarify this, it would be fine.

Cheers

Jonathan

davidhassell commented 3 years ago

Dear @JonathanGregory, @AndersMS, and all,

Conformance

For "Each tie_point_variable token specifies a tie point variable that must exist in the file, and each interpolation_variable token specifies an interpolation variable that must exist in the file," I think all you can say is that there are variables of these names in the file, since a checker can't tell they are definitely the "kind" of variable you intend.

Regarding, "The legal values for the interpolation_name attribute are contained in Appendix J," it would be helpful for the author of the checker to say where they can be found in the appendix.

Addressed in https://github.com/AndersMS/cf-conventions/pull/21/commits/8b8c1850321077711e8d786c53625d1aae60a042

I have also added some conformance requirements and recommendations for bounds tie point variables: https://github.com/AndersMS/cf-conventions/pull/21/commits/bdac108def687b64db8693aa4f1e50d45c120ce0

Thanks, David

erget commented 3 years ago

Dear @JonathanGregory et al.,

Due to the heroic contributions primarily of @AndersMS and @davidhassell, as well as the expert review by @oceandatalab and friends, we can present to you the now-finalised version of the pull request associated with this issue.

To see all points listed and addressed one by one, you can check https://github.com/cf-convention/cf-conventions/issues/327#issuecomment-885811067 - hopefully that is traceable.

We have completed our proposal, finalising the section regarding computational precision - this is now found at the end of Section 8.3.

https://github.com/cf-convention/cf-conventions/pull/326 contains the documents in their latest state, which I have also attached in a compiled form for your perusal:

Note that before finalisation of this version of the Conventions the following items will need to be addressed; these are, however, of a purely editorial nature, so in the interest of time we are not correcting them for the 3-week freeze:

A clever idea here would be to name e.g. the first figure in chapter 7 "Figure 7.1" so that the figures are always numbered correctly independently of previous chapters. I leave this to future minds to solve.

I therefore thank all contributors again for the loads of precise and hard work, and move that the 3-week period start for this proposal so that we are on time to get it adopted into CF-1.9.

I look forward to hearing hopefully a resounding silence in response to the finalised proposal!

JonathanGregory commented 3 years ago

Dear @AndersMS @davidhassell @erget @oceandatalab and collaborators

Thanks for the enormous amount of hard and thorough work you have put into this, and for answering all my questions and comments. I have no more concerns. Looking through the rendered PDF of App J, I see boxes, probably indicating some character which Chrome can't print, in "Common Conversions and Formulas", after sin and cos.

If anyone else would like to review and comment, they are welcome to do so. If no further concerns are raised, the proposal will be accepted on 24th August.

Cheers

Jonathan

AndersMS commented 3 years ago

Dear @JonathanGregory ,

Regarding the interpolation of bounds, you asked:

Are you restricting the consideration of 2D bounds to rectangular cells, or are polygons of n vertices allowed?

We are restricting the interpolation of bounds to contiguous cell bounds. I think that the consequence of this is that we are restricting the consideration of 2D bounds to rectangular cells. Possibly @davidhassell can confirm.

What we do support is interpolation of 1D, 2D, etc. bounds. Hence the sentence:

One of the vertices of each coordinate tie point cell is chosen as the bounds tie point for the cell. It is selected as the vertex of the tie point cell that is the closest to the boundary of the interpolation subarea with respect to each interpolated dimension.

that applies for any number of interpolated dimensions.

Cheers

Anders

AndersMS commented 3 years ago

Dear @JonathanGregory,

Once again, thank you very much for your thorough review and valuable comments, which significantly improved the proposal.

Cheers

Anders

AndersMS commented 3 years ago

Dear @JonathanGregory ,

We have just discussed the matter of the cell bounds interpolation and the question you raised.

To make the conditions for bounds interpolation clearer, we have changed (b10fb67) the first part of the first paragraph in the section on bounds interpolation to:

Coordinates may have cell bounds. For the case that the reconstituted cells are contiguous and have exactly two cell bounds along each interpolated dimension, cell bounds of interpolated dimensions can be stored as bounds tie points and reconstituted through interpolation.

We hope you are fine with that change.

Best regards, The interpolation team

JonathanGregory commented 3 years ago

Dear @AndersMS

Thanks for the clarification. That's fine. The proposal will be approved next Tuesday 24th if no further concern is raised. [edited twice - I was accidentally reading the calendar for next month]

Best wishes

Jonathan

JonathanGregory commented 3 years ago

Dear @AndersMS @erget et al.

I would be pleased to merge the pull request and close this issue, but I see that the PR has conflicts which have to be resolved. I expect there is some GitHub incantation which you can pronounce to resolve them.

Best wishes

Jonathan

erget commented 3 years ago

@JonathanGregory @AndersMS et al., chanting is all done and the merge is complete. Thanks all for your many varied contributions - this was a lot of work on all sides and my hope is that it proves useful to both data producers and consumers moving forward!

JonathanGregory commented 3 years ago

Congratulations and thanks to all who contributed to this successful piece of work.