cf-convention / cf-conventions

AsciiDoc Source
http://cfconventions.org/cf-conventions/cf-conventions
Creative Commons Zero v1.0 Universal

Support Swath Data in CF #269

Open ajelenak opened 4 years ago

ajelenak commented 4 years ago

Title

Include Swath Data Encodings in the CF Document

Moderator

@erget

Moderator Status Review [last updated: 2020/05/15]

Awaiting a PR implementing the text changes to the Conventions. As stated below, this proposal has been reviewed.

Requirement Summary

This proposal was presented at several past CF workshops during the course of its development. It has also been vetted by a number of subject matter experts. Given that it does not require any change to the data model or to the current text of the conventions, it would probably fit best as a new appendix, similar to the current Appendix H for Discrete Sampling Geometries.

Technical Proposal Summary

Earth Science swath data originates as electromagnetic radiation collected from a specific direction into a solid angle and then measured at a number of electromagnetic spectrum intervals. The combination of the direction, the solid angle, and the instrument's data acquisition settings defines one observation. At any given instant an instrument sweeps over an area of the Earth while its platform (the object carrying the instrument) moves. Successive observations are usually combined to cover a larger portion of the Earth. When these successive observations are plotted on maps they appear to cover a swath on the Earth's surface, hence the name for this type of data. The proposed encodings are independent of the observation method and are applicable to swath data acquired by instruments on satellites, airplanes, or unmanned aerial vehicles (UAVs).
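To make the description concrete, here is a minimal sketch, not taken from the proposal, of how successive scan lines might be stacked into a 2-D array whose slowest-varying dimension follows the platform's along-track motion and whose fastest-varying dimension runs across the track. The dimension names atrack and xtrack echo those mentioned later in this thread; the function name and the sample latitudes are hypothetical.

```python
from typing import List, Sequence


def stack_scans(scans: Sequence[Sequence[float]]) -> List[List[float]]:
    """Stack successive scan lines into an (atrack, xtrack) 2-D list.

    Each element of ``scans`` holds the cross-track samples of one scan,
    in acquisition order. The outer (slowest-varying) index of the result
    is along-track; the inner index is cross-track.
    """
    if not scans:
        return []
    n_xtrack = len(scans[0])
    for i, scan in enumerate(scans):
        # Every scan must contribute the same number of cross-track samples.
        if len(scan) != n_xtrack:
            raise ValueError(f"scan {i} has {len(scan)} samples, expected {n_xtrack}")
    return [list(scan) for scan in scans]


# Hypothetical latitudes from three scans of four cross-track samples each.
lat = stack_scans([
    [10.0, 10.1, 10.2, 10.3],
    [10.5, 10.6, 10.7, 10.8],
    [11.0, 11.1, 11.2, 11.3],
])
print(len(lat), len(lat[0]))  # 3 4: (atrack, xtrack)
```

Keeping along-track as the outer index means that appending a new scan only appends a row, which matches how an instrument produces data as its platform moves.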

Benefits

All providers and users of remotely sensed geoscience data from satellites, airplanes, or UAVs.

Status Quo

Swath data remains unsupported by the CF conventions.

Detailed Proposal

The proposal is at https://github.com/Unidata/EC-netCDF-CF/blob/master/swath/swath.adoc. The new text for the conventions document will be based on the content in Sections 2.3 through 2.6.

erget commented 4 years ago

@ajelenak I think this is a good idea. Since nobody's volunteered yet, I'm happy to moderate the discussion, but it should be noted that I am clearly in favour of it, so it would be more interesting to have somebody with more concerns. At a minimum I'd like to involve some people who can throw a cautious eye on it once you've got a PR ready.

JonathanGregory commented 4 years ago

I'd like to review this but I haven't had time yet. Thanks for compiling this carefully written proposal. Jonathan

erget commented 4 years ago

@ajelenak it looks like nobody's found glaring errors yet ;) would you mind putting together a PR proposing the changes to the text so that we can begin the discussion / approval process?

hilawe commented 3 years ago

@ajelenak this is an excellent and thorough effort. At NCEI we may have more examples for you to consider that could help with further definitions. I will let my colleagues know and see what they think, if it's not too late.

Thank you for your service to the community.

gaochen-larc commented 1 year ago

This is very helpful! Thank you @ajelenak !

For a follow-up question, I am dealing with data like the TRMM case. Should I use different groups to separate data with different dimensions? For example,

    Group: lores
      lat(time, samp_lo)
      lon(time, samp_lo)

    Group: hires
      lat(time, samp_hi)
      lon(time, samp_hi)

Any suggestions?

Thanks!

lupemba commented 10 months ago

@ajelenak I just had a brief look at the document. The text seems focused on passive imagers. Is the Swath data format also applicable for radar data e.g. SAR or Scatterometers?

davidhassell commented 10 months ago

Hi - we have a new project starting in 2024 that will involve, as part of it, extracting data from climate and NWP models for comparison with L2 swaths, on the grid of the latter. Therefore my interest is piqued and I look forward to reviewing the proposal.

Thanks, David

ajelenak commented 10 months ago

@lupemba Yes, SAR and scatterometer instruments were taken into account during development. Do you have a specific example we could test with?

hilawe commented 10 months ago

For a follow-up question, I am dealing with data like the TRMM case. Should I use different groups to separate data with different dimensions? For example,

    Group: lores
      lat(time, samp_lo)
      lon(time, samp_lo)

    Group: hires
      lat(time, samp_hi)
      lon(time, samp_hi)

@ajelenak I wanted to piggyback on this TRMM comment and provide a sample Passive Microwave Climate Data Record file that should cover what @gaochen-larc brought up. This should cover the maximum complexity of satellite swath files.

Thank you!

lupemba commented 10 months ago

@ajelenak, the upcoming EPS-SG program aims to follow the CF format (where practicable). Test data, including data for SCA (the new scatterometer), are available here: https://www.eumetsat.int/eps-sg-user-test-data The backscatter in the test data is mostly noise and has a lot of missing values, but the swath geometry should be close to what is expected for the real data.

I would recommend looking at SCA-1B-SZF and SCA-1B-SZR. SZF is the full-resolution product, where each beam has its own geometry. In SZR the backscatter is resampled to produce 5 collocated measurements on one swath (two swaths if you want to split the left and right sides). Note that the swath is quite different for rotating fan-beam scatterometers; I don't have any example of this.


ajelenak commented 10 months ago

@lupemba I got some sample SCA L1B files from the link you provided. The content in the SZF file /data group is in one of the possible swath formats. However, the content in the SZR file /data group is not. Below are several variables from that group in one SCA-1B-SZR file to illustrate how data are organized:

  group: data {
    dimensions:
      number_beams = 5;
      number_points = 3392;
    variables:
      double time(number_points=3392);
        :long_name = "time associated with each point";
        :units = "UTC seconds since 2020-01-01 00:00:00.000";

      int backscatter(number_points=3392, number_beams=5);
        :long_name = "backscatter coefficient (also known as NRCS or sigma0) obtained by spatial averaging the full resolution data around the grid point for the fore VV, mid VV, aft VV, mid HH and mid cross-pol channels";
        :units = "dB";
        :missing_value = -2147483648; // int
        :scale_factor = 1.0E-7; // double
        :add_offset = 0.0; // double

      int latitude(number_points=3392);
        :long_name = "geodetic latitude";
        :units = "degrees_north";
        :missing_value = -2147483648; // int
        :valid_min = -90000000; // int
        :valid_max = 89999999; // int
        :scale_factor = 1.0E-6; // double
        :add_offset = 0.0; // double

      int longitude(number_points=3392);
        :long_name = "longitude";
        :units = "degrees_east";
        :missing_value = -2147483648; // int
        :valid_min = -180000000; // int
        :valid_max = 179999999; // int
        :scale_factor = 1.0E-6; // double
        :add_offset = 0.0; // double

      // ...

      uint line_index(number_points=3392);
        :long_name = "absolute grid index in along track";

      short node_index(number_points=3392);
        :long_name = "grid index in across track (far left swath to far right swath)";
        :valid_max = 53S; // short
        :valid_min = -53S; // short

      // ...
  }

This is not swath format because the backscatter, longitude, and latitude data are spatially stored as 1D. There is nothing wrong with this data organization but in CF this is very similar to a trajectory.
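The latitude, longitude, and backscatter variables in the listing above are stored as packed integers. A minimal sketch of how a reader recovers physical values with the standard CF packing rule (unpacked = packed * scale_factor + add_offset), mapping missing_value to NaN. Only the attribute values come from the listing; the function name is hypothetical.

```python
import math
from typing import Optional


def unpack(raw: int, scale_factor: float, add_offset: float,
           missing_value: Optional[int] = None) -> float:
    """Apply CF packing attributes to one stored integer value.

    Returns NaN when ``raw`` equals the (packed) missing value,
    otherwise raw * scale_factor + add_offset.
    """
    if missing_value is not None and raw == missing_value:
        return math.nan
    return raw * scale_factor + add_offset


MISSING = -2147483648  # missing_value from the SZR listing
lat = unpack(89999999, 1.0e-6, 0.0, MISSING)    # latitude valid_max
lon = unpack(-180000000, 1.0e-6, 0.0, MISSING)  # longitude valid_min
gone = unpack(MISSING, 1.0e-6, 0.0, MISSING)
print(round(lat, 6), round(lon, 6), math.isnan(gone))
```

The missing-value test is done on the stored integer before scaling, which avoids comparing floating-point results for equality.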

ajelenak commented 10 months ago

@hilawe Yes, your sample file is very "busy" but it is compliant with the swath proposal.

lupemba commented 10 months ago

This is not swath format because the backscatter, longitude, and latitude data are spatially stored as 1D. There is nothing wrong with this data organization but in CF this is very similar to a trajectory.

@ajelenak, I am happy to hear that the SZF format is compliant. The SZR data are actually also on a grid of 106 nodes x 32 lines (53 nodes on each side). The data have just been flattened to 1D arrays for the netCDF format. I can ask around to find out why 1D longitude and latitude were chosen over a 2D grid. What are the benefits of storing the data as swaths?
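The 1D-versus-2D point can be illustrated with a sketch that scatters the 1D number_points arrays back onto a 2D (line, node) grid using line_index and node_index; note that 32 x 106 = 3392, matching number_points. The column mapping (nodes -53 to -1 on the left, 1 to 53 on the right, no node 0) is an assumption inferred from valid_min/valid_max and the 53-nodes-per-side description, not something confirmed by the product documentation. Function names and the starting line index are hypothetical.

```python
import math
from typing import List, Sequence

N_SIDE = 53  # nodes per side of the swath, per the comment above


def node_to_column(node_index: int) -> int:
    """Map a signed node index to a 0-based column (ASSUMED mapping).

    Assumes nodes run -53..-1 (left) and 1..53 (right) with no node 0,
    giving 2 * 53 = 106 columns.
    """
    if node_index < 0:
        col = node_index + N_SIDE        # -53 -> 0, -1 -> 52
    elif node_index > 0:
        col = node_index + N_SIDE - 1    # 1 -> 53, 53 -> 105
    else:
        raise ValueError("node_index 0 is not expected under this mapping")
    if not 0 <= col < 2 * N_SIDE:
        raise ValueError(f"node_index {node_index} out of range")
    return col


def points_to_grid(values: Sequence[float], line_index: Sequence[int],
                   node_index: Sequence[int]) -> List[List[float]]:
    """Scatter 1-D per-point values onto a (line, node) grid; gaps stay NaN."""
    first = min(line_index)              # line_index is absolute in the file
    n_lines = max(line_index) - first + 1
    grid = [[math.nan] * (2 * N_SIDE) for _ in range(n_lines)]
    for v, line, node in zip(values, line_index, node_index):
        grid[line - first][node_to_column(node)] = v
    return grid


# Synthetic check: 32 lines x 106 nodes = 3392 points, in line-major order.
nodes_per_line = list(range(-N_SIDE, 0)) + list(range(1, N_SIDE + 1))
lines = [1000 + i for i in range(32) for _ in nodes_per_line]
nodes = [n for _ in range(32) for n in nodes_per_line]
values = [float(k) for k in range(len(lines))]
grid = points_to_grid(values, lines, nodes)
print(len(lines), len(grid), len(grid[0]))  # 3392 32 106
```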

semmerson commented 10 months ago

FYI, and FWIW, the UDUNITS library won't be able to parse the following:

On Tue, Jan 2, 2024 at 7:00 PM Aleksandar Jelenak wrote:

    :units = "UTC seconds since 2020-01-01 00:00:00.000";

If the "UTC" is a suffix rather than a prefix, then parsing will work, e.g.,

:units = "seconds since 2020-01-01 00:00:00.000 UTC";

or

:units = "seconds since 2020-01-01 00:00:00.000Z";
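To illustrate the prefix/suffix point, here is a simplified, hypothetical stand-in for UDUNITS time-unit parsing. It is not the UDUNITS API and handles only a small subset of its grammar ("<unit> since YYYY-MM-DD hh:mm:ss[.fff]" with an optional trailing UTC or Z), so it accepts the suffix forms and rejects a leading "UTC".

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds per unit for a few time units; an illustrative subset only.
_UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}

_PATTERN = re.compile(
    r"^(?P<unit>\w+) since "
    r"(?P<date>\d{4}-\d{2}-\d{2})[ T](?P<time>\d{2}:\d{2}:\d{2}(?:\.\d+)?)"
    r"(?: ?(?P<tz>UTC|Z))?$"
)


def decode_time(value: float, units: str) -> datetime:
    """Convert a time value with '<unit> since <timestamp>' units to UTC.

    Simplified sketch: a leading 'UTC' (as in the SZR test file) does not
    match the pattern and raises ValueError.
    """
    m = _PATTERN.match(units.strip())
    if m is None or m.group("unit") not in _UNIT_SECONDS:
        raise ValueError(f"unsupported time units: {units!r}")
    year, month, day = (int(p) for p in m.group("date").split("-"))
    hh, mm, ss = m.group("time").split(":")
    ref = datetime(year, month, day, int(hh), int(mm), tzinfo=timezone.utc)
    offset = float(ss) + value * _UNIT_SECONDS[m.group("unit")]
    return ref + timedelta(seconds=offset)


print(decode_time(86400.5, "seconds since 2020-01-01 00:00:00.000 UTC"))
# 2020-01-02 00:00:00.500000+00:00
```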

    :units = "dB";

Unreferenced, logarithmic units aren't supported. Referenced units are, e.g.,

:units = "dB(re 1 mW/m2)";

(Probably not a realistic reference value.)

BTW, I'm now retired. I should still have access to the UDUNITS package, however.

--Steve Emmerson

ajelenak commented 10 months ago

Also, I suggest adding an appropriate coordinate for the number_beams dimension with alphanumeric identifiers for the five beams.

davidhassell commented 10 months ago

Hi @ajelenak and all,

I've read through the document a couple of times now. I think that it's very clear and comprehensive, and the numerous examples are great - thank you!

davidhassell commented 10 months ago

(carrying on having pressed "send" too early ...)

I'd like to make a couple of general comments:

Dimension order

There are multiple occasions where there are dimensions that do not have corresponding 1-d (auxiliary) coordinate variables, e.g. atrack, xtrack, ncols, nrows, FOR, obs, scan, etc. Sometimes their order is specified ("with the slowest varying dimension representing forward (along-track) movement of the platform"), other times not, and sometimes (2.4.2. Multiband Image) the text seems to imply that the order can be swapped. The dimension order is crucial for correct interpretation, so shouldn't the convention be strict and explicit not only about the dimension order, but also about how to identify these dimensions?

For instance, in Example 9 we have float lat(time, FOR, obs) ; we know that time is time because it also has coordinate variable, but I don't know what FOR and obs represent.

EDIT: I do know what FOR and obs represent physically from the text, but I meant that there is nothing to distinguish them in the example file.

This is also an unsolved problem for the storage of tripolar ocean grids, for which there is no indication of which dimension is "x" and which is "y", yet that information is needed to manipulate the data correctly.
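A small, hypothetical helper makes the gap concrete: it lists the dimensions of a variable that lack a coordinate variable (a 1-D variable with the same name as its dimension) and therefore carry no machine-readable meaning. The dictionary models only the parts of Example 9 quoted above; the helper name is illustrative.

```python
from typing import Dict, List, Tuple


def unidentified_dims(var_dims: Tuple[str, ...],
                      variables: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Return the dimensions of a variable that have no coordinate variable.

    ``variables`` maps each variable name to its tuple of dimension names.
    A dimension counts as identified only if a 1-D variable of the same
    name exists with exactly that dimension.
    """
    return [d for d in var_dims if variables.get(d) != (d,)]


# Only the pieces of Example 9 quoted above: time has a coordinate variable.
example9 = {
    "time": ("time",),
    "lat": ("time", "FOR", "obs"),
}
print(unidentified_dims(example9["lat"], example9))  # ['FOR', 'obs']
```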

Is this a CF convention?

The proposal describes how the CF conventions can be used to describe swath products using existing functionality. As such it seems more of a profile of CF use rather than an extension to the conventions themselves. Would it be better to maintain them separately and reference them from the Conventions attribute, something like Conventions = "CF-1.11 CF-swath-1.0"?
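If the guidance were published as a separate profile, software could detect it from the Conventions attribute, which CF allows to hold several blank-separated names (comma-separated when a name contains blanks). A minimal sketch; "CF-swath-1.0" is only the hypothetical name suggested above, and the helper names are illustrative.

```python
from typing import List


def parse_conventions(attr: str) -> List[str]:
    """Split a Conventions attribute into individual convention names.

    Names are blank-separated, or comma-separated if a comma is present.
    """
    sep = "," if "," in attr else None  # None splits on any whitespace
    return [name.strip() for name in attr.split(sep) if name.strip()]


def has_profile(attr: str, prefix: str) -> bool:
    """True if any listed convention name starts with ``prefix``."""
    return any(name.startswith(prefix) for name in parse_conventions(attr))


conv = "CF-1.11 CF-swath-1.0"  # hypothetical value from the comment above
print(parse_conventions(conv))         # ['CF-1.11', 'CF-swath-1.0']
print(has_profile(conv, "CF-swath-"))  # True
```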

Thanks, David

lupemba commented 10 months ago

Thanks for all the inputs. I hope that I do not hijack this discussion with SCA data. Maybe another forum/thread would be more suited for this kind of discussion.

@semmerson

If the "UTC" is a suffix rather than a prefix, then parsing will work

This has already been raised at EUMETSAT and has been updated for the next release of the test data.

Unreferenced, logarithmic units aren't supported. Referenced units are,

Normalized radar cross-section (NRCS) is a unitless parameter. The unit of radar cross-section is m^2, and when it is normalized by the area of the target it becomes unitless.
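Because NRCS is a dimensionless ratio, a value in dB is 10 log10 of the linear quantity (the usual power-ratio definition of the decibel). A minimal sketch of the conversion in both directions; the function names are hypothetical.

```python
import math


def db_to_linear(sigma0_db: float) -> float:
    """Convert a backscatter coefficient from dB to a linear, unitless ratio."""
    return 10.0 ** (sigma0_db / 10.0)


def linear_to_db(sigma0: float) -> float:
    """Convert a linear, unitless backscatter coefficient to dB."""
    if sigma0 <= 0.0:
        raise ValueError("dB is undefined for non-positive values")
    return 10.0 * math.log10(sigma0)


print(db_to_linear(-10.0), linear_to_db(0.5))
```

The non-positive check matters in practice because noisy linear backscatter can be zero or negative, for which no dB value exists.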

@ajelenak

Also, I suggest adding an appropriate coordinate for the number_beams dimension with alphanumeric identifiers for the five beams.

I will bring this suggestion forward. It would be something like

    variables:
      string beam(number_beams) ;
        beam:standard_name = "sensor_beam_identifier";

With the names being ["forVV", "midVV", "aftVV", "midHH", "midXX"]. I will also suggest renaming beam to band to better fit the convention, but this is a bigger change to the format.
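With such a coordinate, readers can select a channel by name instead of relying on the order described in the long_name. A minimal sketch using plain Python lists in place of the (number_points, number_beams) netCDF array; the helper and the sample values are hypothetical.

```python
from typing import List, Sequence

# Beam identifiers proposed above for the number_beams coordinate.
BEAMS = ["forVV", "midVV", "aftVV", "midHH", "midXX"]


def select_beam(backscatter: Sequence[Sequence[float]], beams: List[str],
                name: str) -> List[float]:
    """Return one beam's values from a (number_points, number_beams) array."""
    try:
        j = beams.index(name)
    except ValueError:
        raise KeyError(f"unknown beam {name!r}; expected one of {beams}") from None
    return [row[j] for row in backscatter]


# Two hypothetical points with five beam values each (dB).
sigma0 = [[-10.1, -11.0, -10.4, -12.3, -25.0],
          [-9.8, -10.7, -10.2, -12.0, -24.6]]
print(select_beam(sigma0, BEAMS, "midHH"))  # [-12.3, -12.0]
```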

taylor13 commented 9 months ago

If indeed no new attributes are needed, I would agree with https://github.com/cf-convention/cf-conventions/issues/269#issuecomment-1876787752 that this might better be documented as a profile (similar to the externally documented specifications for "cmorizing" CMIP data). Perhaps a simple example of how to handle swath data could be included with reference to the more detailed information elsewhere. [I should admit that I have not carefully studied the proposal, so hope I haven't missed some critical new extension to CF that is being proposed.]

JonathanGregory commented 2 months ago

There's clearly plenty of support for some documentation or guidance on the use of CF for swath data, but there has not been any activity on the issue since January. Is one or both of you able to take this forward, Aleksandar @ajelenak and Simon @lupemba? Thanks for your contributions.

lupemba commented 2 months ago

@JonathanGregory I am afraid I have neither the knowledge nor the time to add Swath Data Encodings to the CF Document. I hope that @ajelenak has the time to complete this so we can get a more standardized approach to swath data.