cf-convention / cf-conventions

Convention for HEALPix grid parameters #433

Open uweschulzweida opened 1 year ago

uweschulzweida commented 1 year ago

We will soon be producing a lot of data on a HEALPix (Hierarchical Equal Area isoLatitude Pixelation) grid. The grid coordinates can be calculated easily from 2 parameters. The NSIDE parameter controls the resolution of the pixelization and the ORDER parameter sets the index ordering convention of the pixels (ring or nested). I think it would be good to have a convention for storing these parameters in NetCDF. My idea is to define these parameters via the grid_mapping attribute, for example:

dimensions:
    cells = 49152 ;
variables:
    int healpix ;
        healpix:grid_mapping_name = "healpix" ;
        healpix:healpix_nside = 64 ;
        healpix:healpix_order = "nested" ;
    float var(cells) ;
        var:grid_mapping = "healpix" ;

This defines a HEALPix grid with nside=64 and a nested index ordering. Can this be taken over into the CF Convention or do you have a better suggestion?
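
As a quick consistency check on the CDL above, the size of the `cells` dimension should follow from `healpix_nside`: a HEALPix grid has 12 base pixels, each subdivided into nside x nside pixels. The sketch below assumes nothing beyond that relation; the function name `healpix_npix` is purely illustrative.

```python
def healpix_npix(nside: int) -> int:
    """Number of pixels in a HEALPix grid: 12 base pixels, each split nside x nside."""
    if nside < 1:
        raise ValueError("nside must be a positive integer")
    return 12 * nside * nside


# Values taken from the CDL example above.
nside = 64
cells = 49152
print(healpix_npix(nside) == cells)
```

A reader of such a file could use a check like this to confirm that the dimension length and the grid mapping attribute agree.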

JonathanGregory commented 1 year ago

@uweschulzweida, thanks for your question. Maybe it can be treated as grid mapping. Does it have two horizontal coordinate variables, like the existing grid mappings of Appendix F? All of those are methods for converting between the (X,Y) coordinates of the mapping and (longitude,latitude) coordinates. Please could you describe HEALPix in the form of the other entries of Appendix F?

uweschulzweida commented 1 year ago

I have not found anything suitable in Appendix F. In any case, only the two parameters NSIDE/ORDER are needed to calculate the grid coordinates. For a HEALPix grid it does not make sense to store the coordinates in a NetCDF file. I think I will save the parameters as described above for the time being. If someone has a better idea I can change it.

JonathanGregory commented 1 year ago

You're right, HEALPix is not in Appendix F at the moment. I see from wikipedia that it is a map projection, or a class of map projections. If you could provide a definition of its parameters in the form of the other entries of Appendix F, we could certainly consider adding it.

You remark, "For a HEALPix grid it does not make sense to store the coordinates in a NetCDF file." Sect 5 of CF expects latitude and longitude coordinates to be provided as 2D auxiliary coordinate variables if the horizontal coordinates aren't latitude and longitude (as in this case). This is mandatory, whereas the grid mapping is an optional extra. The reason for it is to make the data self-explanatory and useful (the aim of the CF convention, in general), by enabling generic applications to geolocate the data. Most applications will not be aware of the HEALPix grid, for instance, but will understand latitude and longitude.

Best wishes

Jonathan

bnlawrence commented 8 months ago

I'm missing something here @JonathanGregory. When we use a map projection we provide coordinates which can only be turned into lat/lon by using information about the map projection itself.

In Healpix, the (single) dimension index plus the nside and order are sufficient to generate the lat/lon coordinates. So it's the same as any other coordinate system: (coordinates in lat/lon) = function(coordinates in projected grid), where the function you need is defined outside the CF file itself.
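
To make the "function" concrete, here is a minimal pure-Python sketch of the pixel-centre calculation for ring ordering only, following the standard HEALPix formulas with 0-based pixel indices. The name `pix2lonlat_ring` is hypothetical; in practice one would more likely call an existing library such as healpy, and nested ordering needs an extra index conversion that is not shown here.

```python
import math
from typing import Tuple


def pix2lonlat_ring(nside: int, ipix: int) -> Tuple[float, float]:
    """Return (longitude, latitude) in degrees of a pixel centre, RING ordering.

    Illustrative helper using the standard HEALPix pixel-centre formulas;
    it is not part of any CF or HEALPix API.
    """
    npix = 12 * nside * nside
    if not 0 <= ipix < npix:
        raise ValueError("ipix out of range for this nside")
    ncap = 2 * nside * (nside - 1)  # number of pixels in the north polar cap

    if ipix < ncap:  # north polar cap
        iring = (1 + math.isqrt(1 + 2 * ipix)) >> 1  # ring number, 1-based from the north pole
        iphi = ipix + 1 - 2 * iring * (iring - 1)  # position within the ring, 1-based
        z = 1.0 - iring * iring / (3.0 * nside * nside)
        phi = (iphi - 0.5) * math.pi / (2 * iring)
    elif ipix < npix - ncap:  # equatorial belt
        ip = ipix - ncap
        iring = ip // (4 * nside) + nside
        iphi = ip % (4 * nside) + 1
        # Alternate rings are offset by half a pixel in longitude.
        fodd = 1.0 if (iring + nside) % 2 else 0.5
        z = (2 * nside - iring) * 2.0 / (3.0 * nside)
        phi = (iphi - fodd) * math.pi / (2 * nside)
    else:  # south polar cap
        ip = npix - ipix
        iring = (1 + math.isqrt(2 * ip - 1)) >> 1  # ring number, 1-based from the south pole
        iphi = 4 * iring + 1 - (ip - 2 * iring * (iring - 1))
        z = -1.0 + iring * iring / (3.0 * nside * nside)
        phi = (iphi - 0.5) * math.pi / (2 * iring)

    return math.degrees(phi), math.degrees(math.asin(z))


lon, lat = pix2lonlat_ring(2, 0)
print(round(lon, 4), round(lat, 4))
```

Nothing in the function depends on stored coordinates: the index, nside and the ordering are the only inputs, which is the point being made above.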

Section 5 says:

> If the coordinate variables for a horizontal grid are not longitude and latitude, then a grid_mapping variable provides the information required to derive longitude and latitude values for each grid location. If no grid mapping variable is referenced by a data variable, then longitude and latitude coordinate values shall be supplied in addition to the required coordinates.

I take this to mean that, given Healpix is a well known formalism for which a grid mapping variable can be provided, we don't need latitude and longitude.

JonathanGregory commented 8 months ago

Dear Bryan @bnlawrence

Actually it was me who was missing something! I didn't know that the convention had been changed, such that the 2D lat and lon coordinates are no longer mandatory if the grid mapping is provided. This change was introduced in version 1.8 by issue 179, which I had never seen before this morning! I'm not disputing it, and I understand the reasons for changing it. I suspect I didn't notice it because it was proposed during the 2019 CF meeting in Tacoma, which I didn't attend in person. There may have been several things initiated then, which were of course properly followed up in issues, but since I'm quite attentive to the CF discussions, my oversight shows there's a danger when a large number of things are done at once that they may not receive the same scrutiny as usual. That's something to keep in mind.

So you're right, Healpix can be included without 2D aux lat and lon coordinates if it can be described by a grid mapping in the format of Appendix F, as discussed with @uweschulzweida. Can that be done?

Best wishes

Jonathan

davidhassell commented 8 months ago

Hello,

It's interesting to note that the 5.6 text implicitly assumes that horizontal coordinates of some type are always present. What if there are no projection coordinates?

This is potentially relevant because it's not obvious to me what auxiliary coordinates, if any, there would be for the data variable's cell dimension (from the CDL in the original post). It would be great if someone could clarify this point.

Thanks, David

bnlawrence commented 8 months ago

I think it's crucial to understand that the choice of order defines the domain filling curve to be used:

[Figure: pixel numbering in the ring and nested ordering schemes]

This is not as bad as it seems, as there is a clear functional relationship between the pixels in both schemes and their latitudes and longitudes. Cell bounds might be more fun.

(Figure from Górski et al, 2004, https://iopscience.iop.org/article/10.1086/427976/pdf, shows pixelation for nside=2)

bnlawrence commented 8 months ago

(Incidentally there is a lot of confusion in people talking about healpix in our community because at the same time as introducing healpix, people are introducing hierarchical datasets, that is, storing lots of versions of the same variable at different pixelations, so that folks wanting low resolution data can read a lower resolution pixelation. E.g. in practice folks are talking about storing multiple zoom levels where $\mathrm{cells} = 12 \times 4^z$, $z$ is the zoom level, and $\mathrm{nside}=2^z$)
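
The two relations quoted above can be written out directly. This is only a sketch of the arithmetic; the function names are made up for illustration.

```python
def nside_from_zoom(zoom: int) -> int:
    """nside = 2**zoom."""
    if zoom < 0:
        raise ValueError("zoom level must be non-negative")
    return 2 ** zoom


def cells_from_zoom(zoom: int) -> int:
    """cells = 12 * 4**zoom, which equals 12 * nside**2."""
    if zoom < 0:
        raise ValueError("zoom level must be non-negative")
    return 12 * 4 ** zoom


# Tabulate the first few zoom levels.
for z in range(4):
    print(z, nside_from_zoom(z), cells_from_zoom(z))
```

For instance, the grid in the original post (nside=64, 49152 cells) corresponds to zoom level 6 under these relations.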

davidhassell commented 8 months ago

Thanks, @bnlawrence, I now understand the suggested healpix:healpix_order = "nested" in the CDL.

So there are no auxiliary coordinates for the cell dimension, then. I suppose that's OK, as it's not really any different to, say, having easting and northing coordinates for a transverse mercator projection - they're not much use unless you go through the laborious mathematical procedure to work out the longitudes and latitudes.

bnlawrence commented 8 months ago

@uweschulzweida I see that the zoom level seems pretty fundamental to how DKRZ are using healpix. Do you think the zoom level should be in the attributes directly, rather than say, nside?

davidhassell commented 8 months ago

> So there are no auxiliary coordinates for the cell dimension, then.

... or would you store the latitudes and longitudes as auxiliary coordinates, with the grid_mapping there to show provenance (and to define the cell edges)

uweschulzweida commented 8 months ago

> @uweschulzweida I see that the zoom level seems pretty fundamental to how DKRZ are using healpix. Do you think the zoom level should be in the attributes directly, rather than say, nside?

The zoom level can be easily calculated from the nside parameter. I don't think this should be stored as an attribute, since the zoom level is only applied to nested ordering.
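
A minimal sketch of that calculation, assuming the zoom level is only defined when nside is a power of two (the helper name `zoom_from_nside` is hypothetical). Returning `None` for other values is one possible design choice, not something the thread prescribes.

```python
from typing import Optional


def zoom_from_nside(nside: int) -> Optional[int]:
    """Return z such that nside == 2**z, or None if nside is not a power of two."""
    if nside < 1:
        raise ValueError("nside must be a positive integer")
    if nside & (nside - 1):  # more than one bit set: not a power of two
        return None
    return nside.bit_length() - 1


print(zoom_from_nside(64))  # nside from the CDL in the original post
print(zoom_from_nside(12))  # not a power of two, so no zoom level
```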

bnlawrence commented 8 months ago

Well, zoom level and nside are intimately related right: $\mathrm{nside} = 2^{z}$?

Why do you say it is only applied to nested ordering? My reading of the DKRZ documentation suggests it is being used as a query parameter into a dataset with a number of variables each with a different zoom level. It seems to me that the "interesting" information from the metadata point of view is the zoom level, not the nside parameter as it's not directly related to the way variables are chosen.

uweschulzweida commented 8 months ago

Yes, the zoom level is important for us! We store the data in zarr archives. Each zoom level is a separate dataset with all variables. The name of the zarr archive contains the zoom level that is accessed via the query request.

bnlawrence commented 8 months ago

Given that, it feels like it would be more useful to expose the zoom level in the CF metadata, which means that tools which harvest that metadata can use it, and of course, the tools which read the data can trivially calculate nside from the zoom level when attempting to use code that needs that.

(Obviously it's trivial both ways, but I figure the representation which needs least conversion should be the one that is harvested to catalogs and/or visible when lazily loading.)

bnlawrence commented 8 months ago

I said earlier that cell bounds might be fun. The advantage of Healpix is that the cells are equal area, which is obviously useful for regridding (zooming) in a healpix grid, but not so good if we wanted a conservative regrid onto another coordinate system (or indeed conservative regridding to a healpix projection). But it's not an impossible calculation and we presume there are libraries that do it. This figure is helpful:

[Figure: screenshot not preserved]
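
To illustrate why equal-area cells make zooming within a HEALPix grid simple: in nested ordering, the four children of pixel p at the next finer level are pixels 4p to 4p+3, so coarsening by one zoom level is a plain average over consecutive groups of four, with no area weights needed. The sketch below assumes nested ordering and a full-sphere field; the function names are illustrative only.

```python
from typing import List, Sequence


def is_nested_healpix_size(n: int) -> bool:
    """True if n == 12 * 4**z for some integer z >= 0."""
    q, r = divmod(n, 12)
    if r or q < 1:
        return False
    while q % 4 == 0:
        q //= 4
    return q == 1


def coarsen_nested(values: Sequence[float]) -> List[float]:
    """Average each group of four child pixels into its parent (one zoom level down)."""
    n = len(values)
    if n <= 12 or not is_nested_healpix_size(n):
        raise ValueError("expected a full nested HEALPix field with zoom level >= 1")
    # Children of parent p are 4p..4p+3; all cells have equal area,
    # so an unweighted mean conserves the area integral.
    return [sum(values[4 * p:4 * p + 4]) / 4.0 for p in range(n // 4)]


fine = [float(i) for i in range(48)]  # zoom level 1 (nside=2)
coarse = coarsen_nested(fine)         # zoom level 0 (nside=1)
print(len(coarse), coarse[0])
```

Conservative regridding to or from a non-HEALPix grid has no such shortcut, since it needs the actual cell boundaries.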

bnlawrence commented 8 months ago

Just been discussing what we might put in Appendix F with @davidhassell; suffice to say we'd need a reasonably complete description of what is going on with HealPix. We'd also want, we think, to recommend (but not mandate) the use of two 1D auxiliary coordinate variables with the lat/lon of the pixel centres so as to simplify data usage by those that don't want to utilise healpy or equivalent.

sebvi commented 8 months ago

just discovering this discussion now.

We actually did the same exercise to add the HEALPix grids into the WMO GRIB2 standard (I suspect for the same project as the one reported by @uweschulzweida): GRIB2 issue

I have 2 comments:

  1. The restriction nside = 2^k is only valid for the nested ordering because of the way this ordering works. But it is not a requirement for ring ordering, and in fact we use an nside that is not a power of 2 at ECMWF. It is perfectly fine to have 12 diamond areas that are 5 by 5, 9 by 9, 12345 by 12345, etc. This is particularly useful when you go to high resolution, because the number of points (pixels) can quickly explode and one might not want to go from nside=1024 to nside=2048 or nside=4096 as a model resolution increases. Sure, you lose the "zooming" feature, but it is not always a requirement of your workflow or application. This is why we decided to go with encoding nside in the GRIB2 metadata rather than the zoom level k.
  2. Strictly speaking, nside and the ordering type are not sufficient to decode the data. nside will tell you how many iso-latitudes and how many points on each iso-latitude you have, but you will need the longitude of at least one point to compute the other points relative to it. Sure, you could impose one, as is done in many tools implementing HEALPix grids, but you lose flexibility. That said, choosing or not choosing a reference longitude is conceptually just a rotation around the north/south polar axis and could possibly be handled using extra rotation metadata (but I'm not sure).
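
The statement that nside fixes the number of iso-latitude rings and the number of points on each, for any positive integer nside and not only powers of two, can be sketched as follows. The function name `ring_sizes` is illustrative; the counts follow the standard HEALPix ring layout.

```python
from typing import List


def ring_sizes(nside: int) -> List[int]:
    """Pixels per iso-latitude ring, listed from north to south."""
    if nside < 1:
        raise ValueError("nside must be a positive integer")
    cap = [4 * i for i in range(1, nside)]      # polar cap rings 1 .. nside-1
    belt = [4 * nside] * (2 * nside + 1)        # equatorial rings nside .. 3*nside
    return cap + belt + cap[::-1]


for nside in (5, 9, 64):
    sizes = ring_sizes(nside)
    # 4*nside - 1 rings in total, holding 12*nside**2 pixels.
    print(nside, len(sizes), sum(sizes))
```

What this does not determine is where on each ring the first point sits in longitude, which is the gap the second comment points to.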

In the GRIB2 proposal we also added extra keys to specify if the observable is valid at the center of the pixel, at edges, etc. You may want to do that as well (although I must admit I would need to read again how grids and projections are handled in CF). In the end I would welcome something as close as possible to what is done in GRIB2 for the sake of interoperability and format conversion, within the limits of what is usually done in CF of course.

bnlawrence commented 8 months ago

Thanks @sebvi, really glad you've jumped in here!

  1. My reading of "the" HealPix paper is that footnote 11 is intended to be part of "the" definition. However, given in practice ECMWF doesn't feel bound by that, and given as a consequence GRIB2 doesn't, it'd be silly for CF to hold to it, and hence we had better use nside too.
  2. It's pretty clear that the default expectation is for the longitude to be 0, but we've been burnt with defaults, so I think we too should require that.

There is a lot of other material in your various templates, is any of that relevant here? I note you also have table entries for edges and vertices. Is that mandatory or have you added that so that the information can be provided if desired?

sebvi commented 8 months ago

ECMWF has an (indirect) interest in seeing this going forward as many of our end users might want to retrieve the data in HEALPix GRIB2 and then convert to HEALPix netCDF to then use with their preferred tools. The closer the metadata the easier the conversion. :)

  1. One of the main reasons not to conform to the rule nside=2^k is its lack of flexibility. Basically nside can only take the values 1, 2, 4, 8, 16, ..., which is a limited set of resolutions not necessarily close to the sort of resolutions we usually run our models at, now and in the future. This is why we mostly use the ring ordering, because then we can choose a resolution as close as possible to the resolution we are running when not using HEALPix. Not being bound by the 2^k rule also has some implications when interpolating from/to HEALPix to/from a non HEALPix grid.
  2. We actually use a default of 45 degrees for the reference longitude. The reasoning behind it is that if you look at the first picture you posted of the ordering schemes, you will notice that pixel 0 of diamond 0, in both cases, is at 45 degrees. I am not sure it is the default in popular tools, and in particular in healpy, but it was the one that made sense to us. But as long as you define an optional longitude reference with a sensible default value if not present, you are covered. The other reason why we included the possibility to specify that reference longitude is that we know by experience that it is only a matter of time until someone requests the feature, likely because their area of interest falls between 2 or more diamonds and they would rather define the points to have their area in only one diamond. I could also see countries in diamond 4 wanting to shift everything so that diamond 4 is not split, i.e. to have the points contiguous (this would only matter in ring ordering obviously).
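
If the reference longitude is treated as a rotation about the polar axis, as suggested earlier in the thread, applying it to pixel-centre longitudes could look like the sketch below. This is an assumption for illustration only: the helper `shift_longitude`, its parameter names, and the interpretation that the 45 degree default corresponds to the unrotated grid are not settled anywhere in this discussion.

```python
def shift_longitude(lon_deg: float, lon_ref_deg: float,
                    lon_default_deg: float = 45.0) -> float:
    """Rotate a longitude so that the point normally at lon_default_deg
    ends up at lon_ref_deg. Result is wrapped into [0, 360)."""
    return (lon_deg + (lon_ref_deg - lon_default_deg)) % 360.0


# With the default reference the grid is unchanged.
print(shift_longitude(45.0, 45.0))
# With a reference longitude of 0, the pixel normally at 45 degrees moves to 0.
print(shift_longitude(45.0, 0.0))
```

Latitudes are unaffected by such a rotation, so only the longitude computation would need the extra parameter.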

The extra keys in GRIB2 were added for several reasons but mostly to be covered for future requests:

EDIT: I've hit the button too quickly

bnlawrence commented 8 months ago

> The closer the metadata the easier the conversion. :)

I'm pretty keen that we don't introduce unnecessary semantic mismatches. I don't see any reason for doing so here.

I think we can build a proposal based on this. I think the important point is that we would only need one mandatory addition to the original CDL (i.e. something like a mandatory healpix:healpix_Lo = Float).

I am in two minds as to the necessity for scanning order in netCDF, what do others think? If we did it, we could do it with optional additional parameters?

I think we can optionally use auxiliary coordinates for the vertices etc. Will think more about that.

bnlawrence commented 8 months ago

Incidentally the document produced for the GRIB appendix provides a useful intro to all this for newcomers!

taylor13 commented 8 months ago

eyes=thanks!

larsbarring commented 8 months ago

@bnlawrence asks

> I am in two minds as to the necessity for scanning order in netCDF, what do others think? If we did it, we could do it with optional additional parameters?

From my outside perspective I think that transferring the concept of [different] scanning orders from GRIB to netCDF is not helpful.

In the GRIB "world" I do see the need and underlying reasons (as @sebvi hinted at) where GRIB is essentially a machine-to-machine format where scanning order is a well established fact. This is not the case for netCDF, which is a data format widely used across rather different communities. And scanning order would be a new concept in the netCDF "world" (where I, like @sebvi, have questions regarding the actual need/use case, e.g. with respect to efficiency). Across the different communities using netCDF there is already widespread confusion regarding X, Y, Z vs. latitude, longitude (see here for a rundown of anecdotal evidence).

Hence, I think that unifying GRIB's different scanning orders into what is established in netCDF is a perfect task for GRIB-to-netCDF converter tools.

JonathanGregory commented 5 months ago

Dear Sebastien @sebvi, Bryan @bnlawrence et al.

Thanks for the discussion on this issue, on a convention for HEALPix in CF. Is someone in a position to make a definite proposal, to take it forward?

Best wishes

Jonathan

davidhassell commented 1 month ago

Hello,

@sebvi wrote:

> In the GRIB2 proposal we also added extra keys to specify if the observable is valid at the center of the pixel, at edges, etc. You may want to do that as well (although I must admit I would need to read again how grids and projections are handled in CF).

I was wondering what the use cases might be for storing data on edges and vertices. It would seem to me that you'd only want to do this if there are well defined mappings between the edges/vertices and the pixels (faces) themselves - is that the case? You could of course use UGRID to store such information, but then the structure is lost, which seems counterproductive!

Thanks, David