Clarifying the temperature unit associated with some cell methods and standard names #101

cf-convention / discuss

Closed larsbarring closed 5 months ago

larsbarring commented 3 years ago

Associated with all standard names is a canonical unit, which in the case of temperature is Kelvin. Appendix E explains how these canonical units are transformed by the different cell methods. For temperature this is currently fragile because of the combined effect of two aspects of CF:

CF does not make a (clear) distinction between absolute temperature (irrespective of unit) and relative temperature (cf. #77). Instead this has to be inferred from a detailed interpretation of standard names and/or cell methods.
CF does not mandate use of the canonical unit, but instead 'delegates' unit handling to udunits (cf. 1.3 Overview and 3.1 Units).

And udunits, which accepts Kelvin and degree_Celsius (and other) temperature units, cannot make a distinction between a measure of absolute temperature and relative temperature. This means that software needs to make a rather involved interpretation of phrases of the standard name in combination with cell methods (in fact this has to be done irrespective of whether udunits is used or not) to decide whether or not to apply the additive constant 273.15 when converting between Kelvin and degree_Celsius.

While I would have liked CF to make a clear distinction between measures of absolute and relative temperature, this may not be viable given the overarching goal to not break backward compatibility. As a second best alternative I suggest that text is added to clearly spell out which phrases of standard names and cell methods are to be used to make the distinction between absolute and relative temperature. Collecting this information in one place alerts and helps users in general and software designers in particular. This text could be added under Section 3.1 (new subsection?) or in Appendix E, cross-referenced where appropriate.

So far I have come across the following phrases that I believe indicate a relative temperature:

standard names: difference, anomaly, change, tendency, bias
cell methods: range, standard_deviation, variance. ~~, and perhaps root_mean_square, sum_of_squares~~ For some of the standard names it may be obvious from the context that in practice the unit is always Kelvin, but for several I would say that either Kelvin or degree_Celsius is equally probable.

EDIT 2021-10-21: After more careful consideration root_mean_square and sum_of_squares do not involve differences, so deleted. /LB
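
The phrase lists above could be turned into a simple check along the following lines. This is only a rough sketch: the function name `is_temperature_difference` is hypothetical, the token matching is deliberately naive, and the phrase lists are just the ones proposed in this comment, not anything defined by CF.

```python
# Phrase lists taken from the comment above; everything else is illustrative.
DIFFERENCE_NAME_PHRASES = {"difference", "anomaly", "change", "tendency", "bias"}
DIFFERENCE_CELL_METHODS = {"range", "standard_deviation", "variance"}


def is_temperature_difference(standard_name, cell_methods=""):
    """Guess whether a temperature variable holds a difference (hypothetical helper)."""
    # Standard names are underscore-separated phrases.
    name_tokens = set(standard_name.split("_"))
    if name_tokens & DIFFERENCE_NAME_PHRASES:
        return True
    # A cell_methods string looks like "time: mean" or "area: mean time: range";
    # tokens ending in ":" are dimension names, the rest are methods or keywords.
    method_tokens = {tok for tok in cell_methods.split() if not tok.endswith(":")}
    return bool(method_tokens & DIFFERENCE_CELL_METHODS)


print(is_temperature_difference("air_temperature", "time: mean"))
print(is_temperature_difference("air_temperature", "time: standard_deviation"))
```

Software would then apply the additive constant 273.15 only when the check returns `False`.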

martinjuckes commented 2 years ago

@larsbarring : on the issue of backward compatibility, I believe that the protocol allows us to replace existing standard names with more precise terms when a use case emerges that needs more precision than provided by existing names.

The references to Udunits really need to be fixed. Udunits2 (Udunits version 1, which is cited in the convention, is no longer supported) assumes that quantities measured in Kelvin and Celsius are absolute on the respective scales, and quantities in powers of Kelvin and Celsius are relative amounts.

The ISO 80000 standard on Quantities and Units has distinct quantities for thermodynamic temperature and Celsius temperature, which are not considered interchangeable. The ISO 80000 units kelvin and degree Celsius are equivalent -- the specification of the offset occurs in the definition of the two distinct quantities. SI takes the same approach: thermodynamic temperature is a fundamental quantity, Celsius temperature is a derived quantity (see p. 138 of the SI Brochure).

I can see a number of options:

The status quo: undesirable and open to confusion if software is not aware of the need to treat temperature units as a special case;
Conform to Udunits2: this should be considered, because it is implied by the existing text --- but it does not look like a good option to me.
Provide some guidance to reduce the scope for confusion, as suggested in Lars's post.
Conform to SI and ISO 80000 by introducing some new standard names.

Conforming to UDUNITS does not look advisable, because there is no guarantee that features such as this will remain unchanged in future releases.

I like the idea of conforming to SI and ISO 80000. In other areas, such as the discussion of the use of kg CO2e as a unit by the climate impacts and adaptation community, we insist on a pure interpretation of units. Allowing an ad hoc interpretation of units for air_temperature, in which the units value is also interpreted as specifying the scale, is inconsistent.

larsbarring commented 2 years ago

I have come to realise that maybe I used the wrong words: what I meant with a relative temperature is really a temperature difference where Δ°C = ΔK. And with absolute temperature I meant temperature where 1°C = 274.15K.

Now over to @martinjuckes comments and suggestions.

On backwards compatibility: yes we can (and should) update standard names when new more precise terms are needed. But what I meant was more related to Appendix E, which details the unit conversions associated with cell methods. Here we have, for example, the cell method standard_deviation, where it is specified that no unit transformation takes place. Yet, for temperature we are transforming a temperature to a temperature difference. As far as I have read the Conventions documents I have not seen any distinction between these.
I believe the references to UDUNITS have been fixed in CF 1.9. In the initial comment I have now changed links to this version.
Yes, Kelvin and degree_Celsius are equivalent, but they are 'equivalent in different ways' depending on whether it is a temperature or a temperature difference. As Martin writes, SI acknowledges this difference, but UDUNITS does not to my knowledge.

Regarding Martin's four options:

Status quo is not good. I do agree with Martin's conclusion that this would be inconsistent.
Neither is conforming to UDUNITS2 a solution, because UDUNITS2 does not know whether a quantity represents a temperature or a temperature difference.
Yes, this is the minimal solution. But maybe I was, for reasons of imagined backward incompatibility, too early in ruling out the best solution (as I now see it): Clearly spell out in Table E.1 that certain methods involve a transformation from u to Δu, and make Δu the canonical unit for the relevant standard names.
Maybe we do not need new standard names if we are just making a more precise statement regarding the canonical unit?

@japamment: maybe part of this (point 3 & 4) has some bearing on the outcome of CF workshop discussion on canonical units

JonathanGregory commented 2 years ago

Dear @larsbarring and @martinjuckes

I think it would be sufficient to describe the issue in the definitions of standard names, where appropriate. The important practical point is that a different rule is needed for units conversion when it's a temperature difference, as Lars says. I would say that's a practicality rather than a "fundamental" distinction. There are similar issues with other quantities. For example, you can't convert degree_east to degree_north, although they are canonically equivalent. A metre is always the same size, but a vertical coordinate of 1 m for height (above the surface) and altitude (above the geoid) are not the same.

I do not think we need new standard names, however. The existing standard names recognise that a temperature difference is a different quantity from a temperature, using various phrases, as Lars has identified. That's fine. I think they are canonically the same. Equilibrium climate sensitivity is a temperature difference, for example, which is sometimes written in K and sometimes in degC. It depends on the author's preference and the intended audience. Temperatures are also temperature differences, really; temperatures in degC are differences from freezing point, temperatures in K are differences from absolute zero.

Best wishes

Jonathan

larsbarring commented 2 years ago

Dear @JonathanGregory

I am not sure that I follow what you are aiming at with the examples in the first paragraph:

degrees_north and degrees_east are units for quantifying different "things". That is they are associated with different variables having different standard names. True, one can convert between them by adding/subtracting 90 degrees, in which case it becomes obvious that 1 degree_north only is canonically the same as 1 degree_east in a 'relative' sense, i.e. as a difference from their individual origins.
A metre is always the same, and thus I argue that a vertical coordinate of 1 m for height (above the surface) and altitude (above the geoid) are in fact the same quantity but related to different variables. This is clear as they are defined as measuring the difference from their individual origins that are defined by the standard name. That is, by definition they do not represent the same position in 'absolute' space, e.g. when measured as a distance from the centre of the earth (however that is defined).

Both these examples focus on units/variables that have different uses/meanings that are not directly comparable to the case for temperature. Moreover, they both happen to be (spatial) coordinate variables rather than 'ordinary' data variables. The key thing in these examples is, as with temperature, that if the data is transformed to a common origin it becomes obvious that the origin is important. To my knowledge, for 'data variables' this is only relevant for temperature, which has units where one alternative involves an additive constant.

All this tells me that Table E.1 should be updated to specify if a method involves a transformation u -> Δu. As it stands now it is not correct to state that all methods involve a unit transformation u -> u, at least as long as CF delegates unit handling to UDUNITS.

Regarding the second paragraph, I fully agree with Jonathan that no new standard names are needed. It should be enough with more details. One part of this is adding text to the description. But given the fact that the xml version of this table is increasingly used directly in automated workflows and imported into analysis software I suggest adding a new column ('to the right') flagging whether a standard name's unit is associated with an 'absolute' quantity or a difference quantity. If it turns out to be complicated to draw the line this flag could be set only for standard names related to temperature (and any other units where an additive constant may be important).

To resolve this issue by just making text changes to relevant standard name descriptions would be missing a good opportunity to make the Conventions clearer. It would be very easy for a reader to miss that a standard name description has changed, especially if we do not deprecate the standard name. And in automated use of the xml version such a change would almost certainly go unnoticed.

Kind regards, Lars

JonathanGregory commented 2 years ago

Dear @larsbarring

I think the example with height and altitude is interesting to compare with temperature in K and degC. The only difference is the reference, as you say, in both cases, but with the vertical coordinate we give them different standard names. I think that is because the difference in reference is not a numerical constant. I don't think we would have different standard names for instance for height above the geoid (i.e. altitude) and height above a surface 100 m above the geoid.

Time is like temperature in that it is sometimes a difference (as in an interval of time) and sometimes a coordinate (meaning a difference wrt a reference time). In the latter case, we always specify since the reference time in the units, and that makes the distinction. For temperature, the units don't help us, because K and degC imply their reference temperature (when it's not a difference).

I would also remark that any quantity which serves as a coordinate might sometimes be a data variable and vice-versa.

The above points are just a discussion! They don't help us to decide what to do to indicate the distinction between temperature and temperature difference in a machineable way. I think your question of whether temperature is the only such case is relevant. The answer will help us to decide whether we need a general solution.

Best wishes

Jonathan

larsbarring commented 2 years ago

Dear @JonathanGregory I think that there are two aspects of temperature that neither height nor altitude capture:

It is physically meaningful to have negative values, i.e. all these are always differences (leaving out the issue of what happens when the negative value is large enough to pass beyond the centre of the earth). Thus these quantities are already by definition Δu. Only if height is measured from the centre of the earth is there a correspondence with the absolute Kelvin scale.
Your example geoid and geoid+100m seems to work very well, but that is only because we just have one and the same unit, meter, for both. If there was a separate unit for the latter, say meter_100 it would be closer to the problem with temperature.

Let me try to capture both aspects in one example based on height: we define the standard name elevation. For historic reasons we have in common use the unit meter_Celsius, which uses as origin the floor of Anders Celsius' workplace. However, modern science has established that the earth's centre is exactly 6.3781×10^6 meter below this floor, and in the sciences it has become common to use the earth's centre as origin for elevation. And to not mix things up the new unit is called meter_Kelvin to distinguish it from meter_Celsius, where Δmeter_Celsius = Δmeter_Kelvin.

I believe this captures the essence of the problem. One unit is absolute, i.e. it has the origin at 0 and it is not physically meaningful to have negative values; the other is relative to an origin ≠ 0, which means that both positive and negative values are possible. And Δu is exactly the same for both units.

Let me come back to your earlier examples

Equilibrium climate sensitivity is a temperature difference, for example, which is sometimes written in K and sometimes in degC. It depends on the author's preference and the intended audience. Temperatures are also temperature differences, really; temperatures in degC are differences from freezing point, temperatures in K are differences from absolute zero.

-- It would indeed be a mistake if one were to use UDUNITS to harmonise the units of ECS data having mixed units K and degC.
-- degC is indeed a temperature difference.
-- I only half agree (literally) that temperatures in K are differences from absolute zero. To me this does not capture that the Kelvin scale does not have negative values. To make it clear I would rather say that temperatures in K quantify the distance from absolute zero.

Now, returning to your most recent comment:

Time is like temperature in that it is sometimes a difference (as in an interval of time) and sometimes a coordinate (meaning a difference wrt a reference time). In the latter case, we always specify since the reference time in the units, and that makes the distinction. For temperature, the units don't help us, because K and degC imply their reference temperature (when it's not a difference).

-- Yes, as we now use time it is always a relative measure (i.e. difference). But in principle we could find an absolute starting time, e.g. when the Big Bang happened (BB). According to Wikipedia the best estimate is 13.772 × 10^9 years ago. Although it is only an estimate, we could in principle adopt this as an absolute zero for a time scale to be used in CF. Or we could be less general and fix a time when the earth was formed (EF). Both would be suitable as absolute zeros for geophysical applications. I am definitely not suggesting this, but it shows that in principle we could have units second_BB or second_EF as an absolute scale (no negative numbers) and then relative scales having units second_BCE, second_19700101000000 and what not.

martinjuckes commented 2 years ago

@larsbarring , the suggestion that "1°C = 274.15K" (above) in CF whereas "1°C = 1K" is the rule in SI and ISO standards illustrates what I would consider to be a fundamental difference between the two approaches. In the one case, the "unit" is being used to define an amount, in the other it is being used both to define an amount and a reference value. As it stands, our "degree Celsius" has a very different meaning to the SI unit degree Celsius.

I'm not sure what to make of the discussion of degrees_east and degrees_north -- there appears to be some uncertainty about what is meant by "conformance". From a UDUNITS perspective they are interchangeable, which makes sense if you accept that "east" and "north" are directions rather than units ... the units of measure in both cases are degrees (though the rationale for 1 degrees_west = -1 degrees_north is a little harder to unpick). The problem appears to be that here, as with temperature, CF is following an approach which packs additional information into the unit string which goes beyond the unit of measure, and is doing it in a way which has not been clearly explained. The discussion above would appear to indicate that degrees_north is a valid coordinate for longitude (and this is accepted by the cf-checker), but I don't think it should be. The problem here is that additional information is bundled into the units string and the assumption that UDUNITS will do something sensible is not valid (because UDUNITS has no way of dealing sensibly with this situation).

There appears to be a consensus that conforming to UDUNITS for temperature (i.e. the rule that a quantity expressed in units of °C is a temperature on the Celsius scale, but any power of °C is directly equivalent to Kelvin: "1°C = 274.15K" and "1°C² = 1K²") is not a good idea. UDUNITS does not know anything about standard names or qualifiers, so following UDUNITS, as implied, I believe, by the current convention, would require us to adopt this interpretation consistently and disallow "°C" for temperature differences. @JonathanGregory , @larsbarring : do you agree with the last statement?

JonathanGregory commented 2 years ago

Dear @martinjuckes

Yes, I agree that degree_north should be disallowed by the checker as a unit of longitude. That is a specific rule we ought to include. As I'm sure you know, the special units for longitude and latitude were inherited from COARDS. The north/east distinction is redundant with standard names, but standard names are optional. This is not an ideal convention, but it is how it is, I would say.

I don't think we should disallow degC for temperature differences because it is commonly used as a unit for temperature differences in practice, such as in the example of equilibrium climate sensitivity that I mentioned. Historical and future global warming are also often stated in degC, not K, for example in IPCC reports. I don't think data-writers would respect our prohibition of it.

Rather, I think we have to add some special flag for quantities which have units of temperature. To decide what we should do, I feel that a critical question is whether temperature is the only quantity which has this kind of problem (apart from time, which we handle in another way already).

Best wishes

Jonathan

martinjuckes commented 2 years ago

Dear @JonathanGregory ,

I'm glad you are not proposing to disallow degC, though I'm not sure why you feel the need to comment on this idea. Has anyone proposed it?

Backward compatibility with the 1995 COARDS conventions has been important, but, given the current interpretation of the backward compatibility policy, are there any use cases of requiring that new versions of the CF Conventions maintain equivalence with all aspects of the COARDS convention? To me, this level of constraint appears disproportionate.

Identifying which variables have "this kind of problem" is certainly a good idea. My view, and I'm not sure whether you disagree or whether I haven't communicated it clearly, is that this kind of problem relates to use of units strings which are not units of measure in the conventional sense. That is, units strings such as degC, degrees_east, days since ... which convey information about the quantity being measured as well as the measurement unit.

larsbarring commented 2 years ago

@martinjuckes: I might not have read the SI Brochure you referred to carefully enough, but

"1°C = 1K" is the rule in SI and ISO standards illustrates

in the Table on p.138 carries a footnote that states

The degree Celsius is used to express Celsius temperatures. The numerical value of a temperature
difference or temperature interval is the same when expressed in either degrees Celsius or in kelvin.

which basically is what I wrote above: Δ°C = ΔK. On the preceding pages (that I only had a quick look at) of the SI Brochure, the text elaborates the relationship between the units and their different origins. While I intuitively like your wording ""unit" is being used to define an amount" I am not sure it helps us here, because as Jonathan writes both units K and degC carry implicit information about their individual origins. The key thing is that one scale is absolute, i.e. does not allow negative numbers for physical reasons. So, after all, maybe I was onto something when writing "absolute" and "relative":

absolute scale (Ua): a scale that has zero as its origin and where negative values are impossible for physical or otherwise principal reasons.
relative scale (Ur): a scale that has an origin k (k≠ 0), which allows both positive and negative numbers. It is related to the absolute scale through Ua = Ur × m + k, where m is a multiplicative scale factor. Values below -k/m are not physically meaningful.
differences : differences expressed in the respective scales, then ΔUa = ΔUr × m.

This should cover any reasonable temperature unit conversion. For the purpose of CF I think that at least K, °C and °F have real use cases, the other ones are maybe more of historical interest (I came across °Réaumur in an earlier project on long observational timeseries).
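
A minimal sketch of the three definitions above, assuming kelvin as the absolute scale Ua. The class and method names are made up for illustration, and the Fahrenheit constants follow from the usual relation K = (°F + 459.67) × 5/9.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemperatureScale:
    """A relative scale Ur related to kelvin (Ua) by Ua = Ur * m + k."""
    name: str
    m: float  # multiplicative scale factor
    k: float  # origin of this scale, expressed in kelvin

    def to_kelvin(self, value):
        # On-scale temperature: scale factor and offset both apply.
        return value * self.m + self.k

    def difference_to_kelvin(self, delta):
        # Temperature difference: the offset k cancels, only m remains.
        return delta * self.m


CELSIUS = TemperatureScale("degree_Celsius", m=1.0, k=273.15)
FAHRENHEIT = TemperatureScale("degree_Fahrenheit", m=5.0 / 9.0, k=459.67 * 5.0 / 9.0)

print(CELSIUS.to_kelvin(1.0))             # on-scale value, about 274.15
print(CELSIUS.difference_to_kelvin(1.0))  # difference, 1.0
```

The point of keeping two methods is that the caller, not the unit string, has to say which kind of quantity is being converted.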

JonathanGregory commented 2 years ago

Dear @martinjuckes

I thought you were proposing to ban degC for temperature differences! I misunderstood you, sorry. Actually you were asking whether we agreed that to ban it would be inevitable if we chose to follow udunits to its logical conclusion - is that right?

I do agree with you that units should not contain the definition of the quantity, in an ideal convention. CF is mostly ideal in this respect, but we have this exception for degrees, and some for dimensionless vertical coordinates, described in section 3.1. These were all originally included for COARDS compatibility. I don't know whether COARDS is obsolete now, but since we built them into CF at the start it would be hard to remove them now, certainly the one about degrees.

I don't think that @larsbarring's issue is exactly that one, though. It's about having to use a different units conversion for differences and "absolute" values (meaning differences with respect to an implicit reference value).

Best wishes

Jonathan

Armin-RS commented 2 years ago

As a physicist following this discussion, I would like to mention that there are even negative Kelvin temperatures in systems with an unintuitive distribution of energy, see https://en.wikipedia.org/wiki/Negative_temperature. The most common of these systems probably are lasers which are in wide use in the atmospheric sciences (one could think of instrument housekeeping data). So even Kelvin temperatures can be below 0.0.

larsbarring commented 2 years ago

@Armin-RS, thanks for this interesting perspective, it was all news to me. In an earlier comment I was almost writing something like "Kelvin scale is always positive until future physics tells us otherwise" but now see that the future arrived already back in 1949. Anyway, and irrespective of whether we have any real use case for negative kelvin values or not, do you think that this has any material impact on the discussion above (except my characterisation of the absolute scale)?

Armin-RS commented 2 years ago

Dear @larsbarring, no, except that you should not rely on the condition that a temperature < 0.0 necessarily has to be in degree Celsius/Fahrenheit/Reaumur.

martinjuckes commented 2 years ago

Dear @JonathanGregory :

On degC : yes, I was pointing out the consequences of adopting UDUNITS as a pseudo-standard (i.e. referring to it as if it were a standard even though it is clear to anyone who looks beneath the surface that it is not a standard in any meaningful sense).

@larsbarring : I agree that the footnote you quote above is consistent with your approach, but the table on page 138 also states without ambiguity that °C = K. It is a simple factual observation. I won't bother repeating it again if it is unwelcome.

larsbarring commented 2 years ago

@martinjuckes: thanks for drawing attention to this wider context. I do agree that the frequent references to the file udunits.dat, which has not existed for quite some time, need to be replaced with references to an existing standard, like SI. But this is a more general question that should be dealt with in a separate issue, which I will be happy to contribute to.

But, as @JonathanGregory writes, the question here is how the CF Conventions handle the distinction between 'absolute' temperature and temperature differences. If we follow the SI definition, where °C = K (which, as Martin points out, is not confined to temperature differences), what does it mean for CF? This is not clear to me.

What I however want to emphasize is that CF needs to make a clear distinction between 'absolute' temperatures and temperature differences. And this needs to be done irrespective of whether UDUNITS remains as a "pseudo-standard" or the SI system is introduced, or something else. To me, the immediate question is whether progress on the current issue materially depends on this choice.

JonathanGregory commented 2 years ago

The definition of the SI system by the International Bureau of Weights and Measures (cited previously by Martin) recognises temperature and temperature difference as two different uses of the units of degC and K, and gives the different rules for conversion in the two cases. I suggest that we add a new field to the standard name table, defined for every case where the canonical unit contains K and not in other cases, to indicate whether K refers to temperature or temperature difference in each case. In the conventions we should also add a statement that udunits should not be used to convert units containing K of temperature difference.
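
To illustrate how such a field might be consumed by software reading the XML version of the table, here is a rough sketch. The fragment is made up in the style of the standard name table, and the element name `temperature_kind` and its values are purely hypothetical; nothing of the sort exists in CF at the time of this discussion.

```python
import xml.etree.ElementTree as ET

# Made-up fragment in the style of the standard name table XML.
# The <temperature_kind> element is hypothetical and is not part of CF.
FRAGMENT = """
<standard_name_table>
  <entry id="air_temperature">
    <canonical_units>K</canonical_units>
    <temperature_kind>temperature</temperature_kind>
  </entry>
  <entry id="air_temperature_anomaly">
    <canonical_units>K</canonical_units>
    <temperature_kind>temperature_difference</temperature_kind>
  </entry>
  <entry id="air_pressure">
    <canonical_units>Pa</canonical_units>
  </entry>
</standard_name_table>
"""


def temperature_kinds(xml_text):
    """Map standard name -> flag value, for entries that carry the flag."""
    root = ET.fromstring(xml_text)
    kinds = {}
    for entry in root.iter("entry"):
        kind = entry.findtext("temperature_kind")
        if kind is not None:  # entries without K in their units have no flag
            kinds[entry.get("id")] = kind
    return kinds


KINDS = temperature_kinds(FRAGMENT)
print(KINDS)
```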

taylor13 commented 2 years ago

I haven't read this thread carefully, but it appears that we are ruling out use of the standard name "air_temperature" when it represents some difference (say between two pressure levels or two surface stations). I note that we have a standard name "sea_water_temperature_difference" but not "air_temperature_difference". It seems that the property being measured is the temperature and the fact that it is a difference shouldn't necessarily change that, should it? Anyway, is it clear which variables when representing a "difference" require new standard names? Do we need to double the size of the standard name table?

larsbarring commented 2 years ago

@taylor13, I am not sure what you mean when you write

... standard name "air_temperature" when it represents some difference (say between two pressure levels or two surface stations). <....> It seems that the property being measured is the temperature and the fact that it is a difference shouldn't necessarily change that, should it?

When we have temperature measured at two surface stations, or two pressure levels, that is what is measured. The difference as such is not measured but calculated. And this calculation has implications for how to interpret the units. As @martinjuckes and @JonathanGregory point out, this is made explicit in the SI system.

For example use the temperature in London and Reykjavik, and starting with degree Celsius:

Reykjavik	London	Δ
6	17	-11 or +11

If the sign is not important (sometimes it is!) it could be called the "absolute_difference", which is the equivalent to the cell method range. Now, if we naively use UDUNITS to translate these numbers to Kelvin we get after rounding

Reykjavik	London	Δ
279	290	262 or 284

where the values for Δ do not make sense as differences. That is, the temperature difference is not the same thing as an 'ordinary' temperature. This becomes even more clear if one calculates the difference in Kelvin the proper way:

Reykjavik	London	Δ
279	290	-11 or +11

Here the variant "-11K" is perfectly reasonable, but still something completely different from the negative absolute temperatures that @Armin-RS was referring to.

And the same problem occurs if we use UDUNITS to naively translate to degree Fahrenheit:

Reykjavik	London	Δ
42.8	62.6	12.2 or 51.8

which properly should be

Reykjavik	London	Δ
42.8	62.6	-19.8 or +19.8

~~EDIT: This example highlights the need for a standard name air_temperature_difference, but this a conversation for another issue.~~ EDIT(2): From the comments below it is clear to me that this was not a good idea.
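
The numbers in the tables above can be reproduced with a few lines of Python. This is only an illustration of the arithmetic; the function names and the `is_difference` flag are made up, and the "naive" results mimic what happens when the additive constant is applied to a difference.

```python
def celsius_to_kelvin(value, is_difference):
    # The additive constant applies to temperatures, not to differences.
    return value if is_difference else value + 273.15


def celsius_to_fahrenheit(value, is_difference):
    # Same idea: the scale factor always applies, the offset 32 only on-scale.
    return value * 1.8 if is_difference else value * 1.8 + 32.0


reykjavik, london = 6.0, 17.0
delta = reykjavik - london  # -11.0 degree_Celsius

naive_k = celsius_to_kelvin(delta, is_difference=False)
proper_k = celsius_to_kelvin(delta, is_difference=True)
naive_f = celsius_to_fahrenheit(delta, is_difference=False)
proper_f = celsius_to_fahrenheit(delta, is_difference=True)

print(round(naive_k), round(proper_k))        # 262 -11
print(round(naive_f, 1), round(proper_f, 1))  # 12.2 -19.8
```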

larsbarring commented 2 years ago

Dear Jonathan @JonathanGregory,

I agree with your suggestion to

add a new field to the standard name table, defined for every case where the canonical unit contains K and not in other cases, to indicate whether K refers to temperature or temperature difference in each case.

But I have come to realise that this is not enough. As you have pointed out on several occasions, the standard name does not always give enough detail for interpreting the content of the data variable. Additional necessary information may be found in the cell methods attribute and/or standard name modifiers. So, I think something more is needed, probably in relation to Appendix E, Appendix C and Section 3.1 Units.

JonathanGregory commented 2 years ago

Dear Karl @taylor13 and @larsbarring

We wouldn't be doubling the size of the standard name table, because this issue refers only to temperature. The great majority of quantities with K in their units mean it as a temperature difference (with a positive or negative power, often K-1 i.e. per degree). I think it is just the quantities in pure K which are ambiguous. I agree that we would need to define new standard names of temperature difference for every existing temperature quantity (in pure K) that appears in the table. There are about 50 of them (excluding ones which say they are difference, change or anomaly), so perhaps this is not the best solution.

An alternative, which might address Lars's point, would be to add something to the units string to distinguish units of temperature from units of temperature difference. We could, for instance, introduce a new unit of delta_K which could be used as an alternative unit for those quantities whose canonical unit is pure K. Thus the distinction would be made in practice using the units: temperatures are in K, temperature differences in delta_K. Since delta_K is not a udunit, it couldn't get incorrectly converted to degC. Experimenting with udunits, I see that it only suggests adding or subtracting 273.15 for pure K. It knows, for instance, that 1 K s-1 = 1 degC s-1.
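
As a sketch of how software could act on such a units string: the function below is hypothetical, `delta_K` is the unit proposed in this comment, and `delta_degC` is an invented counterpart used here only so that the result carries a label.

```python
KELVIN_OFFSET = 273.15


def kelvin_to_celsius(value, units):
    """Convert from K or the proposed delta_K; returns (value, units)."""
    if units == "K":
        # On-scale temperature: subtract the offset.
        return value - KELVIN_OFFSET, "degC"
    if units == "delta_K":
        # Temperature difference: the number is unchanged.
        # "delta_degC" is a hypothetical counterpart, named only for illustration.
        return value, "delta_degC"
    raise ValueError(f"not a pure temperature unit: {units!r}")


print(kelvin_to_celsius(290.15, "K"))
print(kelvin_to_celsius(11.0, "delta_K"))
```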

Best wishes

Jonathan

sebvi commented 2 years ago

Dear everyone,

it is a very interesting discussion I was quietly following in the background.

I think the discussion goes beyond Celsius vs Kelvin or the units of variable difference or how to relate a difference in Celsius to a difference in Kelvin. My feeling is that this particular case needs to be seen in a more general context. There is lots of confusion and misunderstanding in the discussion that mostly comes from simplifications from the general case.

To define a unit, one needs a reference, a scale and a direction that represents "increasing/decreasing" (not necessarily positive/negative if the reference is not set to be 0). In the end, it is just like ticks on an “axis”.

The definition of the reference is commonly omitted or forgotten in the literature because the units are mostly expressed as “Δ" so that the reference vanishes: (T1-Tref) – (T2-Tref) = T1-T2. Then you choose T1 and T2 to be exactly 1 unit of scale: it is the 1°C/1K/1m/1J/etc. used everywhere in unit conversion formulas and definitions.

For °C, the reference is the freezing point temperature of pure water, and when I say 25°C I in fact mean 25°C above 0°C, i.e. above the freezing point temperature of pure water. It is the same mechanism with any unit: assume I have a pencil and I want to measure its length using a ruler. I could choose to put the 0cm mark (the reference) of the ruler at one end of the pencil and read 12cm at the other end. But I could just as well put the 3cm mark at one end and read 15cm at the other end. In both cases the pencil is 12cm long, and in both cases the length is a difference with respect to a reference. When the reference is 0 (99.99% of the time), the reference is simply dropped.

When doing unit conversion you need to do two things: 1) apply an offset of reference from one unit to the other unit and 2) rescale. The general formula is:

X(i) = a(i/j) * ( X(j) + b(j) )

where i and j are the units of X you convert to/from, a(i/j) is the scaling factor and b(j) is the offset between the references, expressed in the unit you are converting from. Note that you could equally define it as

X(i) = a(i/j) * X(j) + b(i)

and in that case b(i) is expressed in the unit you are converting to.

For most units, the reference is identical, i.e. the reference in one unit does not need an offset to convert to the other unit, and you are simply left with a rescaling. For instance: E(eV) = 6.242e+18 eV/J * E(J), where 6.242e+18 eV/J is the scale factor.

In the case of Celsius <-> Kelvin, there is no rescaling needed (well… the scaling is 1) but the reference is not the same: T(°C) = 1 °C/K * (T(K) - 273.15 K) = 1 °C/K * T(K) - 273.15 °C

In the scientific literature and in textbooks, this is commonly “simplified” by dropping the units of the scale factor (or in the specific case °C<->K, dropping the whole scale factor) and dropping the units of the reference offset.

An interesting case is Celsius <-> Fahrenheit, because it is one of the cases where the reference changes AND you need to rescale:

T(°C) = 5/9 °C/°F * (T(°F) - 32 °F)

Or the other way around:

T(°F) = 9/5 °F/°C * (T(°C) + 160/9 °C)
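As an illustration of the two Fahrenheit formulas above, here is a minimal Python sketch. The class name `AffineConversion` and its methods are invented for this example and do not come from UDUNITS or any other library. It encodes X(i) = a(i/j) * (X(j) + b(j)) and derives the reverse conversion, which reproduces the 160/9 °C offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffineConversion:
    """Conversion of the form X(i) = scale * (X(j) + offset)."""

    scale: float   # a(i/j), the scale factor
    offset: float  # b(j), expressed in the unit being converted FROM

    def apply(self, value: float) -> float:
        return self.scale * (value + self.offset)

    def inverse(self) -> "AffineConversion":
        # Solving y = a * (x + b) for x gives x = (1/a) * (y + (-a * b)),
        # so the new offset is -a * b, expressed in the unit of y.
        return AffineConversion(1.0 / self.scale, -self.scale * self.offset)


# T(degC) = 5/9 * (T(degF) - 32)
fahrenheit_to_celsius = AffineConversion(scale=5.0 / 9.0, offset=-32.0)

# Derived rather than typed in: scale 9/5, offset 160/9 (in degC).
celsius_to_fahrenheit = fahrenheit_to_celsius.inverse()

print(fahrenheit_to_celsius.apply(32.0))   # freezing point of water in degC
print(celsius_to_fahrenheit.apply(25.0))   # 25 degC expressed in degF
```

Deriving the inverse from the forward conversion, instead of writing both by hand, keeps the two offsets consistent with each other.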

Now if you want to look at differences of temperature in Celsius translated to differences in Kelvin:

T1(°C) - T2(°C) = 1 °C/K * (T1(K) - 273.15 K) - 1 °C/K * (T2(K) - 273.15 K)

T1(°C) - T2(°C) = 1 °C/K * (T1(K) - T2(K))

The offset between the references vanishes and it is where most of the confusion in this discussion comes from! But another problem arises, briefly mentioned by Lars: T1-T2 = -(T2 – T1), i.e. a difference is an anti-commutative operation so it requires a convention…

But, writing:

°C = K is wrong

1°C = 1K is wrong

ΔT(°C) = ΔT(K) is better but not rigorously precise

The correct way to write it would be: ΔT(°C) = 1 °C/K * ΔT(K)

It is similar with altitude, elevation and height: they share the same scale but have different references.

All that to say that the units of temperature and temperature difference are identical in terms of scale but a temperature difference does not require a reference (what Lars refers to as absolute vs relative). We can make an analogy with time vs period, they can be expressed in hours, one wrt a reference (time), the other one being a difference between 2 times (period). The ambiguity is lifted by the use of “time” and “period” in the standard names and an explicit reference in the case of a time. A similar approach could be used.

If CF wanted to adopt a generic approach to do the distinction between temperature and temperature difference, I would say Temperature should have units “°C wrt pure water freezing point temperature” and temperature difference simply “°C”. But “°C” implies that the reference is the pure water freezing point temperature so it is redundant! With this in mind the only option left is using meaningful standard names. Lars’s use cases are not any temperature differences but very specific, well defined ones so I would suggest to create new standard names for these.

Regarding the problem with UDUNITS conversion, I don’t think it is a CF problem but more an UDUNITS problem (or a user problem). The solution is probably to ask UDUNITS to create new pseudo units, maybe “°C relative”, to distinguish relative vs absolute temperatures. I must say I am totally against it, I already think that degree north, degree east and hours since are pure heresy from a scientific point of view.

taylor13 commented 2 years ago

I had earlier today come up with the same idea as @JonathanGregory that we might define a unit for temperature that included "change" as an essential qualifier. I decided to sit on it for a couple of days before suggesting that unsettling "solution" to our problem to see if I could come up with a good argument for not proposing it. Now that it has been suggested, I do think it merits some consideration. The only problem with the standard name air_temperature seems to be that it doesn't differentiate between a difference in air temperatures and an absolute temperature, which is important when doing units transformation. Otherwise temperature is just like air wind speed, humidity, or pressure, and no one is proposing that we need to distinguish in these cases between differences and absolute values.

Perhaps the best argument in favor of a unit that is not widely recognized is that software designed to convert from one unit to another will be stumped and won't produce a wrong answer.

larsbarring commented 2 years ago

Dear all,

@sebvi, Thank you for the excellent clarification of the general principles of unit conversion.

I think that we now are beginning to converge towards a common understanding of what the problem is.

@JonathanGregory, @taylor13, I, too, was considering to introduce delta-units. There have to be one such for each temperature unit (°C, K, °F, °Re, °R, ....), but this is not a problem as such. And this is the route taken by another unit conversion package, pint, that I have heard of. But I felt a bit reluctant towards this idea; something that was reinforced by @martinjuckes' comment:

...this kind of problem relates to use of units strings which are not units of measure in the conventional sense. That is, units strings such as degC, degrees_east, days since ... which convey information about the quantity being measures as well as the measurement unit.

And @sebvi's comment, and in particular his conclusion:

... to ask UDUNITS to create new pseudo units, maybe “°C relative”, to distinguish relative vs absolute temperatures. I am totally against it, I already think that degree north, degree east and hours since are pure heresy from a scientific point of view.

reinforces this even further. It is not clear to me that merely introducing delta-variants of the temperature units will solve the problem. I think that it would be more like a "quick hack" that is likely to come back and bite us in the long run.

@JonathanGregory comments that UDUNITS handles 1 K s-1 = 1 degC s-1, which is the canonical unit of e.g. tendency_of_air_temperature_due_to_advection (i.e. it is a difference). This also works for degF. Thus, it seems that UDUNITS somehow has at least some 'knowledge' of when to apply the additive constant and when not to. Taking this one step further and looking at powers, @martinjuckes comments that UDUNITS does this: 1 degC² = 1 K², and 1 K² = 3.24 degF² (the scale factor 9/5 squared, with no offset). Again, UDUNITS makes assumptions about the intent behind these units.

In an attempt at disentangling all this, let's go back to the general unit transformation formula from @sebvi

X(i) = a(i/j) ( X(j) + b(j) ), where i and j are the units of X you convert to/from, a(i/j) is the scaling factor and b(j) is the offset between the references, expressed in the unit you are converting from.

we note that UDUNITS has the required constants a(i/j) and b(j) (and thus b(i)) for a large number of unit conversions. To me this is an excellent starting point!

What however is missing are two things:

The CF Conventions do not have a clear and simple way to distinguish whether b(j) vanishes or not. Neither do they have a mechanism for distinguishing whether the a(i/j) vanishes or not. Instead the different situations are encoded in a mixture of standard names, cell methods and standard name modifiers.
Neither does UDUNITS have a mechanism to distinguish between these situations. It seems however that there are some assumptions regarding the quantities the units are 'intended' to represent, but these assumptions are not obvious (to me at least).

I will return to these two in follow up posts.

martinjuckes commented 2 years ago

@sebvi writing "°C = K" is not wrong, it is a convention which you may disagree with, but not wrong. Please check the references to ISO and SI given above.

sebvi commented 2 years ago

With all due respect, @martinjuckes, I will have to disagree: "°C = K" is wrong from a purely mathematical point of view; it is simply a fact. You can evade it by side-stepping the problem and calling it "a convention", but it is still wrong, as demonstrated rigorously above. I had checked the SI brochure and it is a shame that an authoritative body is not rigorous and does not lead by example. Their table of derived units is at best imprecise, and for unit conversions involving an offset it is wrong. I think that each time a unit conversion involves an offset, they should define the conversion for both the absolute and the relative quantity. Maybe the primary audience of the SI Brochure is "Joe Public", and "Joe Public" will certainly not see or understand the difference. It is a shame because it is the reason why we produce cohorts of students and scientists who do not properly understand the mechanisms behind unit conversion. They simply use formulas blindly.

@larsbarring : to give a different perspective, I should mention that the "a and b approach" is the mechanism we are going to use in the next iteration of GRIB (GRIB3) that we (WMO ET-data team) are developing. You are probably aware that a long-standing problem of GRIB2 is that it defines canonical units for each parameter and does not let you choose the one you prefer, typically Celsius vs Kelvin, fraction vs %, etc. GRIB3 will let you define how your data values can be converted back to the canonical units (or you can simply use the values in your preferred units if you don't want to convert back). In the case of an absolute temperature in Celsius, a=1 and b=-273.15 (conversion back to canonical units, not from, so Celsius to Kelvin) and in the case of a relative temperature in Celsius, a=1 and b=0.

That said, I don't think it is a good idea here, it is not in the spirit of CF to prescribe units. :)

martinjuckes commented 2 years ago

@sebvi - I don't think that you can appeal to mathematics to help you here. I'm not trying to elude anything. The SI standard is supported by a far larger community than CF. You say that "to define a unit you need a reference scale", but that is not how modern units work. SI is based on derivation of units from fundamental quantities through reproducible experiments. The Kelvin is defined via the Boltzmann constant, which relates temperature to mass, distance and time. Your approach is self-consistent (which is good, from a mathematical perspective), but so is the SI approach. The advantage of the SI approach is the huge user base and scientific credibility. The SI definitions come from physics, not mathematics, though the mathematics is clearly sound.

SI defines "units" in section 2.1 of their brochure:

The value of a quantity is generally expressed as the product of a number and a unit. The
unit is simply a particular example of the quantity concerned which is used as a reference,
and the number is the ratio of the value of the quantity to the unit.

The reason that they don't list offsets in their table of units is that there are no offsets in their concept of units. There may be offsets in different quantities, but that is independent of the SI concept of a unit of measure.

larsbarring commented 2 years ago

Dear Martin @martinjuckes and Sébastien @sebvi,

Let us not get diverted into a deep discussion whether SI is 'right' or not, and whether the mathematical approach is more 'fundamental' or not than the SI physical approach. And before you jump at me for writing this: yes I do agree that getting the fundamental aspects right is indeed important, but only to a certain level.

Given the current situation with regard to how CF together with UDUNITS now handles units and unit conversions I suggest that a certain level of pragmatism will take us a long way towards improving the situation. But this pragmatism should not lead in a direction that we may regret later on. With this in mind I suggest that we focus our energy on how to improve CF:

@sebvi: I tend to agree with Martin that CF should look towards SI when it comes to units, because of its wide acceptance if nothing else.

@martinjuckes: as you saw from my previous comment I was surprised by the SI statement that "°C = K". And your comment above does not help me to understand how CF might describe unit conversion between degC and K in a way that is consistent with SI. Would you mind explaining this in simple words, because in the case of 'non-difference' (i.e. 'absolute' or 'relative') units I have difficulty seeing how this can be done without involving the origin of the Celsius scale (273.15 K).

sebvi commented 2 years ago

@martinjuckes I think you misunderstood me completely. I am not defining units in a different way than SI does, it is exactly equivalent. Your quote from SI is a different way to express what I wrote earlier:

The definition of the reference is commonly omitted or forgotten in the literature because the units are mostly expressed as “Δ" so that the reference vanishes: (T1-Tref) – (T2-Tref) = T1-T2. Then you choose T1 and T2 to be exactly 1 unit of scale: it is the 1°C/1K/1m/1J/etc. used everywhere in unit conversion formulas and definitions.

There is no mathematical vs physical definitions of units. The mathematics are the language used to express the physics on paper. I agree that the BIPM does not care about unit reference and that their core work is to precisely define the unit scale of the units not the reference. Because of the nature of the core SI units, they are all defined with a reference set to be 0. Some of the derived SI units may or may not share the reference with the core SI units, when it is not the case an offset needs to be brought in to properly do the conversion of absolute quantity. For conversion of relative quantity, the offset is not necessary anymore because it vanishes.

If I could make an analogy with words, I would say that the BIPM and I give exactly the same meaning to a particular SI unit, but I am saying that the BIPM is misspelling the word while I give the correct spelling. Nothing else. Usually I would not correct "°C = K", but here it is an important part of the discussion that cannot be omitted, because it is the reason why unit conversion is not properly handled by some pieces of software.

To summarise:

X(i) = a(i/j) * ( X(j) + b(j) ) (conversion of absolute X)

ΔX(i) = a(i/j) * ΔX(j) (conversion of difference of X)
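The two summary formulas can be sketched in a few lines of Python. The function names and the constants `A_K_TO_C` and `B_K_TO_C` are made up for this illustration. The sketch checks that converting two temperatures and then subtracting gives the same result as converting the difference with the scale factor alone, and shows what goes wrong if the absolute formula is applied to a difference.

```python
def convert_absolute(x: float, a: float, b: float) -> float:
    """X(i) = a(i/j) * (X(j) + b(j)); b is in the unit converted from."""
    return a * (x + b)


def convert_difference(dx: float, a: float) -> float:
    """dX(i) = a(i/j) * dX(j); the reference offset has cancelled."""
    return a * dx


# Kelvin -> Celsius: scale 1 degC/K, offset -273.15 K.
A_K_TO_C = 1.0
B_K_TO_C = -273.15

t1_k, t2_k = 300.0, 295.0

# Route 1: convert each temperature, then subtract.
t1_c = convert_absolute(t1_k, A_K_TO_C, B_K_TO_C)
t2_c = convert_absolute(t2_k, A_K_TO_C, B_K_TO_C)
diff_via_absolute = t1_c - t2_c

# Route 2: subtract first, then convert as a difference.
diff_direct = convert_difference(t1_k - t2_k, A_K_TO_C)

# The mistake this thread is about: treating a difference as an
# absolute temperature applies the offset once too often.
wrong = convert_absolute(t1_k - t2_k, A_K_TO_C, B_K_TO_C)

print(diff_via_absolute, diff_direct, wrong)
```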

For most unit conversions, b(j) = 0, which makes the two formulas above "almost" identical and often confuses people, even at the BIPM. In CF, we need a practical way to deal with the subtle difference so that software can do proper conversions.

martinjuckes commented 2 years ago

Dear Lars, @larsbarring : apologies for not picking up the question that you have repeated earlier. The way that this is handled in SI and the ISO 80000 standard is that Celsius temperature (t) and Thermodynamic temperature (T) are different quantities, related by the simple equation:

t  = T - T_0

where T_0 ~= 273.15K +/- 3.17x10-7 K is an experimentally determined constant. For the purposes of CF we probably don't need to worry about the uncertainty there, but it is worth noting that T_0 is now an experimentally determined constant.

In the SI/ISO view, the unit conversion between degC and K is trivial, because they are the same. The conversion between the Celsius scale and the Thermodynamic temperature is seen as a quantity conversion, not a unit conversion.

In order to implement this, we would need to change standard names to reflect the difference between thermodynamic temperature and Celsius temperature, except for names which explicitly deal with differences: in the latter case no change would be needed because changes in thermodynamic temperature and changes in Celsius temperature are fully equivalent.

Given x=5°C and y=300K, we are accustomed to inferring that x is measured on the Celsius scale and y is a thermodynamic temperature, but the SI/ISO units do not carry that information -- you need to specifically say that x is a Celsius temperature, if that is the intention.
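To make the SI/ISO view described above concrete, here is a small Python sketch, assuming a hypothetical `Temperature` record that carries the kind of quantity alongside the number. None of these names come from CF, SI or any library. The point it illustrates is that t = T - T_0 is applied when changing quantity, while K and degC themselves are treated as the same size of unit.

```python
from dataclasses import dataclass

# Offset between the two quantities, in K (or equally in degC,
# since the two units are the same size in the SI/ISO view).
T_0 = 273.15


@dataclass(frozen=True)
class Temperature:
    value: float
    quantity: str  # "thermodynamic" (T) or "celsius" (t); hypothetical labels


def to_celsius_temperature(temp: Temperature) -> Temperature:
    """Quantity conversion t = T - T_0 (not a unit conversion)."""
    if temp.quantity == "celsius":
        return temp
    if temp.quantity == "thermodynamic":
        return Temperature(temp.value - T_0, "celsius")
    raise ValueError(f"unknown quantity: {temp.quantity!r}")


def to_thermodynamic_temperature(temp: Temperature) -> Temperature:
    """Quantity conversion T = t + T_0."""
    if temp.quantity == "thermodynamic":
        return temp
    if temp.quantity == "celsius":
        return Temperature(temp.value + T_0, "thermodynamic")
    raise ValueError(f"unknown quantity: {temp.quantity!r}")


y = Temperature(300.0, "thermodynamic")
x = to_celsius_temperature(y)
print(x)
```

In this sketch the number 300 says nothing about the scale by itself; the `quantity` field is what tells a reader whether an offset is involved.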

martinjuckes commented 2 years ago

@sebvi : your logic defies me. In SI the offset is not part of the unit, in your definition it is. SI do care about the offset and they quote the uncertainty (see above). See page 133 of the brochure for more details.

sebvi commented 2 years ago

@martinjuckes : You may have missed that I make a distinction between a reference (defining the origin of the scale) and an offset (sometimes necessary to do the conversion between 2 units) corresponding to the difference between the reference of each unit.

Now if you want to call T_0 "a constant" instead of "an offset" and Kelvin -> Celsius a "quantity conversion" instead of a "unit conversion" to avoid losing an argument, let's not waste each other's time: you won.

Let's now focus on finding a way forward.

JonathanGregory commented 2 years ago

Dear @larsbarring

@JonathanGregory, @taylor13, I, too, was considering to introduce delta-units. There have to be one such for each temperature unit (°C, K, °F, °Re, °R, ....), but this is not a problem as such. And this is the route taken by another unit conversion package, pint, that I have heard of. But I felt a bit reluctant towards this idea ...

I feel we should discuss this some more, as a pragmatic solution. As Karl @taylor13 also said, we do not generally regard a difference between two values as being a different quantity from either of the two values individually. There are some standard names, not just for temperature, which are defined as differences, but it's not usually necessary. I don't think it would be a good idea to require this distinction in standard names for temperature, because it isn't one which people normally make, and it would be inconsistent to make a distinction in standard names solely because of a units problem. That would be the tail wagging the dog.

The practical problem that we wish to avoid is the inappropriate conversion of units for a temperature difference. I suggested before that we could insert a flag into the standard name table to indicate the quantities for which the temperature offsets might be needed for conversion. I now believe that this is only the case for standard names with canonical units of pure K, so that serves as the distinction we need.

The only situation where UDUNITS does the wrong thing is when a quantity with canonical units of K is actually a difference. If K appears to any power other than unity or is multiplied by any other unit, apparently UDUNITS does not apply the offset when converting to degC. This is what we want it to do. It would be helpful if someone who knows more about UDUNITS could comment on whether that's right.

If so, the data-reader can just check the units. If the units are pure temperature, care is needed in the conversion, and UDUNITS might do it wrongly. That doesn't provide a means for the data-writer to indicate what should be done to convert the units. Defining a unit of delta_K, which the data-writer can use for a quantity which is intended to be a temperature difference, is a safe and simple solution, because UDUNITS does not recognise it, so it will fail safe. We could ask UDUNITS to add this unit, and perhaps also to define delta_degC as equivalent to delta_K, and not convertible to K, but we don't have to do that. UDUNITS isn't essential to CF; we use it to define the syntax of acceptable units strings, but we have some other exceptions to it.
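A rough Python sketch of the fail-safe behaviour described above follows. It is a toy converter written for this illustration, not UDUNITS, and the tables `ON_SCALE` and `DIFFERENCE` are assumptions. On-scale units convert with the offset, the hypothetical delta units convert among themselves without one, and any attempt to mix the two groups raises an error instead of returning a number.

```python
# Toy unit tables (illustrative assumptions, not UDUNITS definitions).
# On-scale temperatures: value_in_K = value + offset_to_K.
ON_SCALE = {"K": 0.0, "degC": 273.15}
# Temperature differences: scale factor relative to delta_K.
DIFFERENCE = {"delta_K": 1.0, "delta_degC": 1.0}


def convert(value: float, src: str, dst: str) -> float:
    """Convert within one group; refuse to cross between groups."""
    if src in ON_SCALE and dst in ON_SCALE:
        return value + ON_SCALE[src] - ON_SCALE[dst]
    if src in DIFFERENCE and dst in DIFFERENCE:
        return value * DIFFERENCE[src] / DIFFERENCE[dst]
    # Fail safe: no silent guess about whether the offset applies.
    raise ValueError(f"cannot convert {src!r} to {dst!r}")


print(convert(1.0, "degC", "K"))              # offset applied
print(convert(2.5, "delta_K", "delta_degC"))  # no offset
```

The design choice here is that an unknown or mismatched unit pair is an error, which is the "fail safe" property the proposal relies on.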

I appreciate that people might not like this solution because it puts a distinction of meaning about the quantity in the units string. However, we have to make this distinction only because of a problem about units. Since the units causes the problem, it seems fair that the units should be where we solve it.

Best wishes

Jonathan

martinjuckes commented 2 years ago

@JonathanGregory : I agree that what you propose will work and resolve the ambiguity that @larsbarring is trying to address. You are also correct in anticipating that I, and perhaps others, don't like putting distinction about the meaning of a quantity in the units string. For the avoidance of doubt, I will repeat again that the aspect of this that I don't like is the way in which it separates us from the approach adopted in SI and ISO. I don't agree that units is causing the problem here .. as noted many times above. However, I'll drop my objections if your proposal is preferred by others.

Personally, I feel that following SI and ISO 80000, by distinguishing between thermodynamic temperature and Celsius temperature, would give a cleaner and far preferable solution. This is also a departure from UDUNITS, but I don't believe there is a realistic option for avoiding departure from UDUNITS. Using SI units brings the advantage of interoperability with other communities.

In response to @sebvi I can only say no, I didn't miss that; no, it's not about avoiding losing the argument; and yes, it is about trying to find a way forward.

JonathanGregory commented 2 years ago

Dear @martinjuckes

Thanks for explaining your proposal again. I missed the essence in the large number of exchanges above. Yes, I agree that this could work too. The existing standard names for temperatures have units of K, so we would reinterpret/redefine all of these as thermodynamic temperature. CF would overrule udunits in making K and degC not interconvertible. For new data, it would be an error to write air_temperature in units of degC; we cannot do anything about any such existing data. We would need to introduce new standard names, upon request, for air_celsius_temperature, air_fahrenheit_temperature, sea_water_celsius_temperature, etc., with canonical unit of degC. I wonder how many of such standard names would actually be requested. In any case, it would not be too bad, even if all ~50 standard names in K were also needed in both degC and degF. An analyst might convert degC to K with UDUNITS, even though CF disapproves, and they should then change the standard name, so that the data remains correct according to CF.

Best wishes

Jonathan

sethmcg commented 2 years ago

As a data provider, I have to voice stringent opposition to the idea of creating different standard names for different temperature units. It would be a massive loss of backwards-compatibility for existing data, and nobody is ever going to go back and do the conversions to make everything conformant. It's FAR too much work. The end result would be that we'd have a handful of datasets that were strictly compliant, while the majority of datasets would continue to use air_temperature with an indiscriminate mix of K, degC, or degF for units, which would only exacerbate the situation.

The solution that seems least disruptive to me is something along the lines of what @larsbarring suggested early on, which is to make note (in a machine-readable way) of cases in the standard name table where unit conversion software may need to care about whether or not the data represents an absolute value or a difference. CF could then prescribe a standard way of indicating a difference that unit-conversion software could check for, based on whatever we can figure out is the best compromise between ease of implementation and applicability to existing data.

Another option would be to ensure that every X_temperature standard name has a corresponding X_temperature_difference standard name (and vice-versa), although as @larsbarring and @taylor13 have noted, there are some usability issues associated with that approach.

martinjuckes commented 2 years ago

Hello @JonathanGregory , @sebvi : I also object to having different standard names for different temperature units. I'm unsure where Jonathan got that idea from, though it evidently flows from something that I wrote. In SI and ISO there are different quantities for thermodynamic temperature and for Celsius temperature. The units °C and K are interchangeable, with 1°C = 1K by definition. The change that would be needed in CF would be to introduce new standard names for Celsius temperature, where needed, which is defined to refer to the temperature on the Celsius scale -- the term "Celsius temperature" is not a statement about the units; it defines a quantity which can be measured in any SI units of temperature.

Conversion from K to °C would still be valid, but the UDUNITS habit of inserting offsets in some cases and not in others, which is already divergent from CF, would make use of UDUNITS risky, as it is now.

Deprecating existing usages of CF standard names is a regular part of the process of revising, extending and clarifying CF -- it does have to be carefully justified, but there is no fundamental "backward compatibility" issue with such changes. I see considerable benefit in doing away with an anachronistic divergence from the SI standard.

JonathanGregory commented 2 years ago

Dear @martinjuckes

You write

The change that would be needed in CF would be to introduce new standard names for Celsius temperature, where needed, which is defined to refer to the temperature on the Celsius scale

and I proposed

We would need to introduce new standard names, upon request, for air_celsius_temperature ... etc.

I thought that indeed I was proposing something which followed your earlier remark. Apparently I have misunderstood you. Sorry about that, but actually I can't see what's different between those statements. You also say

the term "Celsius temperature" is not a statement about the units; it defines a quantity which can be measured in any SI units of temperature.

I suppose the distinction is that thermodynamic temperature is wrt absolute zero, and Celsius temperature is wrt the triple point of water - is that right? I agree that this isn't strictly a statement about units, but I think it would be very confusing to write 0 K to mean the freezing point of water, explained as being a Celsius temperature which happens to be expressed in the units normally used for thermodynamic temperature.

Dear @sethmcg

I also don't really like my proposal of having different standard names for Kelvin and Celsius temperature, although I think it's not a problem for backward compatibility, and it could be checked by the CF checker.

I believe that no-one who has commented likes the idea of having pairs of temperature and temperature-difference standard names.

Lars @larsbarring and I have both suggested adding a note in the standard name table about which temperature quantities require special handling of their units. However, Lars thinks this would not be sufficient, and something has to appear in the CF metadata itself to avoid the problem. I'm inclined to agree.

I prefer the alternative of introducing a temperature difference unit such as delta_K which cannot be converted to K or degC. No change to standard names would be needed, but we would indicate in the table which names for temperature, all of which currently appear as K, could allow delta_K as an alternative, while some would change from K to delta_K, such as air_temperature_anomaly.

Do you, or does anyone else, know whether it's correct that UDUNITS applies the offset when converting between K and degC only when the unit is equivalent to pure K, and not when it's raised to a power or multiplied by another unit?

Best wishes

Jonathan

martinjuckes commented 2 years ago

@JonathanGregory : sorry for the lack of clarity. You propose "CF would overrule udunits in making K and degC not interconvertible. For new data, it would be an error to write air_temperature in units of degC", I propose making K and degC fully equivalent to each other, as is the case with ISO and SI. air_temperature = 300K and air_temperature = 300 degC would both have exactly the same meaning.

Concerning UDUNITS and powers of K, this is the behaviour of the library on my laptop:

mjuckes@martin-latitude7480:~$ udunits2 
You have: 1 degC
You want: K
    1 degC = 274.15 K
    x/K = (x/degC) + 273.15
You have: 1 degC2 
You want: K2
    1 degC2 = 1 K2
    x/K2 = (x/degC2)
You have: 1 kg.degC
You want: kg.K
    1 kg.degC = 1 kg.K
    x/kg.K = (x/kg.degC)

In each case, I have entered a quantity with degC in the units and asked the library to generate the equivalent in K. I don't know if this behaviour is documented anywhere.
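The three cases in the session above can be summarised by a very small rule. The Python sketch below is only a restatement of that observed behaviour for these three unit strings. It is not how UDUNITS is implemented, and the set `BARE_TEMPERATURE_UNITS` is an illustrative assumption covering a handful of spellings.

```python
# Assumption for illustration: a small set of bare temperature unit strings.
BARE_TEMPERATURE_UNITS = {"K", "degC", "degF"}


def offset_applies(units: str) -> bool:
    """True only when the units string is a bare temperature unit.

    Any power other than one ("degC2") or product with another unit
    ("kg.degC") makes this return False, matching the session above.
    """
    return units.strip() in BARE_TEMPERATURE_UNITS


def degc_to_k(value: float, units: str) -> float:
    """Convert a value whose units contain degC to the K equivalent."""
    return value + 273.15 if offset_applies(units) else value


for units in ("degC", "degC2", "kg.degC"):
    print(units, offset_applies(units), degc_to_k(1.0, units))
```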

larsbarring commented 2 years ago

Dear all,

There are a couple of things that I would like to draw attention to before we continue the conversation on standard names and the possible need to add new ones, and what a possible more stringent adoption of SI units may bring about.

The fundamental idea behind the CF Conventions is to provide a set of conventions and guidelines that data producers and data consumers can use to exchange data. The Conventions provide a rich machinery to meet these needs, but only a rather limited subset of it is required to make a file CF compliant. And standard names are not one of the required elements, something that has been mentioned many times in responses to users requesting new standard names that for some reason are judged problematic to implement. The current focus on standard names does not solve the root cause of the problem. Instead it leaves a fairly large part of the user community still in the dark regarding the problem we are now discussing, and in the long run this is only a disservice to CF.
@JonathanGregory wrote "we do not generally regard a difference between two values as being a different quantity from either of the two values individually". This is a generally good practical and pragmatic rule for CF. As @sebvi wrote "When the reference is 0 (99.99% of the time), the reference is simply dropped." The root cause of the problem is that the CF Conventions do not make it clear when this rule is not valid.
The problems with Table C.1 and E.1, which I have repeatedly pointed at, are a result of this failure to identify clearly when the general rule is not valid.

As @JonathanGregory notes above, we have both suggested to add a new field to the standard name table identifying those needing special treatment. Initially I thought this would be a logical (yes/no) field, but now I believe we have an example of fuzzy logic (yes/maybe/no), where the maybes are standard names where the treatment depends on whether certain modifiers or cell methods are present or not. This is one of the reasons why I keep coming back to Table C.1 and Table E.1.
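The yes/maybe/no idea could look something like the following Python sketch. Everything in it is hypothetical: the flag values, the table entries chosen, and the set of cell methods treated as difference-like are assumptions made for illustration, not anything agreed in CF.

```python
from enum import Enum


class OffsetFlag(Enum):
    YES = "yes"      # offset always applies on unit conversion
    MAYBE = "maybe"  # depends on cell methods / modifiers
    NO = "no"        # offset never applies (the quantity is a difference)


# Hypothetical extra field in the standard name table.
FLAG_TABLE = {
    "air_temperature": OffsetFlag.MAYBE,
    "air_temperature_anomaly": OffsetFlag.NO,
}

# Assumption: cell methods whose result behaves like a difference.
DIFFERENCE_LIKE_METHODS = {"standard_deviation", "range"}


def offset_needed(standard_name: str, cell_methods: str = "") -> bool:
    """Resolve the flag, using cell_methods to settle the 'maybe' case."""
    flag = FLAG_TABLE[standard_name]
    if flag is OffsetFlag.YES:
        return True
    if flag is OffsetFlag.NO:
        return False
    # Keep tokens such as "mean" and drop axis labels such as "time:".
    methods = {tok for tok in cell_methods.split() if not tok.endswith(":")}
    return not (methods & DIFFERENCE_LIKE_METHODS)


print(offset_needed("air_temperature", "time: mean"))
print(offset_needed("air_temperature", "time: standard_deviation"))
print(offset_needed("air_temperature_anomaly"))
```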

The question then arises what to do with this new field (in a wider sense than just publishing it in the document). As @sethmcg already suggested, this should be done in a machine-readable way (and not only as an xml file). Therefore I have the following concrete suggestions:

Update Table C.1 and Table E.1. This will pave the way for some text under Chapter 3.1 explaining unit conversions, including the 'mathematics' involved. This text should explain both what happens when taking the difference and raising to powers.
Introduce a new attribute that signals what 'special treatment' is needed in terms of unit conversion. This attribute should be explained in a way that is independent of any specific standard name, but would, I imagine, refer to Table C.1 and E.1 (and any other CF elements that we may identify as relevant). The aim of this is that data producers should be able to determine whether the attribute is necessary and, in that case, what it should contain. In this way it should be possible to have a general solution to the problem we are discussing, rather than limiting it to the standard names only.
The field in the standard name table will then be the suggested (or required) value for this attribute, at least in the yes/no case.

martinjuckes commented 2 years ago

@larsbarring : what is the basis of your statement that "The current focus on standard names does not solve the root cause of the problem", which appears to be your justification for ignoring alternative suggestions?

larsbarring commented 2 years ago

That we leave all those not having a standard name for their data behind and basically in the same state of confusion as is the case now.

martinjuckes commented 2 years ago

Why would anyone not have a standard name for their data? I don't see how this relates to the previous discussion. The SI Celsius temperature (which is a quantity defined to be a temperature relative to 273.15 K, not a temperature expressed in degC) is, as far as I can tell, the quantity that you refer to as relative temperature, and the SI thermodynamic temperature is what you referred to as absolute temperature. These SI concepts appear perfectly matched to resolving the issue that you have raised.

larsbarring commented 2 years ago

@martinjuckes writes:

Why would anyone not have a standard name for their data?

-- maybe there is no standard name yet, -- maybe a standard name was proposed but rejected for some reason. This has happened for reasons other than "can you instead use 'this_standard_name' that is already available", e.g. that it is too specific or not well enough defined, or ...

In the CF2021 breakout discussion session on standard names, the question came up how to make a file CF compliant for a somewhat specific data variable for which there was no relevant standard name already in the table. @japamment answered that a file can be CF compliant even without a standard name. I have seen similar answers a few other times, so this is not unusual. Maybe Alison could comment on this.

davidhassell commented 2 years ago

Hello, I'm following this as closely as I can ... I think that it is certainly reasonable to not have a standard name. Not only is this allowed, as Lars points out, and therefore communities are not obliged to request them; but for some esoteric quantities it may be inappropriate to even request one, such as (perhaps) the temperature increment produced by a particular physical parameterisation of a particular climate model. Thanks, David

martinjuckes commented 2 years ago

@larsbarring Yes, as you say, there are quantities without standard names .. but what has that got to do with this discussion of clarifying the temperature unit?

The temperature unit can and should be clarified by using the existing, well defined, universally recognised, rigorously specified SI and ISO definitions of the temperature unit. This does address your issues around tables C.1 and E.1 for temperature variables, and would provide a clear way of resolving the defect in the current convention regarding the divergence between UDUNITS treatment of temperature.

JonathanGregory commented 2 years ago

Dear @larsbarring et al.

I think you're right that we shouldn't make this solution depend on standard names, because they're optional, and because we don't want to draw a distinction between difference and not-difference in standard names for temperature that we don't make for other quantities. If we introduce a new attribute for this purpose, it would also be CF-specific, and it would be hard to get it into widespread use, I suspect.

Therefore I think it is better to put the information we need in the units. That is mandatory in CF and not CF-specific. If we don't want to create a new delta_K unit, we could instead put a separate word in there e.g. units="delta K", but I like that less because as far as we know this is only a problem for temperature. In either case, the point is that the modified units will fail in the unmodified UDUNITS, so we will avoid the pitfall which concerns us. I am not ignoring the arguments that the distinction between Celsius and Kelvin scales is not one of units, but nonetheless it is one of units! That is why we're concerned about it. I feel that allowing air_temperature = 300 K and air_temperature = 300 degC to mean the same would be terribly confusing, because that is not how the units of K and degC are used in practice. We cannot legislate to create an ideal world. We will probably succeed better by working with things as they are, so long as we can do it in a logical way that can be clearly explained and won't be misleading.

I agree with you that we would also need text in Sect 3 about this issue, and an indication in the standard name table of which quantities may need this treatment. Although I don't think cell_methods provides the solution, I also agree that we need to discuss it in connection with cell_methods, regarding standard_deviation for temperature, which is in delta_K, not K. For variance, K2 is OK, if CF adopts the rule that UDUNITS appears to follow, which seems sensible, that the offset is applied only to K.

Best wishes

Jonathan
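
As an illustration of the standard_deviation point in the comment above, here is a minimal sketch using only the Python standard library. The sample values and variable names are made up for the example, and it does not call UDUNITS. It shows that the standard deviation and the variance of a sample are numerically the same on the Kelvin and Celsius scales, so applying the 273.15 offset to a standard deviation gives a meaningless result.

```python
import statistics

T0_K = 273.15  # offset between the kelvin and Celsius scales

# Hypothetical sample of air temperatures, in K.
temps_k = [271.15, 273.15, 275.15, 277.15]
# The same sample as Celsius temperatures: the offset applies to each value.
temps_c = [t - T0_K for t in temps_k]

# Population standard deviation and variance computed on each scale.
std_k = statistics.pstdev(temps_k)
std_c = statistics.pstdev(temps_c)
var_k = statistics.pvariance(temps_k)
var_c = statistics.pvariance(temps_c)

# What a unit converter would return if it treated the standard deviation
# as an absolute temperature and applied the offset.
naive_std_c = std_k - T0_K

print(f"std: {std_k:.3f} K, {std_c:.3f} degC")
print(f"var: {var_k:.3f} K2, {var_c:.3f} degC2")
print(f"naive conversion of std: {naive_std_c:.3f} degC")
```

The naive conversion yields a large negative "standard deviation", which is the pitfall a distinct difference unit would be meant to prevent.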

martinjuckes commented 2 years ago

@JonathanGregory : thanks for answering my suggestion. I see that your objection to following SI is that you find it confusing (or perhaps expect it to confuse others). Given the broad acceptance of SI in the scientific world, I am sceptical about this. It is perhaps worth noting that there is a difference between what the standard allows and what we might recommend. Our convention allows, for instance, wind speed to be provided with units of parsec jiffy-1, with 1 parsec jiffy-1 = 3*10^18 m s-1 ... but allowing it does not mean that it is recommended.

Adopting standards does, inevitably, require some adjustment. I'm always surprised by the level of resistance in this community towards adopting standards, given the enthusiasm for imposing them. I'm afraid I will find it much harder to promote the benefits of adopting CF as a standard if CF cannot itself make the effort to adopt a standard which is so widely adopted.

When you talk about "how the units of K and degC are used in practice" you are, of course, talking only about people who agree with you, i.e. those who do not use the SI approach to units. Many people, as I'm sure you realise, do use the SI approach. Note that it is a standard, not legislation, and it already exists .. it is not something we would have to create.

You are clearly going to have to explain what your degC is. I.e. that it is an SI °C with an additional constraint on how it can be used.

larsbarring commented 2 years ago

Dear Martin,

I accept your point that °C = K, but I have difficulties seeing how this necessarily leads to your statement "units °C and K are interchangeable, with 1°C = 1K by definition". I have not fully wrapped my head around the content of the SI Brochure, but on p.135 the following text piece appears:

As a result of the way temperature scales used to be defined, it remains common practice to express a thermodynamic temperature, symbol T, in terms of its difference from the reference temperature T_0 = 273.15 K, close to the ice point. This difference is called the Celsius temperature, symbol t, which is defined by the quantity equation t = T − T_0 . The unit of Celsius temperature is the degree Celsius, symbol °C, which is by definition equal in magnitude to the unit kelvin. A difference or interval of temperature may be expressed in kelvin or in degrees Celsius, the numerical value of the temperature difference being the same in either case. However, the numerical value of a Celsius temperature expressed in degrees Celsius is related to the numerical value of the thermodynamic temperature expressed in kelvin by the relation t/°C = T/K − 273.15

where in particular my attention is drawn to the last part, which to me clearly states i) 1 Δ°C = 1 K (and by extension 1 Δ°C = 1 ΔK) ii) if we have some object at 274.15 K thermodynamic temperature and measure this temperature using the Celsius scale the numeric values will be (274.15 K) / K - 273.15 = (1.0 °C) / °C. I.e. we read the value 1.0. And, likewise, if we do the same with an object at 1 K thermodynamic temperature we get (1 K) / K − 273.15 = (-272.15 °C) / °C. I.e. we read the value -272.15. To me this does not support the statement that "1°C = 1K by definition".
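
The two readings i) and ii) can be checked numerically. The following is a minimal sketch only: the function name `kelvin_to_celsius` and the `is_difference` flag are hypothetical and not part of CF, UDUNITS or the SI Brochure.

```python
T0_K = 273.15  # reference temperature T_0 from the SI Brochure text quoted above


def kelvin_to_celsius(value_k: float, is_difference: bool = False) -> float:
    """Convert a numerical value from K to degC.

    A temperature difference keeps its numerical value (reading i).
    A thermodynamic temperature is shifted by T_0 (reading ii).
    """
    if is_difference:
        return value_k
    return value_k - T0_K


# Reading ii): thermodynamic temperatures read on the Celsius scale.
print(kelvin_to_celsius(274.15))
print(kelvin_to_celsius(1.0))
# Reading i): a difference of 1 K is a difference of 1 degC.
print(kelvin_to_celsius(1.0, is_difference=True))
```

The same input value of 1 K gives two different results depending on the flag, which is exactly the information that the unit string alone does not carry.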

And in the Table on p. 138 the entry stating "°C = K" does carry the footnote that I was citing. This footnote seems to be more in line with i) and ii) than your statement "1°C = 1K by definition".

I might very well be missing something, but I have found no specific text supporting your statement in the SI Brochure. Maybe there is a subtle difference that escapes me, in which case I wonder to what extent this distinction is central to CF, whether the wide variety of data producers across the globe are fully aware of it or bothered by it, and whether it is important to data users.

sebvi commented 2 years ago

@larsbarring You are not missing anything. SI defines °C with respect to K exactly as you understand it, i.e. T(°C) = 1 °C/K (T(K) - 273.15 K) and ΔT(°C) = 1 °C/K ΔT(K), but it is poorly formulated in the text and can be quite misleading.

Regarding variance: although the units of the variance are the square of the units of the base quantity, it will not lead to conversion errors in the case of (°C)² -> K² because the computation of the variance involves differences with respect to a mean, i.e. temperatures relative to the mean. If it were a square (or a product) of absolute temperatures, then it would be problematic, involving several terms.
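
The variance argument can be written out explicitly. This is a sketch under the assumption that $T_i$ are thermodynamic temperatures in K, $t_i = T_i - T_0$ are the corresponding Celsius temperatures with $T_0 = 273.15$ K, and bars denote the sample mean; the symbols are chosen here for illustration.

```latex
\begin{align}
\bar{t} &= \bar{T} - T_0 \\
\operatorname{Var}(t) &= \frac{1}{N}\sum_{i=1}^{N} \left(t_i - \bar{t}\right)^2
  = \frac{1}{N}\sum_{i=1}^{N} \left(T_i - \bar{T}\right)^2
  = \operatorname{Var}(T) \\
T_i^2 &= \left(t_i + T_0\right)^2 = t_i^2 + 2\,T_0\,t_i + T_0^2
\end{align}
```

The offset cancels in the variance, so its numerical value is the same in (°C)² and K². The square of an absolute temperature, by contrast, picks up the cross term $2\,T_0\,t_i$ and the constant $T_0^2$, which are the "several terms" mentioned above.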