cf-convention / vocabularies

Issues and source files for CF controlled vocabularies

Standard name for measurement, or instrument temperature #75

Closed DocOtak closed 4 years ago

DocOtak commented 4 years ago

In the big "adopt CF" project at CCHDO we have run into parameters which describe the temperature of the analysis. It is standard practice on the GO-SHIP cruises I've been on to report the quantity as measured, without temperature correction, and to report the temperature it was measured at alongside. The parameters in our data holdings (that I can think of) which have this issue are:

- pH
- fCO2
- pCO2

For the two gases, this is the temperature of the equilibrator. For pH it is the temperature the sample is held at while being analyzed. Often the pH temperature will be the same for an entire cruise due to the use of water baths, but sometimes not. The equilibrator temperatures tend to be within a few degrees of the sample being analyzed. These data are reported this way to facilitate the calculation of carbon parameters using the most up-to-date conversions.

I managed to find one precedent in the name table: temperature_of_sensor_for_oxygen_in_sea_water

I'm going to propose a broad standard name for a discussion starting point:

- Term: temperature_of_analysis
- Description: The temperature of analysis is the temperature that would be used for correcting, calculating, or calibrating another measurement for the effects of temperature. The linkage between the data variable and the variable with a standard_name of temperature_of_analysis is achieved using the ancillary_variables attribute.
- Units: K
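
A minimal CDL sketch of the linkage proposed above. The variable names `ph` and `ph_temperature` and the dimension `obs` are hypothetical, and `temperature_of_analysis` is only the name under discussion at this point in the thread, not an accepted standard name:

```cdl
netcdf analysis_temperature_sketch {
dimensions:
    obs = 3 ;
variables:
    // measured quantity, reported without temperature correction
    float ph(obs) ;
        ph:standard_name = "sea_water_ph_reported_on_total_scale" ;
        ph:units = "1" ;
        // names the variable that holds the temperature of analysis
        ph:ancillary_variables = "ph_temperature" ;
    // temperature the sample was held at while being analysed
    float ph_temperature(obs) ;
        ph_temperature:standard_name = "temperature_of_analysis" ;
        ph_temperature:units = "K" ;
}
```

The ancillary_variables attribute sits on the data variable and lists the variables that describe it, so a reader starts from `ph` and follows the link to its temperature.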

JonathanGregory commented 4 years ago

Dear @DocOtak

Following the example of the oxygen sensor, I think it would be OK to add standard names for these quantities, provided they are quantities which are used by more than one data archive. If that's so, they would evidently be useful for identifying quantities which should be regarded as comparable, which is the main purpose of standard names. For that purpose, and following standard names in general, I think they should be more specific - that is, three different ones, each saying what it's for.

Best wishes

Jonathan

ngalbraith commented 4 years ago

When we supply temperature_of_sensor_for_oxygen_in_sea_water, this value has already been used to correct the O2 data. If I understand correctly, this is not the case for the temperature values in GO-SHIP files. The proposed definition says 'the temperature that would be used' which could potentially be misunderstood - maybe it could be more explicit?

I'd be quite happy with temperature_of_analysis as a standard name, and allowing it to be used for any variable for which it could be helpful. To my mind, these values would never be used independently of the measured variable they help describe; they could be provided as attributes, but having them as variables allows you to provide multiple values (over time, for example). Although there are only three examples in the original proposal, I suspect there may be a lot more. Having a single standard name seems much more efficient to me.

roy-lowry commented 4 years ago

I totally agree with Nan that temperature_of_analysis as a single Standard Name and linkage to the associated variable is preferable. It is quite possible for somebody to measure a couple of dozen analytes in a water sample and wish to associate them with a single temperature. Having separate standard names would require a separate temperature variable for each analyte.

DocOtak commented 4 years ago

I had a bit of a browse through our data files here and it seems that we have both situations of one name (column label) being used for a measurement prior to temperature correction, but also when reporting the temperature which some measurements have been corrected to. For pH, I'm able to find data where the reported temperature is all 25°C, implying that it has all been corrected to this temperature. But I'm also able to find a cruise where the reported temperature varies between 9° and 26° with the expected jumps in pH units given the change. pH in particular doesn't make a lot of sense without a temperature, either given by convention or explicitly provided. I've found strong language in the literature along the lines of "pH without a temperature is incoherent".

Basically, the temperature I'm attempting to give a standard name to, is the temperature you should use when comparing one set of measurements to another, regardless of the source of that temperature. I'm unsure what to call this. "comparison_temperature" maybe? Quite open to suggestions.

roy-lowry commented 4 years ago

@DocOtak I understand exactly what temperature_of_analysis means but comparison_temperature leaves me scratching my head. I guess this is because the former describes what the measurement is rather than the purpose for which it is used. temperature_of_analysis also covers the use case where lab temperatures are recorded to be used in the computation of concentration per unit mass from measurements made per unit volume.

ngalbraith commented 4 years ago

> I had a bit of a browse through our data files here and it seems that we have both situations of one name (column label) being used for a measurement prior to temperature correction, but also when reporting the temperature which some measurements have been corrected to.

This is an important point, thanks for bringing this forward. If we use temperature_of_analysis for both temperatures that have been used to correct the data and cases where it is meant to be applied by the user of the data, there has to be a clear method for the user to know the difference, to avoid correcting twice. I suppose we could recommend an attribute to be supplied to the target variable, or we could suggest that that variable not be given its normal standard name if it's uncorrected.

We have a similar situation with winds and currents, where we need to distinguish between values that have had magnetic corrections applied vs raw data (the difference is that there's no standard name for magnetic correction). We don't always specify that the uncorrected wind vectors are raw ... we do supply an attribute 'magnetic_correction_to_be_applied' or 'magnetic_correction_applied' but I realize that our solution is not at all standard, and is probably unseen or ignored by most users.

It still makes sense, to me, to use this standard name whether or not the temperature correction has been applied, because the meaning of the temperature value doesn't change when the target variable changes. So the distinction has to be supplied as an attribute (or naming convention) on the target variable.

roy-lowry commented 4 years ago

I totally agree with Nan's last paragraph - the temperature should be named for what it is and not for its intended application or the consequences thereof.

The issue of how to describe the detailed processing history of a measurement is one that has bugged me for decades. Its sorry history has included semantics in units of measure. There were also my efforts using P01 parameter descriptions to indicate when specific corrections had been applied. These sort of worked until data processing protocols changed. I am convinced the way forward is through standardised provenance metadata as covered by one of the breakout groups in the recent CF workshop and not through simple labels like standard names.

DocOtak commented 4 years ago

Hey @roy-lowry it sounds to me like my original request just needs to have the definition improved?

The meaning I am attempting to capture is along the lines of "these are the temperatures I am reporting this other variable at". I don't actually care (in the standard name sense) about the processing steps taken to arrive at that temperature, but I do care about the temperature being kept in sync with the changes to the data variable. That is, if I adjust my pH data variable from what they were measured at to some standard temperature, I should update the temperature data variable to be what I adjusted it to. If you and Nan think that temperature_of_analysis can capture this relationship then let's go with it and I can attempt to flesh out the definition.

I'd also be happy with names along the lines of temperature_of_sea_water_ph for this particular use case. But would still need to deal with the fCO2 and pCO2 temperatures eventually.

roy-lowry commented 4 years ago

@DocOtak I think your first attempt at the name was spot on target. How about this for the description?

The temperature of analysis is the temperature at which the sample was analysed and can be significantly different from the temperature of the sample when collected or the standard temperature at which the analyte measurement is reported. It has been used for correcting, calculating, or calibrating the analyte measurement values to which it has been linked using the ancillary_variables attribute.

ngalbraith commented 4 years ago

> That is, if I adjust my pH data variable from what they were measured at to some standard temperature, I should update the temperature data variable to be what I adjusted it to.

This is a problem; the temperature correction, once applied, should be documented separately, IMHO, unless you're going to carry along the uncorrected measurement value and its temperature. Otherwise, how will users know that this isn't the temperature at which the measurement was taken?

roy-lowry commented 4 years ago

I agree with Nan. temperature_of_analysis is not the same thing as the temperature to which a measurement has been corrected and the two do not belong in a single data stream under a single standard name. I have come across people doing analyses in the ship's constant-temperature laboratory set to 25 C so a value of 25 could well be an actual temperature of analysis.

Like Nan my view is that the reporting of the temperature to which a measurement has been corrected belongs in separate documentation. In the long term this could be a standardised provenance document/attribute. In the short term I'd make use of the long name attribute for the measurement. For example, I would set the long name for pH to something like 'pH corrected to 25C'.
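
As a rough sketch of that short-term approach, the reporting temperature would sit in the long_name of the measurement rather than in a temperature_of_analysis variable. The variable name `ph` and the dimension `obs` are hypothetical:

```cdl
netcdf long_name_sketch {
dimensions:
    obs = 3 ;
variables:
    float ph(obs) ;
        ph:standard_name = "sea_water_ph_reported_on_total_scale" ;
        // free-text note on the temperature the values were corrected to
        ph:long_name = "pH corrected to 25C" ;
        ph:units = "1" ;
}
```

Because long_name is free text, this documents the correction for a human reader but is not something software can rely on, which is why it is offered only as a stopgap.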

DocOtak commented 4 years ago

Hey @roy-lowry and @ngalbraith I "just" had a rather long conversation with Chris Sabine about this (he conveniently popped by the office I was having lunch in). He is in agreement with you that this name is fine to describe what we call "ph temperature" in our (CCHDO's) data holdings.

In the description, we probably want to avoid making any sort of statements about the level of processing, there is the ACDD processing_level attribute which could probably capture that detail, here is another crack at a description for temperature_of_analysis:

The temperature_of_analysis is the relevant temperature for the effects of temperature the on the measurement of another variable. This temperature might be measured, calculated, or assumed. (optional examples here) The linkage between the data variable and the variable with a standard_name of temperature_of_analysis is achieved using the ancillary_variables attribute on the data variable.


Optional Example text:

For example, the temperature of the sample when measuring pH, or the temperature of equilibration in the case of dissolved gasses.


I put the "assumed" text in there because some of the pH analysis is performed by setting a water bath/jacket to some temperature and (reasonably) assuming that the sample is at the same temperature.

I managed to find on page 41 of the old woce manual that this PHTEMP is indeed supposed to be the "Measurement temperature"

roy-lowry commented 4 years ago

Not absolutely comfortable with that description, but a couple of minor changes (bold) would help.

The temperature_of_analysis is the **reference** temperature for the effects of temperature the on the measurement of another variable. This temperature **should be measured, but may have been calculated, or assumed**. For example, the temperature of the sample when measuring pH, or the temperature of equilibration in the case of dissolved gases. The linkage between the data variable and the variable with a standard_name of temperature_of_analysis is achieved using the ancillary_variables attribute on the data variable.

JonathanGregory commented 4 years ago

A week ago I said I wasn't convinced about temperature_of_analysis because it's so generic. Since then the discussion has gone on usefully, and I understand the need for it. I still think that the name itself is not sufficiently specific or informative for a standard name. If I understand you correctly, you are concerned about the analysis of composition of sampled sea water. Do you think you could add some words to the proposed standard name to convey that meaning?

Jonathan

roy-lowry commented 4 years ago

I guess the obvious would be temperature_of_analysis_of_sea_water_sample

@JonathanGregory Does that address your concerns?

Anybody have other suggestions?

DocOtak commented 4 years ago

@roy-lowry I approve of your changes to that description

@JonathanGregory My background is in oceanography so that is where I draw my examples from. This proposed name "feels" useful for measurements that aren't related to sea water. I think there is a somewhat growing category of names that seem to be unable to exist on their own and these are tending to be more generic, the quality flag standard names are what come to mind. Even the name temperature_of_sensor_for_oxygen_in_sea_water isn't really independent of the oxygen values it accompanies.


General discussion:

In another thread Roy had suggested we might consider making a standard name modifier list independent from the CF document. To me, this proposed temperature_of_analysis name seems to fit right into that. It also seems to pass the test of being "ancillary information about the data variable" in @JonathanGregory's opinion that standard name modifiers are a bad part of the CF Standard.

E.g. the following could be in the standard name attribute: "sea_water_ph_reported_on_total_scale temperature_of_analysis" (units modified to K) or even something like: "sea_water_ph_reported_on_total_scale pressure_of_analysis" (units modified to Pa)

I've found that when showing netCDF files to folks, they will often do something equivalent to "ncdump" and just look at that output. While CF has a mechanism for tying associated variables together, these mechanisms don't seem to be immediately obvious to those unfamiliar with them. One of the largest concerns Chris had when I was talking to him yesterday was ensuring the link between "temperature_of_analysis" and the pH measurement. I suspect that having this sort of "standard name modifier" + "ancillary variable link" would go quite a ways in helping the accessibility and approachability of these data and that we should be tolerant of this verbosity for the sake of being more "self describing" for people who will likely never read the CF documentation.

Though... when arguing with myself about this, I can see use cases for both an independent temperature_of_analysis tied to multiple analytes, and a modifier version to "obviously" tie this temperature to some specific analyte.

ngalbraith commented 4 years ago

> I guess the obvious would be temperature_of_analysis_of_sea_water_sample

My concern with this is that there may be cases where this ancillary variable is not used to describe a physical sample, but where a temperature measurement is concurrent with another measurement. If 'sea_water_sample' doesn't imply an actual physical sample (i.e. a bottle) then this is fine with me, and I agree it's more clear. If it makes the new standard name inappropriate for in situ ph measurements, though, then that could be a problem.

roy-lowry commented 4 years ago

@ngalbraith I agree that from an oceanographic perspective 'temperature_of_analysis' is more likely to deliver what we need with minimal possibility of causing issues like your in-situ pH use case. I was trying to find a solution to Jonathan's comments.

@JonathanGregory Can you propose a form of words that would deliver what you're looking for that we can test/amend against oceanographic use cases?

Or does anybody else have any ideas?

JonathanGregory commented 4 years ago

Dear all

I sympathise with @DocOtak's comments, that show we haven't worked out a clear policy for instrument-related standard names yet. We don't have enough experience of them, so we're feeling our way. I think that we shouldn't propose any new sort of metadata concept at this point, following a general CF principle of not trying to invent things for which there isn't a clear use-case. We do not need a general principle to deal with only a handful of specific cases, since a small number of entries isn't a problem for the standard_name table. If we receive requests for hundreds of names on the same pattern, we might reconsider. Also, it's OK to define both more and less specific standard names, for different applications.

I wonder whether @ngalbraith's comment means there is a need for two different standard names, for different purposes. temperature_of_analysis_of_sea_water_sample sounds fine to me. I'm not an expert. I would understand this to mean the temperature you had used or assumed the water to have when analysing its chemical composition, on the ship or somewhere else. What is the other temperature Nan mentions? Is it the temperature of the water when still in the sea, at the time you're making some other measurement?

Cheers

Jonathan

graybeal commented 4 years ago

Hi friend, I know this is revisiting an old discussion, but I think it is an excellent example of my original motivation.

I too sympathise with @DocOtak's comments, with some vigor. I think comments like

> While CF has a mechanism for tying associated variables together, these mechanisms don't seem to be immediately obvious to those unfamiliar with them.

are to the point, but my support also goes to the ease of composability of CF terms.

Way back, when I was trying to use CF to describe a wide variety of ocean measurements, I was essentially defeated by the need to individually define and request every term across every variation. The answer then was also "We do not need a general principle to deal with only a handful of specific cases, since a small number of entries isn't a problem for the standard_name table."

So I didn't go to the trouble of defining a few hundred more use cases, because it would have taken me several days or weeks to do so, and months or years more to demonstrate the proposition that this was not a one-off situation. Instead, I limited my involvement to a small set of vocabulary and standardization decisions. And when asked, I told other people working with me on cyberinfrastructures that CF was not practical to adopt at scale, because it couldn't provide the necessary naming agility. (It's possible CF didn't hear about their use cases.)

So I still love the precision, intellectual rigor, and value of CF, especially as it is so thoroughly pursuing modern engineering and presentation technologies. So please consider me a thankful enthusiast on all those fronts.

But I still argue that CF turns away from the use cases that demonstrate the broader need. Even as it adopts a more general approach for one use case (taxonomies), and completely standardizes the definition of many "phrases of art" in its vocabularies (indicating how truly generic they are across many different measurements), CF does not allow 'on principle' that many concepts apply across a wide swath of observations, and that these should be made applicable across that wide swath. There are other concept management models that afford such generality, so for me this is more of a "last pitch" than a personal use case.

So please put me in the support column for having a modifier like temperature_of_analysis for CF variable names. (Would you still need some way to indicate exactly which analyses this temperature applies to, since the precursor standard name could be applied to multiple different analyses at different temperatures in the data set?)

And at the risk of sounding contradictory, I think it is also OK (and not contradictory) to have a standalone name temperature_of_analysis, to allow for the possibility of the temperature applying to a wide variety of analyses.

Since the decision may be taken on other grounds, I don't think we need to rehash the above arguments in order to proceed on this specific case. (If someone wants to pursue them, they should probably start another ticket.) I just wanted to bring out another view about whether a general principle was useful in this case.

ngalbraith commented 4 years ago

> I wonder whether @ngalbraith's comment means there is a need for two different standard names, for different purposes. temperature_of_analysis_of_sea_water_sample sounds fine to me. I'm not an expert. I would understand this to mean the temperature you had used or assumed the water to have when analysing its chemical composition, on the ship or somewhere else. What is the other temperature Nan mentions? Is it the temperature of the water when still in the sea, at the time you're making some other measurement?

Yes, it's the sea water temperature, and for various reasons it might not be useful as a stand-alone data variable, and therefore should not be given the existing standard name sea_water_temperature.

Is there a reason to differentiate between analysis temperatures that involve sea water vs air? Temperature_of_analysis would cover both cases, and, presumably, the analysis medium would be clear from the 'target' variable, if it needed to be known.

Temperature_of_analysis_of_sea_water_sample is also fine, as long as 'sea_water_sample' is not taken to imply that there was a sample (bottle) with this temperature. So, if you think temperature_of_analysis_of_sea_water_sample might be correctly used to identify what's now labeled as temperature_of_sensor_for_oxygen_in_sea_water, then it's good. Otherwise, we might need to distinguish between temperature_of_analysis_of_sea_water_sample and perhaps, temperature_of_analysis or temperature_of_analysis_of_sea_water - or similar.

JonathanGregory commented 4 years ago

Dear Nan

> Yes, it's the sea water temperature, and for various reasons it might not be useful as a stand-alone data variable, and therefore should not be given the existing standard name sea_water_temperature.

Please could you explain further or suggest some phrase which correctly describes it? When is a sea_water_temperature not a sea_water_temperature? :-)

> Temperature_of_analysis_of_sea_water_sample is also fine, as long as 'sea_water_sample' is not taken to imply that there was a sample (bottle) with this temperature.

I suppose it might imply a bottle to a non-expert such as me. Then perhaps temperature_of_analysis_of_sea_water (with no bottle) would be better. Would temperature_of_chemical_analysis_of_sea_water be correct? - that would be more informative. I would not have understood temperature_of_sensor_for_oxygen_in_sea_water correctly without reading its definition, which says it is "the instrument temperature used in calculating the concentration of oxygen in sea water; it is not a measurement of the ambient water temperature". That seems similar to this analysis temperature, as you say.

> Is there a reason to differentiate between analysis temperatures that involve sea water vs air? Temperature_of_analysis would cover both cases, and, presumably, the analysis medium would be clear from the 'target' variable, if it needed to be known.

As I said before, if these are standard names, they ought to be self-explanatory, not requiring reference to something else for essential clarification. Hence the medium should be included, as it is in sea_water_temperature and air_temperature.

Best wishes

Jonathan

DocOtak commented 4 years ago

temperature_of_analysis_of_sea_water is ok I guess... including the term "sample" has discrete implications and often we do measurements of a continuous flow of sea water. Including the term "chemical" seems to exclude my need to describe dissolved gas analysis done using equilibration.

sea_water_temperature, as per the description for that name, is the in situ temperature and ceases to be the correct name for the temperature of some bit of sea water as soon as you move it, e.g. to do some sort of analysis. There are a few names in the standard name table to describe the temperature of this moved water that are of varying correctness and assume you can move the sea water adiabatically.

For pH, the name sea_water_ph_reported_on_total_scale is not "self-explanatory" enough to be considered comparable by the ocean carbon community. If I don't know the temperature you are reporting that pH at, I cannot compare these values. This requirement is strong enough that pH values that lack temperature information will be omitted from oceanographic datasets, though they will usually be made available separately for openness. There are two additional pH reporting scales ("sea water scale", and NBS) that are used in oceanography, and all of these will need a temperature associated with them. I'm including this information because I have a need "right now" in my real oceanographic dataset to represent temperatures of 5 properties of sea water that are not the in situ sea_water_temperature and are considered to be critical for scientific comparison of these values. Basically, I need this temperature to share measurements with others so they can do science, not because it is needed to correct an instrument.

roy-lowry commented 4 years ago

I agree with @DocOtak that adding 'chemical' is unhelpful as to me it implies a restriction of analytical technique.

I think this discussion needs to be drawn to a conclusion or Andrew will never get his data into CF! There seems to be a compromise consensus towards temperature_of_analysis_of_sea_water. Can we go with that?

JonathanGregory commented 4 years ago

Yes, I think temperature_of_analysis_of_sea_water is fine.

Also, Andrew @DocOtak's useful comment, explaining the relevance of this parameter, suggests another possibility for associating it with the data variable. The way you describe it, I understand that it's a numerical parameter on which the result of the analysis (the pH, etc.) depends. Is that right? If so, I think it would be possible to treat it in CF as a coordinate variable for the data variable. It is analogous to other cases in which a single numerical value is needed to define a quantity which has a standard name. For instance, number_of_days_with_air_temperature_above_threshold needs a coordinate variable of air_temperature, and sea_water_potential_density is allowed to have a coordinate variable of reference_pressure. I suppose that the analysis temperature could be a different value for each element of the data variable - is that right? If so, unlike those other cases, it would have to be a multidimensional auxiliary coordinate variable, rather than a single-valued coordinate variable. This might be stretching the data model a bit, since auxiliary coordinate variables aren't intended for independent variables on which the data depends, but I think it would be a tolerable distortion.

What I mean is

  float pH (lat,lon);
    pH:standard_name="sea_water_ph_reported_on_total_scale";
    pH:coordinates="anat";
  float anat(lat,lon);
    anat:standard_name="temperature_of_analysis_of_sea_water";
    anat:units="degC";

Does that make sense? With this scheme, different quantities could share the same analysis temperature field, or each have their own.

Jonathan

ngalbraith commented 4 years ago

If the temperature is presented as a coordinate, that could very easily lead to confusion, when a data variable has been adjusted to a standard temperature that is NOT the temperature at which the value was measured.

Are we clear about whether this standard name is to be used regardless of whether a temperature correction to the target value has been made? It seems like we're dropping some details that might lead to errors in implementation, but this is a little out of my area, so if not, that's fine.

roy-lowry commented 4 years ago

@ngalbraith Hopefully yes. Whether or not a measurement has been corrected shouldn't be an issue for the temperature at which it was analysed. Hopefully, the information on corrections applied will be (as you suggested) in associated provenance metadata.

If no better provenance metadata solution for CF is developed in time then the long name attribute is available to Andrew to store phrases like 'pH corrected to in-situ temperature' or 'pCO2 expressed at 25C'. Should that be unacceptable then we may need to start another GitHub ticket to discuss how reporting temperature should be delivered. The important thing is to keep it separate from the subject of this ticket, i.e. temperature of analysis.

DocOtak commented 4 years ago

@JonathanGregory I somewhat like that idea, but I'm not sure of the usefulness/complexity trade-offs. The folks I work with are quite used to having quality flags to go with their data. In our existing formats, this is done via CSV column naming conventions. If I had my teaching hat on, trying to explain this to someone who is already extremely skeptical about the usefulness of CF/netCDF, I would prefer to have only one method to explain for how to tie all the extra bits of information together (e.g. use ancillary_variables to tie your uncertainties, quality flags, and analytical temperatures to the primary variable). This extension should probably be addressed in another discussion thread.
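
A sketch of that single mechanism in CDL, with one ancillary_variables attribute tying a quality flag, an uncertainty, and an analytical temperature to the primary variable. All variable names, the flag values and meanings, and the dimension `obs` are hypothetical; the standard name on the temperature variable is the one agreed above:

```cdl
netcdf ancillary_link_sketch {
dimensions:
    obs = 3 ;
variables:
    float ph(obs) ;
        ph:standard_name = "sea_water_ph_reported_on_total_scale" ;
        ph:units = "1" ;
        // one space-separated list covers every companion variable
        ph:ancillary_variables = "ph_qc ph_error ph_temperature" ;
    // quality flag; values and meanings here are placeholders
    byte ph_qc(obs) ;
        ph_qc:standard_name = "status_flag" ;
        ph_qc:flag_values = 2b, 3b ;
        ph_qc:flag_meanings = "good questionable" ;
    // uncertainty, described only by a free-text long_name
    float ph_error(obs) ;
        ph_error:long_name = "uncertainty of pH" ;
        ph_error:units = "1" ;
    // temperature at which the pH analysis was made
    float ph_temperature(obs) ;
        ph_temperature:standard_name = "temperature_of_analysis_of_sea_water" ;
        ph_temperature:units = "degC" ;
}
```

The appeal of this layout is that someone reading ncdump output only has to learn one attribute to find everything attached to `ph`.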

@ngalbraith In the database/dataset I help manage, we are supposed to receive, as per all the actual documentation I could find, the "as analyzed" temperatures and values. The reality is that these are observations and have all the ambiguities and messiness associated with having multiple labs, PIs, and techs all doing things slightly differently, not to mention all the things that go wrong while at sea. My group is in the position of being at or near the "last step" of publication life cycle of these data, and many of the intermediate steps are opaque to us. We have programs saying "our PIs report X" so in our files we say "this is X" for whatever it is... in the case of these carbon parameters, "this is the analytical temperature" is what we are told.

The miracle of CF has been the adoption of it by folks making observations! Modelers and observationalists using the same data format, that's crazy! We'll be putting together a larger list of standard name requests as part of this project, the good news is that the majority of these are mass concentrations of something in sea water...


To recap, here is where I think we are:

JonathanGregory commented 4 years ago

I agree that this is fine for the name and definition of temperature_of_analysis_of_sea_water. It's also fine to use ancillary_variables for the association. Thanks.

feggleton commented 4 years ago

Hi all,

Firstly, thank you to Andrew Barna for this proposal and to all involved for your comments and giving summaries throughout. Reading through, this looks like it was a great discussion and potentially other discussions may come out of this. Looks like you have all come to a consensus about the term and its definition and a lot has been clarified here. I have added this term to the CF editor here: http://cfeditor.ceda.ac.uk/proposal/4482/edit - including the progression and changes from this discussion. All I have changed below is remove 'the' in the first sentence as there was one before 'on the measurement' which didn't make sense.

Term: temperature_of_analysis_of_sea_water

Definition: The temperature_of_analysis_of_sea_water is the reference temperature for the effects of temperature on the measurement of another variable. This temperature should be measured, but may have been calculated, or assumed. For example, the temperature of the sample when measuring pH, or the temperature of equilibration in the case of dissolved gases. The linkage between the data variable and the variable with a standard_name of temperature_of_analysis_of_sea_water is achieved using the ancillary_variables attribute on the data variable.

Canonical Units: K

There seems to be agreement here and no comments for the last few days, so if there are no further comments or discussion in the next week or so and everyone agrees we can accept this term into the next update.

Thanks,

Fran

feggleton commented 4 years ago

This term was accepted into Version 74 of the standard name table.
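
For reference, a CDL sketch of how the accepted term might be used, with the temperature given in the canonical unit K. The variable names, the dimension `obs`, and all data values are made up for illustration:

```cdl
netcdf accepted_term_sketch {
dimensions:
    obs = 3 ;
variables:
    float ph(obs) ;
        ph:standard_name = "sea_water_ph_reported_on_total_scale" ;
        ph:units = "1" ;
        // link from the data variable to its analysis temperature
        ph:ancillary_variables = "ph_temperature" ;
    float ph_temperature(obs) ;
        ph_temperature:standard_name = "temperature_of_analysis_of_sea_water" ;
        ph_temperature:units = "K" ;
data:
    // illustrative values only, not real measurements
    ph = 7.62, 7.61, 7.85 ;
    ph_temperature = 298.15, 298.15, 282.15 ;
}
```

Here 298.15 K corresponds to the 25 °C water-bath case mentioned earlier in the thread, and 282.15 K to a sample analysed at 9 °C.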