cf-convention / vocabularies

Issues and source files for CF controlled vocabularies

Standard names: moles_of_dissolved_inorganic_carbon_per_unit_mass_in_sea_water #143

Closed StephaneTarot closed 2 years ago

StephaneTarot commented 3 years ago

Proposer's name: Stéphane Tarot
Date: 29/06/2021

Term: moles_of_dissolved_inorganic_carbon_per_unit_mass_in_sea_water
Description: moles_of_X_per_unit_mass_in_Y is also called "molality" of X in Y, where X is a material constituent of Y. "Dissolved inorganic carbon" describes a family of chemical species in solution, including carbon dioxide, carbonic acid and the carbonate and bicarbonate anions. "Dissolved inorganic carbon" is the term used in standard names for all species belonging to the family that are represented within a given model. The list of individual species that are included in a quantity having a group chemical standard name can vary between models. Where possible, the data variable should be accompanied by a complete description of the species represented, for example, by using a comment attribute.
Units: mol kg-1

The suggested description is based on the one used for mole_concentration_of_dissolved_inorganic_carbon_in_sea_water.

Best regards Stéphane Tarot

roy-lowry commented 3 years ago

Surprisingly to me, this is the first time moles per unit mass of sea water has been needed. A bit of research reveals that there is a convention in Standard Names to use the word 'specific' for 'per unit mass'. This means that to follow the syntactic precedents this new Standard Name should be 'mole_specific_concentration_of_dissolved_inorganic_carbon_in_sea_water'.

I have slight reservations about this because the definition of specific comes from the atmospheric domain (e.g. specific humidity), but could be regarded as non-intuitive in the oceanographic domain. However, there are over 20 Standard Names where specific is used in this way, including some from oceanography such as 'specific_kinetic_energy_of_sea_water' so I feel my reservations should be ignored.

Any other opinions?

StephaneTarot commented 3 years ago

Dear Roy,

There are already some "moles per unit mass" names without "specific":

moles_of_cfc11_per_unit_mass_in_sea_water
moles_of_nitrate_and_nitrite_per_unit_mass_in_sea_water
moles_of_nitrate_per_unit_mass_in_sea_water
moles_of_nitrite_per_unit_mass_in_sea_water
moles_of_oxygen_per_unit_mass_in_sea_water
moles_of_phosphate_per_unit_mass_in_sea_water
moles_of_silicate_per_unit_mass_in_sea_water

I used the same construction

Stéphane

roy-lowry commented 3 years ago

Thanks Stéphane,

When I searched I didn't find them due to some user misadventure that I now can't repeat. As they exist, forget my previous comment and I fully support your suggestion.

lqjiang commented 3 years ago

Dear All,

To me, one of the major issues of the CF terms is the fact that they try to associate standard terms with information like "moles_of" and "per_unit_mass", etc. First of all, it creates an unnecessarily long list of terms, because of the different units. Second, this linkage between standard terms and their units makes a lot of the terms less useful. For example, in the case of dissolved inorganic carbon, despite the 7 available terms, I literally cannot find one CF term that fits our needs, because their units are all wrong.

The ocean carbon research community almost exclusively reports values per unit mass. In the case of dissolved inorganic carbon, the community-adopted unit is micromole per kilogram of seawater (umol/kg). According to the IUPAC Gold Book, per-unit-mass values should be called "content" or "substance content", e.g., nitrate content, or substance content of nitrate, instead of concentration.

My recommendation moving forward is to decouple the standardized names from their units. In this case, the CF term should be "dissolved inorganic carbon".

Liqing

roy-lowry commented 3 years ago

Dear Liqing,

I think that you are misunderstanding the canonical units. They specify the dimensionality of the parameter and NOT the units of measure. The actual unit of measure is specified in a parameter attribute in the data file. The only constraint on this is that it matches the canonical unit dimensions. So, for this proposed Standard Name, units of measure such as micromol/kg, millimoles/gram and moles/microgram are all valid because they are all quantities of matter per unit mass. However, units such as micromol/litre, litres/kg and milligrams/litre are INVALID. So you can use your preferred units of micromol/kg with this Standard Name with no problems, provided your unit of measure is specified in the correct CF data file parameter attribute.
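As an illustration of the rule above, here is a minimal sketch of a dimensionality check in Python. It is not the CF checker or the UDUNITS library: the lookup table `BASE_DIMENSIONS` and the functions `dimensions` and `is_valid_for` are hypothetical names made up for this example, and only the handful of unit symbols mentioned above are covered.

```python
import re

# Hypothetical lookup table (not a CF or UDUNITS API): unit symbol ->
# exponents of (amount of substance, mass, length).
BASE_DIMENSIONS = {
    "mol": (1, 0, 0), "mmol": (1, 0, 0), "umol": (1, 0, 0),
    "kg": (0, 1, 0), "g": (0, 1, 0), "mg": (0, 1, 0), "ug": (0, 1, 0),
    "l": (0, 0, 3),  # a litre is a volume, i.e. length cubed
}

# A token is a unit symbol with an optional integer exponent, e.g. "kg-1".
TOKEN = re.compile(r"^([A-Za-z]+)(-?\d+)?$")


def dimensions(unit_string):
    """Return the summed dimension exponents of a string such as 'umol kg-1'."""
    total = [0, 0, 0]
    for token in unit_string.split():
        match = TOKEN.match(token)
        if match is None or match.group(1) not in BASE_DIMENSIONS:
            raise ValueError(f"unrecognised unit token: {token!r}")
        power = int(match.group(2) or 1)
        for i, exponent in enumerate(BASE_DIMENSIONS[match.group(1)]):
            total[i] += exponent * power
    return tuple(total)


def is_valid_for(unit_string, canonical_units):
    """True when the unit of measure has the same dimensions as the canonical unit."""
    return dimensions(unit_string) == dimensions(canonical_units)


if __name__ == "__main__":
    canonical = "mol kg-1"
    for candidate in ["umol kg-1", "mmol g-1", "mol ug-1",
                      "umol l-1", "l kg-1", "mg l-1"]:
        print(candidate, is_valid_for(candidate, canonical))
```

The scale prefix (micro, milli) never enters the comparison; only the exponents of the base dimensions do, which is why micromol/kg passes and micromol/litre does not.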

CF has its origins in physics, which is why one of the fundamental rules for Standard Names is that they have fixed dimensions. This is something for which I was extremely grateful a few years back in a project to develop AI-based workflow data quality checks. Dimensionality is also something that underpins the SeaDataNet/EMODNet automated data aggregation architecture.

The standard name text is built through concatenation of a set of standard phrases that are long established and have well-known meanings. Changing these would be a lot of pain for what is, in my opinion, no gain.

Cheers, Roy.

lqjiang commented 3 years ago

Dear Roy,

Thank you for the response to my comment. I was vaguely referring to the dimensionality when I said units. Sorry for the confusion.

In terms of units, listing a single unit next to a term unfortunately gives users the wrong impression that the two are tied to each other. To fix that, we could either list all of the commonly used units (it is OK to list only one if that is all there is), or list none of them.

I'm all for the use of units (who isn't?), and even the dimensionality thing if it is important in physics, although I think the dimensionality information can be inferred from the unit information in most cases. My question is why do we have to couple them together? Why can't we mark the standard term name as one element, and then its units in a separate element, etc., like the below:

Variable 1:
Standard name of Variable 1: Dissolved inorganic carbon
Units of Variable 1: micro-mole per kg-sw
Dimension of Variable 1: per kg seawater

The above setup would give us a much cleaner list, without compromising information such as units and dimensionality.

Thanks,

Liqing

roy-lowry commented 3 years ago

Dear Liqing,

Part of my answer here has to be 'RTFM'. There is a very well maintained CF Conventions document. Anybody reading this would be perfectly clear about the relationship between Standard Names, canonical units and units of measure.

What you are proposing would be worthy of consideration were we designing a parameter vocabulary from scratch. However, there are thousands of Standard Names in millions of data files supported by a surprisingly large amount of software infrastructure. Such change riding roughshod over backward compatibility would be extremely unpopular in the CF community even if there were the resources available to do it.

Cheers, Roy.

lqjiang commented 3 years ago

Dear Roy,

Thanks for the explanation!

Liqing

taylor13 commented 3 years ago

I might add that although two quantities (e.g., specific_humidity and humidity_mixing_ratio) might be loosely thought of as measures of the same thing (i.e., the amount of water vapor in the air), and they do have the "same" units (they are dimensionless), they are not the same physical quantity because

JonathanGregory commented 3 years ago

Dear Liqing @lqjiang

in the case of dissolved inorganic carbon, despite the available 7 terms, I literally can not find one CF term that fits our needs

The new name that Stéphane proposes is presumably exactly what you need (given the above discussion about canonical units), isn't it? I support the proposal.

Best wishes

Jonathan

lqjiang commented 3 years ago

Hi Taylor,

As I mentioned above, I'm all for documenting all of the information. My argument is to manage different things (name, unit, dimensionality, etc.) as separate groups of information, instead of trying to bundle them together.

Hi Jonathan,

As to the DIC example, the below CF-term would be my best choice: "mole_concentration_of_dissolved_inorganic_carbon_in_sea_water"

However, it is not a perfect choice for me, because:

Sadly, its unit and dimensionality prevent me from using an otherwise perfect term. I find similar issues with the alkalinity term. BTW, in my community, we call it Total Alkalinity.

For partial_pressure_of_carbon_dioxide_in_sea_water, the unit my community uses is micro-atmosphere. Almost no one uses the SI unit of Pascal.

The reason I use them as examples is that they are 3 of the 4 key carbon system parameters: dissolved inorganic carbon, total alkalinity, partial pressure of carbon dioxide, and pH.

Many thanks for all the responses,

Liqing

JonathanGregory commented 3 years ago

Dear Liqing

The canonical units of the proposed moles_of_dissolved_inorganic_carbon_per_unit_mass_in_sea_water are mol kg-1. That is dimensionally the same as your preferred umol kg-1 (as discussed in previous comments) so this is the standard name you should use in CF. Similarly, microatmosphere is dimensionally the same as pascal.

There are many terminologies and jargons used in Earth science. In CF we try hard to choose standard names which will be self-explanatory, consistent and not confusing to anyone, regardless of their disciplinary background. These aims mean that we often don't use exactly the words which an expert in any given field is accustomed to using. It's obviously impossible to be consistent with all existing terminologies when those terminologies do not agree among themselves! In CF standard names, "content" usually refers to an extensive quantity and "concentration" to an intensive one.

Best wishes

Jonathan

JonathanGregory commented 3 years ago

To be clear, you can use units="umol kg-1" with standard_name="moles_of_dissolved_inorganic_carbon_per_unit_mass_in_sea_water" and units="microatmosphere" with standard_name="partial_pressure_of_carbon_dioxide_in_sea_water".
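A minimal sketch of what this means in practice, using plain Python dictionaries rather than a real netCDF library. The dictionary layout, the `TO_CANONICAL` table, the `to_canonical` function and the sample values are all assumptions made up for illustration; the only physical constant used is the standard atmosphere, 101325 Pa.

```python
# Hypothetical scale factors from the unit of measure to the canonical unit
# of each standard name (mol kg-1 and Pa respectively).
TO_CANONICAL = {
    "mol kg-1": 1.0,
    "umol kg-1": 1e-6,
    "Pa": 1.0,
    "microatmosphere": 101325.0 * 1e-6,  # 1 standard atmosphere = 101325 Pa
}

# Illustrative variables; the values are invented sample numbers.
dic = {
    "standard_name": "moles_of_dissolved_inorganic_carbon_per_unit_mass_in_sea_water",
    "units": "umol kg-1",
    "values": [2050.0, 2100.5],
}
pco2 = {
    "standard_name": "partial_pressure_of_carbon_dioxide_in_sea_water",
    "units": "microatmosphere",
    "values": [380.0, 410.0],
}


def to_canonical(variable):
    """Rescale the stored values to the canonical unit of the standard name."""
    factor = TO_CANONICAL[variable["units"]]
    return [value * factor for value in variable["values"]]


if __name__ == "__main__":
    print(to_canonical(dic))   # values in mol kg-1
    print(to_canonical(pco2))  # values in Pa
```

The data stay in the community's preferred units in the file; software that needs the canonical unit can rescale because the two are dimensionally identical.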

feggleton commented 3 years ago

Hi all,

Thank you for this proposal and for discussing and clarifying the term/units. I have now added this term to the cfeditor. I have added a couple of phrases that should have gone into the definition which were missed out.

The construction "moles_of_X_per_unit_mass_in_Y" is also called "molality" of X in Y, where X is a material constituent of Y. A chemical species or biological group denoted by X may be described by a single term such as "nitrogen" or a phrase such as "nox_expressed_as_nitrogen". "Dissolved inorganic carbon" describes a family of chemical species in solution, including carbon dioxide, carbonic acid and the carbonate and bicarbonate anions. "Dissolved inorganic carbon" is the term used in standard names for all species belonging to the family that are represented within a given model. The list of individual species that are included in a quantity having a group chemical standard name can vary between models. Where possible, the data variable should be accompanied by a complete description of the species represented, for example, by using a comment attribute.
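The construction described in this definition can be picked apart mechanically. The following is only a sketch: the pattern `MOLALITY_NAME` and the function `split_molality_name` are hypothetical helpers written for this example and are not part of any CF tooling.

```python
import re

# Hypothetical pattern for the "moles_of_X_per_unit_mass_in_Y" construction.
MOLALITY_NAME = re.compile(
    r"moles_of_(?P<constituent>.+)_per_unit_mass_in_(?P<medium>.+)"
)


def split_molality_name(standard_name):
    """Return (X, Y) for a moles_of_X_per_unit_mass_in_Y name, else None."""
    match = MOLALITY_NAME.fullmatch(standard_name)
    if match is None:
        return None
    return match.group("constituent"), match.group("medium")


if __name__ == "__main__":
    for name in [
        "moles_of_dissolved_inorganic_carbon_per_unit_mass_in_sea_water",
        "moles_of_nitrate_and_nitrite_per_unit_mass_in_sea_water",
        "mole_concentration_of_dissolved_inorganic_carbon_in_sea_water",
    ]:
        print(name, "->", split_molality_name(name))
```

Note that X may itself be a phrase containing underscores, so the pattern relies on the fixed "_per_unit_mass_in_" separator rather than on splitting at every underscore.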

Please continue discussions as needed or comment to say you are happy.

Thanks,

Fran

StephaneTarot commented 3 years ago

I'm happy ;-)

roy-lowry commented 3 years ago

Thanks Fran. I'm happy too.

lqjiang commented 3 years ago

Hi Fran,

I'm happy too!

Just want to add one thing. I think the "moles_of" thing should be decided by the scientific community. For example, in my community (ocean carbon cycling), adding such info to "dissolved inorganic carbon" would make the term look very weird. However, for carbon dioxide, no one can simply call it carbon dioxide, because people would not know whether the carbon dioxide reported is a molecular ratio (xCO2), a partial pressure (pCO2), or a fugacity (fCO2). The "_in_Y" thing is also important in this example, because people commonly measure pCO2 in both the atmosphere and the seawater. Without specifying which, we could run into confusion.

Liqing

roy-lowry commented 3 years ago

Dear Liqing,

If it alleviates your concerns one of my roles in CF is to keep an eye on the expansion of Standard Names into biogeochemistry. Before retiring I spent 37 years as an oceanographic data manager specialising in biogeochemistry. My philosophy was always 'If you manage the data you need to understand the data as well as the scientists collecting it'. My learning curve included participation in over 20 research cruises.

My experience with the 'carbonate system' started in 1989 with managing the UK contribution to the JGOFS North Atlantic Bloom Experiment. I chaired the JGOFS Data Management Task Team throughout the 1990s. This involved a lot of learning about carbonate system data from the likes of Andy Watson, Carol Robinson and Doug Wallace.

CF is a very broad church involving many scientific domains in a single framework. Consequently, an element of compromise is required to provide consistency across all domains. My objective is to keep this compromise as close as possible to the specific requirements of the biogeochemical communities. Some battles I win, others I lose. However, such is the nature of a community multidisciplinary standard. This is a far better tool for interoperability than a heterogeneous mixture of specific community requirements.

Cheers, Roy.

roy-lowry commented 3 years ago

Thinking about it, it's worth making the point that if the carbon community wish to adopt CF then it would be helpful if they would initiate (as separate GitHub tickets, not on this one!!) the creation of any Standard Names that they require. CF policy is to only set up new Standard Names when requested rather than setting them up 'just in case'.

A suitable pCO2 Standard Name has existed for some time. Thanks to Stéphane this thread will result in the creation of a DIC Standard Name with the correct dimensionality. I THINK the existing pH Standard Name (based on Total Scale) is OK, but requirements may have changed.

However, there is no Standard Name for total alkalinity with the correct dimensionality. If somebody would like to propose one by creating a GitHub thread then I would happily support it.

feggleton commented 3 years ago

Ok. In terms of this standard name, if there are no further comments in the next 7 days this can be accepted.

lqjiang commented 3 years ago

Hi All,

As an FYI, I submitted a full proposal with Dr. Andrew Dickson (SIO) and Paul McElhany (NWFSC) to NOAA's Ocean Acidification Program this year to clean up all ocean acidification related terms within the CF and other repositories. It was rejected for now, but I'm hopeful that it will be funded later on. I look forward to working with this group on this issue in the future.

Liqing

feggleton commented 3 years ago

This term has now been accepted into the next update.

japamment commented 2 years ago

Changes applied in version 78 of the standard name table.

lqjiang commented 2 years ago

Hi All,

Good morning.

Is there a peer-reviewed article about the CF vocabularies I could cite?

Many thanks,

Liqing


MathewBiddle commented 2 years ago

See https://github.com/cf-convention/cf-conventions/issues/206 and https://github.com/cf-convention/cf-conventions/issues/127

davidhassell commented 2 years ago

Dear @lqjiang,

The CF data model GMD paper (https://doi.org/10.5194/gmd-10-4619-2017, and linked from the CF website) has a section (3.8) that describes standard names, their structure and the philosophy behind them. Perhaps that's useful? I include the relevant section here for convenience:


For systematic identification of the physical quantity contained in variables, CF defines a “standard name” string attribute (e.g. lines 28, 54 and 62 of Fig. 3), with permissible values listed in the standard name table (http://cfconventions.org/standard-names.html), which includes precise definitions. The standard name table is managed by a community process and is continually expanding – version 44 of the table, released in May 2017, contains 2847 standard names.

CF also upholds the use of the “long name” defined by the netCDF user guide, but this is ad hoc. In contrast, the CF standard names are consistently constructed and documented. As CF is applicable to many areas of geoscience, the standard names have to be more self-explanatory and informative than would suffice for any one area. For instance, there is no name for plain “potential temperature”, since we have to distinguish air potential temperature and sea water potential temperature. Standard names are often longer than the terms familiarly used by the experts in particular discipline, because they answer the question, “What does this mean?”, rather than the question, “What do you call this?”. For example, the quantity often called “precipitable water” by meteorologists has the standard name of atmosphere_mass_content_of_water_vapor.

Standard names have a detailed description which further defines parts of the name; for example, the description of the standard name land_ice_calving_rate notes that “land ice” means glaciers, ice caps, and ice sheets resting on bedrock, and the land ice calving rate is the rate at which ice is lost per unit area through calving into the ocean. Each standard name also implies particular physical dimensions (mass, length, time, and other dimensions corresponding to SI base units, expressed as a “canonical unit”); for example, large-scale rainfall amount (canonical unit kg m−2), large-scale rainfall flux (kg m−2 s−1), and large-scale rainfall rate (m s−1) are all different in CF, although they might all be vaguely referred to as “large-scale rain”. Standard names have been defined for both more general and more specific quantities, for different applications, e.g. ocean_mixed_layer_thickness and ocean_mixed_layer_thickness_defined_by_temperature. Some standard names require the existence of additional metadata and/or constraints on the values of the variables with which they are associated. For example, the standard name of downwelling_radiance_per_unit_wavelength_in_air requires there to be a coordinate variable storing the radiation wavelength.

The CF conventions use size one or scalar coordinate variables (Sect. 3.3) and the cell_methods attribute (Sect. 3.6) to describe some aspects of a variable, and this means standard names do not always correspond to identities of variables in other file formats. For instance, to describe the time-mean air temperature at 1.5 m above the ground, air_temperature alone is the standard name; “time-mean” is described by cell_methods and the height as a coordinate.
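The air temperature example in the last paragraph can be sketched with plain Python dictionaries. This is an illustration only, not a netCDF or cf-python API: the dictionary layout and the `describe` helper are hypothetical, while the attribute names (`standard_name`, `units`, `cell_methods`, `coordinates`, `positive`) are the CF attributes the text refers to.

```python
# Scalar coordinate carrying the height; the "value" key is a made-up
# stand-in for the single data value of a size-one coordinate variable.
height = {"standard_name": "height", "units": "m", "positive": "up", "value": 1.5}

# The data variable: the standard name alone says "air temperature";
# the time mean and the height live in other attributes.
air_temperature = {
    "standard_name": "air_temperature",
    "units": "K",
    "cell_methods": "time: mean",
    "coordinates": "height",
}


def describe(variable, scalar_coords):
    """Combine standard name, cell_methods and scalar coordinates into one label."""
    parts = [variable["standard_name"]]
    if "cell_methods" in variable:
        parts.append(f"[{variable['cell_methods']}]")
    for name in variable.get("coordinates", "").split():
        coord = scalar_coords[name]
        parts.append(f"at {name} = {coord['value']} {coord['units']}")
    return " ".join(parts)


if __name__ == "__main__":
    print(describe(air_temperature, {"height": height}))
    # prints: air_temperature [time: mean] at height = 1.5 m
```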

lqjiang commented 2 years ago

Many thanks, David!

This is very helpful. I hope you have a great week.

Liqing
