cf-convention / cf-conventions

AsciiDoc Source
http://cfconventions.org/cf-conventions/cf-conventions
Creative Commons Zero v1.0 Universal

Section 6.1.2 Taxon Names and Identifiers standard name for biological taxon identifier #308

Closed MathewBiddle closed 2 years ago

MathewBiddle commented 3 years ago

In reading Section 6.1.2 Taxon Names and Identifiers, the second paragraph (and skeleton example) describe using the CF standard name biological_taxon_lsid. However, in CF standard name table v76, the term is biological_taxon_identifier. I assume the documentation should be updated?

I can put in a pull request for the change, if that's appropriate.

davidhassell commented 3 years ago

Thanks for spotting this inconsistency, @MathewBiddle.

What you say seems fine to me, but it would be good if @roy-lowry could confirm that this is indeed the case (and that this is not indicative of something deeper ....).

Once Roy has responded, a pull request would be most welcome, thanks.

roy-lowry commented 3 years ago

These things are never as simple as they look at first. I've been back to the original discussion on Trac and the history here is that biological_taxon_identifier was the original proposal for the identifier Standard Name allowing identifiers for any standard to be used. This was criticised because users of the data had no way of knowing how to resolve the identifier and so the strategy switched to providing information on how to resolve the identifier through the adoption of LSIDs.

Unfortunately, when I set up the Standard Names I screwed up by setting up the request early on in the discussion (circa 2013!) asking for biological_taxon_identifier and then forgetting to update it to reflect the subsequent Trac discussion.

So, in a nutshell, the Conventions Document is correct but the Standard Name is wrong. The fix would be to deprecate biological_taxon_identifier and alias it to a new Standard Name, biological_taxon_lsid. This would require references to 'biological_taxon_identifier' in several Standard Name descriptions to be changed to 'biological_taxon_lsid'.

Shall I set this in motion?

@japamment If the answer to the above question is 'yes', can I do it through this ticket or do I need to open a new one?

roy-lowry commented 3 years ago

Also see #309

davidhassell commented 3 years ago

I am happy for the inconsistency to be fixed with an alias, as @roy-lowry suggests.

roy-lowry commented 3 years ago

The following is the revised Standard Name specification - I'll put here as a placeholder at least:

biological_taxon_lsid

"Biological taxon" is a name or other label identifying an organism or a group of organisms as belonging to a unit of classification in a hierarchical taxonomy. The quantity with standard name biological_taxon_lsid is the machine-readable identifier based on a taxon registration system using the syntax convention specified for the Life Science Identifier (LSID) - urn:lsid:::[:]. This includes the reference classification in the element and these are restricted by the LSID governance. It is strongly recommended in CF that the authority chosen is World Register of Marine Species (WoRMS) for oceanographic data and Integrated Taxonomic Information System (ITIS) for freshwater and terrestrial data. See Section 6.1.2 of the CF convention (version 1.8 or later) for information about biological taxon auxiliary coordinate variable. This identifier is a narrower equivalent to the scientificNameID field in the Darwin Core Standard.

biological_taxon_lsid should replace biological_taxon_identifier by alias and also as text in the descriptions of Standard Names:

colony_forming_unit_number_concentration_of_biological_taxon_in_sea_water
mass_concentration_of_biological_taxon_expressed_as_nitrogen_in_sea_water
mole_concentration_of_biological_taxon_expressed_as_carbon_in_sea_water
number_concentration_of_biological_taxon_in_sea_water
mass_concentration_of_biological_taxon_expressed_as_chlorophyll_in_sea_water
mass_concentration_of_biological_taxon_expressed_as_carbon_in_sea_water
mole_concentration_of_biological_taxon_expressed_as_nitrogen_in_sea_water

roy-lowry commented 3 years ago

My limited GitHub skills have caused the LSID syntax not to render correctly due to embedded chevrons

This is it with curly brackets instead of chevrons so it renders correctly.

urn:lsid:{Authority}:{Namespace}:{ObjectID}[:{Version}]
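For anyone wanting to check identifiers against this syntax, here is a minimal Python sketch. It is only an illustration: the names `Lsid` and `parse_lsid` are made up for this example, and the regular expression assumes each component is a non-empty run of characters containing no colon or whitespace, which is a simplification rather than the full LSID specification.

```python
import re
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Components of urn:lsid:{Authority}:{Namespace}:{ObjectID}[:{Version}]."""
    authority: str
    namespace: str
    object_id: str
    version: Optional[str]


# Assumption: each component is a non-empty run of characters
# without ":" or whitespace. The real specification may differ.
_LSID_RE = re.compile(
    r"^urn:lsid:(?P<authority>[^:\s]+):(?P<namespace>[^:\s]+)"
    r":(?P<object_id>[^:\s]+)(?::(?P<version>[^:\s]+))?$",
    re.IGNORECASE,
)


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its components, or raise ValueError."""
    match = _LSID_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid LSID: {text!r}")
    # The optional version group is None when it is absent.
    return Lsid(**match.groupdict())
```

Called on a WoRMS identifier such as `urn:lsid:marinespecies.org:taxname:345515`, this gives the authority `marinespecies.org`, the namespace `taxname`, the object ID `345515` and no version.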

davidhassell commented 3 years ago

Hi Roy, thanks.

(Putting stuff in backticks usually does the trick: `urn:lsid:\<Authority>:\<Namespace>:\<ObjectID>[:\<Version>]` renders as urn:lsid:<Authority>:<Namespace>:<ObjectID>[:<Version>]. I used a generous sprinkling of protecting \ to get the chevrons and backticks to appear as plain text in the first version, but perhaps the backticks-only version is quicker)

fcarvalhopacheco commented 3 years ago

Hi all,

Could you please verify if the following example could be viable? We are planning to include/suggest the following TERM at some point but need some help.


Term: number_concentration_of_prochlorococcus_in_sea_water

-Definition: "Number concentration" means the number of particles or other specified objects per unit volume. Abundance of Prochlorococcus (ITIS: 610076: WoRMS 345515) per unit volume of the water body by flow cytometry. Number of particles resolved as cells of the cyanobacteria Prochlorococcus in a unit volume of any body of fresh or saltwater determined by flow cytometry analysis of unstained samples (NERC-1).

-Units: [m-3]

-References: NERC-1:http://vocab.nerc.ac.uk/collection/P01/current/P701A90Z/4/ NERC-2:http://vocab.nerc.ac.uk/collection/F02/current/F0200002/1/


roy-lowry commented 3 years ago

@fcarvalhopacheco That is an invalid Standard Name as it includes a taxon name. What you need is an array with taxon as one of its dimensions containing the abundances with the Standard Name number_concentration_of_biological_taxon_in_sea_water. The taxon co-ordinate has two vectors with Standard Names biological_taxon_name and biological_taxon_lsid (currently erroneously called biological_taxon_identifier - the subject of this defect, which will hopefully be fixed in the near future) carrying the text name and the LSID for each taxon. This means we don't need 200 Standard Names for a dataset with abundances of 200 taxa. WoRMS is the preferred authority for marine organism LSIDs. Think of the data as a spreadsheet with abundances in the cells and columns called biological_taxon_name and biological_taxon_lsid.

There's a skeleton example in Section 6.1.2 of the Conventions Document version 1.8.

There is a complication in cases where the data set contains data for biological entities that aren't taxa such as picophytoplankton. Each of these needs its own Standard Name for each measurement. I'm not totally comfortable with this. When I started setting up the taxon conventions back in 2013 I wanted all biological entities to be allowed, but this was rejected because they would be unconstrained plaintext labels and this was considered too loose for CF. A suggestion to constrain against the S25 vocabulary with BODC as the authority was also not well received. In the past few weeks I looked for support to treat all biological entities as taxa but got none and am not in a position to try to take it forward myself.

Does that help?
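The spreadsheet picture above can be sketched in plain Python. This is only an illustration of the layout, not of any netCDF API: the abundance numbers are made-up placeholders, the two taxa and their WoRMS LSIDs are ones quoted elsewhere in this thread, and `series_for` is a hypothetical helper.

```python
# Two parallel label vectors along the taxon dimension.
# In a CF file these would carry the Standard Names
# biological_taxon_name and biological_taxon_lsid.
taxon_name = ["Prochlorococcus", "Synechococcus"]
taxon_lsid = [
    "urn:lsid:marinespecies.org:taxname:345515",
    "urn:lsid:marinespecies.org:taxname:160572",
]

# abundance[time][taxon], standard name
# number_concentration_of_biological_taxon_in_sea_water, units m-3.
# The numbers are invented placeholders, not measurements.
abundance = [
    [1.0e9, 2.0e8],
    [1.2e9, 1.5e8],
    [0.9e9, 2.5e8],
]


def series_for(lsid):
    """Return the time series of one taxon, selected by its LSID."""
    column = taxon_lsid.index(lsid)  # raises ValueError if the LSID is absent
    return [row[column] for row in abundance]
```

One Standard Name covers every column; adding a taxon means adding one entry to each label vector and one column to the array.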

fcarvalhopacheco commented 3 years ago

Thanks, @roy-lowry for the reply!

So we don't need to create anything new; we just need to use the Standard Name number_concentration_of_biological_taxon_in_sea_water, including the biological_taxon_name and the biological_taxon_lsid for each of our "variables" (see below).

"variables" (still need to be confirmed)

"Prochlorococcus" = "urn:lsid:marinespecies.org:taxname:345515"
"Bacteria" = "urn:lsid:marinespecies.org:taxname:6"
"Synechococcus" = "urn:lsid:marinespecies.org:taxname:160572"
"Cyanobacteria" = "urn:lsid:marinespecies.org:taxname:146537"


Please, see if the following example for "Prochlorococcus" would be valid for our case:


    dimensions:
        time = 100 ;
        string80 = 80 ;
        taxon = 1 ;  // Can we include the other 3 taxa here, so total = 4?
    variables:
        float time(time) ;
            time:standard_name = "time" ;
            time:units = "days since 2019-01-01" ;
        float abundance(time,taxon) ;
            abundance:standard_name = "number_concentration_of_organisms_in_taxon_in_sea_water" ;
            abundance:coordinates = "taxon_lsid taxon_name" ;
        char taxon_name(taxon,string80) ;
            taxon_name:standard_name = "biological_taxon_name" ;
        char taxon_lsid(taxon,string80) ;
            taxon_lsid:standard_name = "biological_taxon_lsid" ;
    data:
        time = // 100 values ;
        abundance = // 100 values ;
        taxon_name = "Prochlorococcus" ;  // Can we include the other 3 taxon_name values here?
        taxon_lsid = "urn:lsid:marinespecies.org:taxname:345515" ;  // Can we include the other 3 taxon_lsid values here?

roy-lowry commented 3 years ago

@fcarvalhopacheco I think you've got it!! You can certainly add three more taxa as you suggest - even 30 or 300 more taxa, preventing a massive propagation of new Standard Names that I feared would become unsustainable.
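To show how the header grows with the number of taxa, here is a rough Python sketch that prints a CDL fragment for any list of (name, LSID) pairs. It is an illustration only: `render_taxon_cdl` and `TAXA` are hypothetical names, the output is a fragment rather than a complete CDL file, the four pairs are the ones proposed above (still to be confirmed by the poster), and the abundance Standard Name is the one given earlier in the thread, number_concentration_of_biological_taxon_in_sea_water.

```python
def render_taxon_cdl(taxa, n_time, strlen=80):
    """Return a CDL fragment for abundances of len(taxa) taxa."""
    for name, lsid in taxa:
        if max(len(name), len(lsid)) > strlen:
            raise ValueError(f"label longer than {strlen} characters: {name!r}")
    names = ", ".join(f'"{name}"' for name, _ in taxa)
    lsids = ", ".join(f'"{lsid}"' for _, lsid in taxa)
    lines = [
        "dimensions:",
        f"  time = {n_time} ;",
        f"  string{strlen} = {strlen} ;",
        f"  taxon = {len(taxa)} ;",
        "variables:",
        "  float time(time) ;",
        '    time:standard_name = "time" ;',
        '    time:units = "days since 2019-01-01" ;',
        "  float abundance(time, taxon) ;",
        '    abundance:standard_name = '
        '"number_concentration_of_biological_taxon_in_sea_water" ;',
        '    abundance:coordinates = "taxon_lsid taxon_name" ;',
        f"  char taxon_name(taxon, string{strlen}) ;",
        '    taxon_name:standard_name = "biological_taxon_name" ;',
        f"  char taxon_lsid(taxon, string{strlen}) ;",
        '    taxon_lsid:standard_name = "biological_taxon_lsid" ;',
        "data:",
        # Numeric values are omitted; only their counts are noted.
        f"  // time: {n_time} values ; abundance: {n_time * len(taxa)} values",
        f"  taxon_name = {names} ;",
        f"  taxon_lsid = {lsids} ;",
    ]
    return "\n".join(lines)


# The four taxa proposed in this thread (unconfirmed by the poster).
TAXA = [
    ("Prochlorococcus", "urn:lsid:marinespecies.org:taxname:345515"),
    ("Bacteria", "urn:lsid:marinespecies.org:taxname:6"),
    ("Synechococcus", "urn:lsid:marinespecies.org:taxname:160572"),
    ("Cyanobacteria", "urn:lsid:marinespecies.org:taxname:146537"),
]

print(render_taxon_cdl(TAXA, n_time=100))
```

Only the `taxon` dimension and the two label vectors change as taxa are added; the variable definitions and Standard Names stay the same.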

roy-lowry commented 3 years ago

One minor point - the name for 160572 should be just Synechococcus (it's the Genus - the Nägeli is part of the name reference for the taxon, not part of the Genus name).

fcarvalhopacheco commented 3 years ago

@roy-lowry Thanks! That's great. I will pass this information along.

MathewBiddle commented 3 years ago

Back to the original question posted above. Which term should we be using for files we are generating now?

This is what we have right now, which will pass CF checkers but is not aligned with the guidance:

    string taxon_lsid(obs) ;
        taxon_lsid:standard_name = "biological_taxon_identifier" ;
        taxon_lsid:long_name = "Namespaced Taxon Identifier" ;
        taxon_lsid:source = "WoRMS (2021). Halichoerus grypus (Fabricius, 1791). Accessed at: http://www.marinespecies.org/aphia.php?p=taxdetails&id=137080 on 2021-04-30" ;
        taxon_lsid:url = "http://www.marinespecies.org/aphia.php?p=taxdetails&id=137080" ;

// global attributes:
        :standard_name_vocabulary = "CF Standard Name Table v77" ;

data:

 taxon_lsid = "urn:lsid:marinespecies.org:taxname:137080",

Updated to include data.

roy-lowry commented 3 years ago

The slightly embarrassing answer is biological_taxon_lsid. However, this will fail compliance checkers because the defect correction specified above last November still hasn't been actioned. I did issue an e-mail reminder and was promised it would be in the next Standard Name update, which I think has passed. However, I've just checked and nothing has changed.

@japamment Could we please get this defect corrected?

MathewBiddle commented 3 years ago

@roy-lowry @japamment Do you know if this will be an adjustment to the existing tables (v71 - v77), or will we have to wait until v78 is released?

roy-lowry commented 3 years ago

@japamment @feggleton @davidhassell This is an e-mail I received on this issue in January:

Hi Roy,

Happy New Year to you too, it’s good to hear from you.

Thanks for drawing my attention to this one and apologies for missing it – Fran and I went through all the open standard names issues in the discuss repo on Monday to see which ones could be finalised, but I must admit we didn’t do the same with the conventions repo. I’ll pick this one up from the existing ticket (no need to start a new one) and make sure it gets progressed in time for the next update (i.e. not next week I’m afraid, as I don’t want to add new content after announcing it, but the next update in Feb/March). Hope that’s okay.

Actually this is a very useful reminder, as I know there are some other standard name related conventions issues that need tidying up, so it would be good to try and resolve those over the next few weeks.

Cheers, Alison

Nothing has happened. I have e-mailed several times since to ask about progress, but received no responses, making me wonder if my e-mails were falling foul of a spam filter. Consequently, I'm trying a comment here as an alternative form of communication.

japamment commented 3 years ago

@roy-lowry my apologies and thank you for keeping this one on the radar. This ticket has now been actioned and biological_taxon_identifier will be turned into an alias of biological_taxon_lsid in the next standard names update. I have copied the syntax of the urn from an earlier post by @davidhassell - please can you check the CEDA editor to ensure the definition text contains the correct urn?

I have also updated the definitions of the other 'taxon' names to refer to biological_taxon_lsid.
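For readers wondering what the alias means for files already written with the old term, here is a minimal Python sketch of alias resolution. It is an assumption-laden illustration: `ALIASES`, `CURRENT_NAMES` and `resolve_standard_name` are hand-written stand-ins invented for this example, not the actual standard name table or the logic of any CF checker.

```python
# Hypothetical stand-in for alias entries in the standard name table:
# deprecated name -> current name.
ALIASES = {"biological_taxon_identifier": "biological_taxon_lsid"}

# Hypothetical subset of current standard names.
CURRENT_NAMES = {"biological_taxon_lsid", "biological_taxon_name"}


def resolve_standard_name(name):
    """Return (current_name, was_alias), or raise KeyError if unknown."""
    if name in CURRENT_NAMES:
        return name, False
    if name in ALIASES:
        # An aliased name is still recognised but maps to its replacement.
        return ALIASES[name], True
    raise KeyError(f"unknown standard name: {name!r}")
```

Under this scheme a file using biological_taxon_identifier is still recognised, and a tool can point the user to biological_taxon_lsid.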

roy-lowry commented 3 years ago

@japamment Many, many thanks. Yes, David correctly fixed my attempt using unescaped chevrons, so what you have in the CEDA editor is correct.