cf-convention / discuss

A forum for proposing standard names; and any discussion about interpretation, clarification, and proposals for changes or extensions to the CF conventions.

Publish CF Standard Names as an Ontology #51

Open alexrobin opened 4 years ago

alexrobin commented 4 years ago

I noticed that several groups have started publishing CF Standard Names in ontology form (e.g. https://mmisw.org/ont/cf/parameter) so that they can be used in semantically enabled applications.

The issue I see is that all of them have done it under their own domain name but none of them are the authority for this vocabulary.

I think it would be very beneficial if the CF conventions working group could publish the official version of the ontology on its public website http://cf-conventions.org so that everybody can point to it rather than duplicating it.

This ontology could be rooted on the official domain prefix "http://cf-conventions.org/ont" and published every time the CF Standard Names Table is updated. It can be auto-generated from the XML version of the table.

What is important, I think, is that the ontology file (for example in RDF/XML or Turtle format) gets hosted on the official website, along with the other formats (HTML, XML, etc.).

I am willing to initiate the work and even update the ontology as required. Anybody interested?

roy-lowry commented 4 years ago

A similar issue to this was first mooted by Bryan Lawrence amongst others at the GO-ESSP meeting (as CF governance meetings were formerly known) in Seattle, which I think was in 2008. The problem then was (and still is) the requirement to provide Standard Names with unique URIs under the CF domain. This never happened, and each Standard Name currently has (at least) three URIs in two different domains, each with its own RDF representation.

I like the idea of an ontology file on the CF website, as a tool could be created to rebuild the file each time a new version of the Standard Name table is produced (much easier than maintaining the ontology file by hand). Alison and Francesca would need to be consulted as to whether such an ontology update could be fitted into their workflow, but I don't see it requiring much work.

A word of warning about processing the XML version of the Table. CF working practice has been to make changes to Standard Names through the process of aliasing. At one time there was a correction where mole_fraction_of_chlorine dioxide_in_air (note the space after chlorine) was replaced by mole_fraction_of_chlorine_dioxide_in_air, but the original was left in the XML. The embedded space caused the XML to be not well-formed and broke XML-handling tools when people tried to manipulate it in the past. I'm not sure whether this has been addressed, but it might be worth checking sooner rather than later.

If there are issues with the XML, NVS (one of the domains serving Standard Names) includes a SPARQL endpoint that could possibly be considered as a source.

alexrobin commented 4 years ago

Yes, I think the most important thing is to use unique URIs rooted on the CF-conventions domain and to distribute the ontology file on the official website. This way there would be no ambiguity as to who is the official source.

No need to set up an ontology server of any kind; a simple ontology file to download will suffice, as anybody can download and ingest it into their own ontology tool.

And as you said, no need to maintain the ontology separately either. It should be pretty straightforward to integrate the automatic generation of the Turtle file into the existing workflow, using the XML file as the source. If needed, we can work out any issues in the XML caused by aliasing.

roy-lowry commented 4 years ago

A difficult job would be getting the CF namespace URIs established outside your ontology. NVS and MMI URIs have been out in the wild for well over a decade. I know from feedback to proposed changes that people have systems critically dependent on the NVS URI syntax. Possibly there is also dependence on the RDF documents to which the NVS URLs resolve. I would be very surprised if there weren't similar dependencies on the MMI URIs. This usage of vocabulary server URIs is a very different use case to the use of a stand-alone ontology in an ontology tool.

However, I see no problem with addressing the stand-alone ontology use case initially before thinking about the thorny issue of how to bring the vocabulary server use case into line.

alexrobin commented 4 years ago

I'm not thinking only about the stand-alone ontology use case. I very much want to publish the CF ontology on an ontology server myself, but I'd rather use CF official URIs, even though I'll rehost it on my own domain.

What is true is that we may not be able to resolve the CF ontology URLs directly in a browser unless you're willing to set up something on the cf-conventions domain for this, but that is of secondary importance.

I suspect if you start publishing something on the official CF website, projects will slowly start using it rather than the alternative URIs. In the meantime, mappings can be created to state the equivalence with existing MMI or NVS URIs (in fact, this is already done today between the MMI and NVS implementations).

roy-lowry commented 4 years ago

Mapping did occur to me as a solution, provided somebody is prepared to maintain it operationally. I certainly won't be setting anything up anywhere - I'm retired with no resources other than e-mail. So, please proceed with my support. I will watch developments with interest and comment if I feel it appropriate.

neumannd commented 4 years ago

I would be interested in a proper CF Standard Name Ontology. We (as a repository) recently discussed improving the FAIRness of our data. Having a community standard of standard names as an ontology would improve our representation of stored data/variables. My technical experience with creating ontologies is non-existent, but I would be willing to learn and contribute.

graybeal commented 4 years ago

@alexrobin is largely right: the other names will take care of themselves over time once CF Standard Names is published with IRIs in its own namespace. There will be an awkward interim period, where there are both old and new publications of the ontology; the old one supporting existing users and links, the new one with proper IRIs. Conceivably, some users would want to keep updating the old formats, and continuing to update the "rehosted" versions (properly attributed and referencing the primary with updated metadata) has some merit. (For example, having the historical records to enable comparisons across the versions is pretty handy.)

In any case, getting it properly hosted can be pretty easy for the CF community. It does not require setting up an ontology repository, nor maintaining the ontology within CF-owned servers; only an appropriate IRI and some redirection settings are needed within the CF ecosystem. A relevant example is SWEET: SWEET content is available and resolvable at https://sweetontology.org, but through the magic of redirection, the ontology is served by ESIP's Community Ontology Repository (at http://cor.esipfed.org/ont?iri=http://sweetontology.net). And there isn't any "extra" work to submit changes to the repository—the ontology is maintained in GitHub, then pulled in by COR whenever the GitHub release is updated (much as CF is already pulled in by MMI ORR, in fact).

(How it works: IRIs like http://sweetontology.net/stateEnergyFlux or http://sweetontology.net/stateEnergyFlux#EnergyFlux_Wm2 get redirected to pyLODE representations of those resources (that's a recent update); another option for redirection is to the COR interface itself, which supports additional features like accessing previous versions of the ontology. The instructions for doing this are at https://github.com/ESIPFed/sweet/wiki/sweetontology.net, and they are pretty straightforward.)
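For anyone who wants to see that redirection in action, here is a minimal sketch (in Python, assuming the setup described above is still live; the exact hops may have changed since):

```python
import requests

# Resolve a SWEET term IRI and print the redirect chain, to illustrate how
# a namespace IRI can be served by a repository hosted elsewhere.
iri = "http://sweetontology.net/stateEnergyFlux"
resp = requests.get(iri, allow_redirects=True, timeout=30)

for hop in resp.history:
    print(hop.status_code, "->", hop.headers.get("Location"))
print("final:", resp.status_code, resp.url)
```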

Credit to Carlos Rueda, Lewis McGibbney, and Nicholas J. Car for their efforts bringing these technologies together, and of course a big shout out to the ESIP Federation and their team (thanks Annie!) for their hosting of the Community Ontology Repository.

Go for it!

graybeal commented 4 years ago

No need to set up an ontology server of any kind; a simple ontology file to download will suffice, as anybody can download and ingest it into their own ontology tool.

The reason I promoted the repository approach above is that I think it offers better value to CF users, and a much better user experience, within the CF namespace. It offers a set of familiar and useful semantic views in addition to the "plain OWL" file, as well as the additional services of the repository host for things like downloads in other formats. Most importantly to me, it makes the CF terms self-resolve in a really nice way. (From experience I am convinced that trying to resolve a semantic IRI and getting an OWL download is extremely disappointing for ontology users, and often useless for what I am trying to do, which is immediately understand the term.)

But CF does have a fair number of existing access points, so if the community says "eh, we'll just let the community do whatever it wants for re-hosting and then our users can find those other services on their own", that's a choice. You can always add the user-friendly approach later.

alexrobin commented 4 years ago

@graybeal I totally agree with you. I was thinking of publishing the ontology file as a first step as I didn't know if setting up a redirection from the cf domain was feasible in the short term. But I definitely think the redirection would add great value and I'm sure ESIP would be ok to host it on COR.

DocOtak commented 4 years ago

Isn't the NERC P07 Collection kept in lockstep with the standard names XML file? IIRC every time I see @feggleton (or @japamment) send a message about a new standard name table being published, it always includes a statement about the NERC vocab server (see https://github.com/cf-convention/discuss/issues/13).

roy-lowry commented 4 years ago

@DocOtak The NVS P07 collection is an integral part of the Standard Name update workflow and publishes new versions simultaneously with the XML and text files on the CF site, including an automatically maintained mapping to the MMI ontology. I understand the latter is brought into line when the version publication announcement is made.

DocOtak commented 4 years ago

@roy-lowry Since P07 is so integrated with the process already, why not sanction it as the place to look for folks who want an ontological view of the standard names? What the NVS serves when you do an HTTP GET as browsers do is already RDF (the stuff you see in the browser is due to a fancy stylesheet).

roy-lowry commented 4 years ago

@DocOtak Thanks for the advert. I know what NVS delivers, as I conceived it back in 2004 and ran the team developing it for 14 years until I retired. The answer to your question is that NVS has been sanctioned in the way you suggest in the past - from memory around 2008-2009. There was even a plan to bring the P07 collection into the CF namespace that failed for reasons that I won't go into here. However, time passes and people forget. I was mildly amused by the initial posting in this thread describing the MMI ontology publication as something new when it has been in the public domain for at least a decade. The current team continue to develop NVS both technically and in terms of content and put a lot of effort into its promotion, especially in Europe and in fora like the Research Data Alliance. It does what it does very well. In particular, as the URIs resolve to RDF including mappings, it is an excellent engine for Linked Data / semantic web applications. However, it doesn't include all the features of a full-blown ontology, which is why I support Alex's idea for what I see as a parallel resource.

alexrobin commented 4 years ago

@roy-lowry I didn't mean to imply the MMI implementation is new (I was actually working with them back then ;-)), but I guess I was surprised that, to this day, nothing comes from the CF working group directly.

BTW I wrote a small Python script that can be used to convert the XML to Turtle RDF, with proper mappings to the existing MMI and NERC vocabularies. I will send it as well as the generated file in a PR soon.
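As a rough illustration of what such a conversion could look like, here is a minimal sketch using rdflib and ElementTree (not the script in the PR; the base IRI and the choice of SKOS properties are assumptions for the example):

```python
import xml.etree.ElementTree as ET

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, SKOS

# Illustrative base IRI only; the actual PR may use a different namespace.
CF = Namespace("http://cfconventions.org/ont/parameter/")

def table_to_turtle(xml_path: str, ttl_path: str) -> None:
    """Convert the standard name table XML into a simple SKOS/Turtle file."""
    root = ET.parse(xml_path).getroot()  # <standard_name_table> with <entry id="..."> children

    g = Graph()
    g.bind("skos", SKOS)
    g.bind("cf", CF)

    ontology = URIRef(str(CF))
    g.add((ontology, RDF.type, OWL.Ontology))
    g.add((ontology, OWL.versionInfo, Literal(root.findtext("version_number", default=""))))

    for entry in root.findall("entry"):
        concept = CF[entry.get("id")]
        g.add((concept, RDF.type, SKOS.Concept))
        g.add((concept, SKOS.prefLabel, Literal(entry.get("id"))))
        units = entry.findtext("canonical_units")
        if units:
            g.add((concept, SKOS.note, Literal(f"canonical units: {units}")))
        description = (entry.findtext("description") or "").strip()
        if description:
            g.add((concept, SKOS.definition, Literal(description)))

    g.serialize(destination=ttl_path, format="turtle")

# table_to_turtle("cf-standard-name-table.xml", "cf-standard-name-table.ttl")
```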

ethanrd commented 4 years ago

Hi all - I support the idea of the CF website hosting an official version of RDF/Turtle files for CF Standard Names. The CF Standard Names are already published on the website as XML, HTML, and KWIC Index for each version. They are currently hosted at /Data/cf-standard-names/{version#}/[build|src]/. It might be nice to have a simpler URL for the RDF/Turtle files (@alexrobin suggested /ont/).

I’m guessing that adding RDF/Turtle files probably wouldn't be a huge leap. @japamment and @feggleton - Any thoughts on how this might fit into the current CF Standard Name build process?

(PS A few folks on the CF info-mgmt team have been trying to clean up and document the website and document build process. I’m planning a PR to the website repo with a new “Infrastructure Guide” page to get that started. I’ll be in touch on that PR -- @japamment and @feggleton -- to ask about the process for CF Standard Names.)

alexrobin commented 4 years ago

FYI, I submitted PR cf-convention/cf-convention.github.io#105 that includes a script to generate the RDF/Turtle file from the original XML, as well as the resulting RDF/Turtle file for version 72.

roy-lowry commented 4 years ago

@alexrobin I was a part of MMI before you in the days of Luis Bermudez - I think around 2003-2007. I think the answer to your question is that CF is a community standard run on voluntary contributions, which means there is a limit to what can be undertaken. Considering these limitations, an incredible amount has been achieved over the past few years, such as the migration of the Conventions document and discussions onto GitHub.

alexrobin commented 4 years ago

@roy-lowry Didn't mean to say the CF community is not doing enough ;-) I know everybody has plenty on their plate, but just thought there was an opportunity to bring the ontology back to where it belongs. I hope my small contribution can help with that.

BTW I was never a part of MMI but just working with them on a few projects like XDomes.

alexrobin commented 4 years ago

Please let me know what you think about PR cf-convention/cf-convention.github.io#105 and in particular the resulting RDF/Turtle file for version 72.

And also let me know if the community needs any more help integrating this into the existing workflow. I can make improvements/adjustments to the script as needed.

DocOtak commented 4 years ago

@alexrobin I think we are waiting for @japamment and @feggleton to chime in on this since they are the ones who update the standard name table/do the publishing of new versions.

@roy-lowry Examining the RDF from the NVS, it appears to be really close to what @alexrobin seems to have created (though in a different format), but it's missing the "canonical name" - I was wondering why that doesn't appear in the NVS XML.

ethanrd commented 4 years ago

I also think we should look at redirection to an existing or new ontology server (as @graybeal and @DocOtak mention). I expect they have more capabilities than if we served static RDF files directly on the CF site. NVS P07 is already maintained and kept in lock step with CF standard names. I expect the ESIP COR would need some input file(s). (Perhaps the static RDF files in @alexrobin's PR?)

@roy-lowry Does the NVS P07 collection support all CF Standard Name versions? I thought it did but when I was poking around the site I kept getting redirected to "current".

Also wondering if folks from the netCDF-LinkedData project have any thoughts on the matter - @marqh, @jyucsiro, @adamml, and others.

graybeal commented 4 years ago

Nice contribution, Alex.

What is the process by which the versionInfo gets updated in the Turtle? Is that automatic from metadata from the CF project?

I'd strongly prefer/recommend '/' as the final delimiter rather than '#'—for conformance with W3C standards it can be nicer to avoid #, which 'by rule' would force the whole ontology to be served every time a term is accessed. Most major repos choose / given the option. (I hope my memory is being faithful here, but it's late.)

I am imagining CF wouldn't try to keep all the old versions of the ontology—it already has all the old versions of the canonical data, and people can go elsewhere for those. (There's a dialog going on in the OBO community about using versionIRI, the latest OWL 2 concept, which would provide identity and maybe access to each versioned ontology. That's real work; I don't think you have to do that for CF right now.)

Incidentally, this may be moot given Roy's thoughtful response, but given the parallel play of MMI and NVS over the years, I don't think it's a good idea to designate a repo 'authoritative' that isn't serving from the CF namespace. (It creates a dependency on the external service, and on the services that service provides.) If CF serves the ontology with CF namespace terms, that's by definition authoritative, unambiguous, and will always be in the control of CF.

Even if your initiative doesn't get taken up right away, Alex, I think having the code available is a plus for CF. When resources allow, it will give CF a head start.

alexrobin commented 4 years ago

Thanks @graybeal.

The versionInfo is automatically read from the source XML file, and '/' is used as the final delimiter in the current version. So a complete standard name's IRI would look like http://cfconventions.org/ont/parameter/age_of_sea_ice.

@ethanrd I have tested loading the file into ORR w/o issue so it should be just as easy to load it into ESIP COR. However, I'm not sure how to handle the redirect from the CF domain at the moment since the cfconventions.org website is hosted on GitHub Pages.

adamml commented 4 years ago

Hi all

Thanks @ethanrd for looping me in the conversation.

Probably six or seven years ago, @Marqh, @roy-lowry and I discussed doing this from the NERC Vocabulary Server. Our road map looked something like

A (little-known) feature that I implemented on NVS back before January 2015 (i.e. while I was at BODC) is that a standard name can be accessed via this routing already, as per step 1 of that roadmap, e.g.

http://vocab.nerc.ac.uk/standard_name

http://vocab.nerc.ac.uk/standard_name/age_of_sea_ice

I guess steps 2 and 3 never happened.

The RDF could possibly be "decorated" a little more now, given that it's a long time since this was done initially and CF is possibly described ontologically now.

Also looping in @alko-k.

japamment commented 4 years ago

Hi All,

I've been away from work for a while for family reasons, so apologies for my recent silence. I'm just starting to catch up on a whole bunch of GitHub discussions.

Just for the record let me explain how we currently generate versions of the standard name table and how it gets into NVS.

The standard name vocabulary on NVS is updated at the same time that Francesca or I update the standard name table on the CF website. We achieve this using the CEDA vocabulary editor. This tool helps us to keep track of current (and past) discussions of standard name proposals. It also allows us to flag terms that have been agreed and are ready for publication. When we generate a new vocabulary version the editor produces an XML file that conforms to the schema used for the standard name table (see Appendix B of the CF conventions). An XSLT script is then used to render the XML into HTML. These two files are uploaded to GitHub which in turn allows us to publish them to the CF website. The CEDA vocabulary editor also produces a tab separated file suitable for uploading to the NVS2 editor. This allows us to send identical information to the CF site and NVS2 in an automated way. The terms sent to NVS2 are run through some checking scripts and then published overnight (many thanks are due to our BODC colleagues, especially Gwen Moncoiffe and Alexandra Kokkinaki for this part of the process).
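(For readers unfamiliar with that step, the XML-to-HTML rendering is the kind of thing sketched below; this is purely illustrative, using lxml, and is not the actual script in the build.)

```python
from lxml import etree

# Purely illustrative: apply an XSLT stylesheet to the standard name table XML
# to produce an HTML rendering. The real build uses its own XSLT script and tooling.
def render_html(xml_path: str, xslt_path: str, html_path: str) -> None:
    transform = etree.XSLT(etree.parse(xslt_path))
    html = str(transform(etree.parse(xml_path)))
    with open(html_path, "w", encoding="utf-8") as f:
        f.write(html)

# render_html("cf-standard-name-table.xml", "cf-standard-name-table.xsl",
#             "cf-standard-name-table.html")
```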

The KWIC Index version of standard names is produced by first running a Unix shell script to do some simple text editing on the XML file, then feeding the result into a Prolog program contributed by Robert Meutzelfeldt from the University of Edinburgh. (This is done manually and is not part of the CEDA vocab editor). I confess the Prolog is something of a black box to me, but the program only takes a couple of minutes to run and generates the KWIC Index as an HTML file.

On NVS2, the URL http://vocab.nerc.ac.uk/collection/P07/current/ will return the whole list of standard names from the latest version (equivalent to the information in cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html and http://cfconventions.org/Data/cf-standard-names/current/src/cf-standard-name-table.xml).

An individual term in the current version can be accessed by adding its key to the URL, e.g., http://vocab.nerc.ac.uk/collection/P07/current/CFSN0335/ will get you the latest version of the term for sea_water_temperature.

Of course, you may not know the opaque key for your favourite standard name, so there is another form of URL that uses the name itself: http://vocab.nerc.ac.uk/standard_name/sea_water_temperature/ which again will resolve to the current version of the term. The vocab server does this by mapping the two URLs to one another as a skos exactMatch.
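Since NVS content-negotiates (as noted further down in this thread), a script can also ask for the RDF directly rather than the stylesheet-rendered browser view. A minimal sketch, assuming application/rdf+xml is still among the offered serializations:

```python
import requests

# Dereference a standard name URI and ask for RDF/XML rather than the
# browser-oriented view. Assumes this media type is still supported by NVS.
url = "http://vocab.nerc.ac.uk/standard_name/sea_water_temperature/"
resp = requests.get(url, headers={"Accept": "application/rdf+xml"}, timeout=30)
resp.raise_for_status()

print(resp.headers.get("Content-Type"))
print(resp.text[:500])  # the first few hundred characters of the RDF payload
```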

@ethanrd asks if NVS2 supports all CF standard name versions. I played with this a bit because I expected it to, and got some slightly surprising results. Listings of older versions of the whole vocabulary worked for me, for example, http://vocab.nerc.ac.uk/collection/P07/55/ will list all the terms in version 55 of the standard name table. The oldest version I managed to list was 18 - attempts to list earlier ones just produced the header, but no terms. Version 18 is dated 22 July 2011. I'm wondering if this was around the time NVS2 replaced NVS1 and maybe the URLs are different for older versions - I'll check with my BODC colleagues about this.

I also found that searching for an individual term in a numbered version of the vocabulary doesn't work quite as I'd expect. For example, http://vocab.nerc.ac.uk/collection/P07/55/CFSN0335/ doesn't resolve even though CFSN0335 did exist at V55 of standard names. I think this behaviour results from the versioning of individual terms and how they are associated with vocabulary versions. If you list the whole of version 55 then use your browser search bar to look for CFSN0335, it finds the link http://vocab.nerc.ac.uk/collection/P07/current/CFSN0335/. Version 55 of the standard name table does contain the most up to date, i.e. current, version of that particular term, so this link gets you to the right information, but maybe not by the most obvious route! Again, I'll check about this - maybe it wouldn't be difficult to get numbered versions of individual terms to resolve.

Regarding ontology files / ontology servers, this isn't my area of expertise, as most of my time is spent thinking about the content of standard names and their descriptions rather than how people go about consuming them, but it's an area I'd like to learn more about.

alexrobin commented 4 years ago

@japamment Thanks for the explanations.

FYI, I'm not sure it's the correct way to do it, but I made an attempt at integrating the Python script that generates the RDF/Turtle file into the existing build process (see the proposed changes to the makefile in the PR).

roy-lowry commented 4 years ago

@adamml The CF community has been informed on several occasions of the availability of the URL set you set up but nobody ever took up integrating it into the CF namespace. So 'step 2' happened but not 'step 3'.

@ethanrd I see @japamment has taken this up and will be contacting the current NVS team for clarifications so I won't muddy the waters by responding from a fading memory!

@DocOtak I'm afraid I don't know. The details of the NVS RDF document were the work of others - Adam Leadbetter when he was at BODC and subsequently Alexandra Kokkinaki with significant input from Simon Cox. I just let them do their thing as my technical expertise is founded in FORTRAN and SQL.

adamml commented 4 years ago

@roy-lowry Thanks for the clarification on that...

@DocOtak On the /P07/ route the RDF/XML payload on the NERC Vocabulary Server was designed to be as close to "pure" SKOS as possible - with a few other decorations for specific things. When we put the payloads together it would have been fairly complicated to add specific predicates for specific vocabularies - indeed it wasn't a use case we had.

The /standard_name/ route on the NERC Vocabulary Server works slightly differently (although it's a long time since I put it together and I no longer have direct involvement in that system, having moved on from BODC). However, I have a feeling that it would be easier to "decorate" the RDF/XML payload there with the additional CF-specific terms than at /P07/.

I think having the pieces of CF (standard_name, canonical_unit) defined such that they could be referenced from NVS or COR/ORR as predicates would be really helpful, but again I thought @marqh has done a lot of this already (but could be mistaken).

ethanrd commented 4 years ago

@japamment - Thanks for the explanation of how the standard name tables are generated.

Currently, the CF Website repo contains some of the build files you mention (XSD, XSLT, python, makefiles, and such) in each standard name version subdirectory. I don't believe they are needed in the website so it might be good to clean those up and not push them to the repo for future versions. I'll start another issue focused on that so we have a place to discuss how to proceed without cluttering up this issue.

ngalbraith commented 4 years ago

@neumannd

discussed improving the FAIRness of our data. Having a community standard of standard names as an ontology would improve our representation of stored data/variables.

I'm also facing the issue of 'FAIRness', and I'm wondering how having the standard names in ontology form, vs having them available on the NVS system, improves this. The FAIR documents seem so non-specific, I find them hard to actually apply. I'd love some practical guidance on this (although, with apologies, I realize it may be off topic for this issue).

roy-lowry commented 4 years ago

Hi Nan,

I share your uncertainty about how the Standard Name delivery format would influence data FAIRness. To me the critical thing is having the terminology formally managed.

Cheers, Roy.

adamml commented 4 years ago

From my comment above:

I think having the pieces of CF (standard_name, canonical_unit) defined such that they could be referenced from NVS or COR/ORR as predicates would be really helpful, but again I thought @marqh has done a lot of this already (but could be mistaken).

In speaking with @jyucsiro this morning, we remembered that @marqh had worked on describing a vocabulary of CF Terms used in netCDF files at:

http://def.scitools.org.uk/CFTerms

Mark would need to comment on the stability of the underlying RDF and the namespace.

This seems to have done some of the predicate publishing that @alexrobin is proposing.

In terms of an "ontology" for the Standard Names promoting FAIRness (and acknowledging that I'd be opening another old and different can of worms here - think Common Concepts...), I think that more important than which namespace the Standard Names are served from would be to formalise the grammar of the Standard Names into a semantic model (e.g. the Complex Properties model, or engaging with the RDA group on Interoperable Description of Observable Properties).

Taking the "age of sea ice" example:

There's a more complete example here.

dr-shorthair commented 4 years ago

Joining late, just a couple of points;

  1. Persistent URIs should not have version numbers embedded. High up in this thread I think some URIs did have version info in the path. And short path URIs are preferable to long ones. Don't embed semantics in the path.

  2. HTTP Content Negotiation by Profile allows different representations according to various models as well as serializations to be delivered from a single URI - see https://profilenegotiation.github.io/I-D-Profile-Negotiation/I-D-Profile-Negotiation and https://www.w3.org/TR/dx-prof-conneg/

alko-k commented 4 years ago

Joining late as well. Thanks @adamml and @roy-lowry for including me. I wanted to clarify the versioning issue that @japamment brought up:

NVS implements versioning on the concept level following this URI pattern: http://vocab.nerc.ac.uk/collection/P07/current/BBAD2101/VNumber/, e.g.

http://vocab.nerc.ac.uk/collection/P07/current/BBAD2101/1/

http://vocab.nerc.ac.uk/collection/P07/current/BBAD2101/2/

The URI that is published by default is the latest one (without the version number at the end), e.g. http://vocab.nerc.ac.uk/collection/P07/current/BBAD2101/

So http://vocab.nerc.ac.uk/collection/P07/current/BBAD2101/ is the same as http://vocab.nerc.ac.uk/collection/P07/current/BBAD2101/2/

NVS uses the PAV ontology properties to describe the versioning and provides access to the current and previous versions of a concept.

Content negotiation is in place, as you all already mentioned, which currently resolves to RDF/XML. We are planning to publish in different serializations soon as well.

roy-lowry commented 4 years ago

@dr-shorthair The embedding of version numbers into the URL syntax was one of several design errors in version 1 of NVS fixed in version 2. The 'current' in the version 2 NVS URL syntax is a dummy field that was included for backward compatibility with NVS version 1 URLs.

One clarification to @alko-k's posting is that the concept version numbers used in NVS version 2 are NOT synchronised in any way with the CF Standard Name Table version numbers. The NVS 1 versioning was an attempt to do that, and whilst a nice idea, the costs in both design compromise and implementation (list rebuilding from audit trails brought a powerful server to its knees) were too high.

The NVS version 2 concept version numbers start at one and then increase by one each time one of the concept fields is changed. Because of the way Standard Names are managed (preferred label changes are implemented by the creation of an additional concept) the multiple versions of a single concept, such as Alexandra's example, are related to changes in the description field.

graybeal commented 4 years ago

@ethanrd I have tested loading the file into ORR w/o issue so it should be just as easy to load it into ESIP COR. However, I'm not sure how to handle the redirect from the CF domain at the moment since the cfconventions.org website is hosted on GitHub Pages.

Sorry, missed this earlier. That complicates things, but only somewhat. Somewhere in the cfconventions.org DNS configuration there is a record—I think an A record, but maybe another kind—that says the domain is served by a GitHub repo. I think there are several different ways to do this, but my DNS-fu is a bit rusty. I think the best way may be to have multiple listings in that DNS record, one to serve most cfconventions.org content and the other to serve cfconventions.org/ont content. I am 87% sure this is viable ;-) If that isn't viable, and everything in cfconventions.org is served directly by GitHub, I'm at a loss—the usual "let apache/nginx handle it" doesn't seem to fit.

@alexrobin I think we are waiting for @japamment and @feggleton to chime in on this since they are the ones who update the standard name table/do the publishing of new versions.

And the rest of the CF convention management, likely?

graybeal commented 4 years ago

I'm also facing the issue of 'FAIRness', and I'm wondering how having the standard names in ontology form, vs having them available on the NVS system, improves this.

The main improvement that I see is that it consolidates references to these concepts under a single identifier (eventually), and that identifier is likely to be resolvable. A person might be able to infer the different IRIs generated by different repositories (NVS, ORR, COR, BioPortal, …) are the same; a computer could only know if it was specifically told. So almost all the computers dealing with datasets that list one of these "unique IRIs" would not make the association with the datasets that contain the other "unique IRIs", even though the two IRIs are referring to (and mapped to) the same concepts. That's not very interoperable.

I'm sorry to say that all the excellent things CF is doing to publish their content don't address this interoperability need for the semantic and linked data communities. (Biomedical repos have the same issue with a few UMLS vocabularies, it's not just them!) There is some whiff of discussion about all of us repos agreeing on common identifier schemes in such cases, but it is politically hard to engineer the ideal solution.

The FAIR documents seem so non-specific, I find them hard to actually apply. I'd love some practical guidance on this (although, with apologies, I realize it may be off topic for this issue).

GO FAIR, FAIRsFAIR and RDA and/or ESIP are working on this goal, in different ways naturally; others too. Sometimes with practical guidance and sometimes with evaluator software.

Persistent URIs should not have version numbers embedded. High up in this thread I think some URIs did have version info in the path. And short path URIs are preferable to long ones. Don't embed semantics in the path.

I beg to differ, but only in a specific respect (which I think is not what @dr-shorthair was addressing here).

If we grant that a concept definition associated with a particular (non-versioned) IRI can change, even if the concept itself does not—to fix typos, to be clearer, to add some newly realized characteristic that is fundamental to the concept—then we must also give users the ability to specify "the concept as it was defined when I got it". So we need an identifier to represent that concept—it is persistent, but not in the same way as, say, the SWEET IRI for a concept like water.

It is true that we could eliminate the semantics of that identifier within the IRI, and have the concept-as-of-versionX identified as a code. By doing so, we would no longer serve several use cases for usability or persistence, and IMHO would not actually gain any interoperability or semantic advantages. (Because the argument for removing the semantics is that the semantics might change. I don't think the semantics of the identifier will change in this case; the IRI issued will always be the right IRI for that versioned concept, even if it can't be resolved under that IRI any more.)

This potentially leads to a longer discussion, but not one that determines whether or not CF is published as an ontology.

JonathanGregory commented 4 years ago

With apologies for my ignorance of how vocabularies and ontologies are defined and organised, I'd like to comment on this

Taking the "age of sea ice" example:

  • Age is the thing being observed (The "Property" in the Complex Properties model)
  • Sea ice is the matrix

I don't think that CF standard_names can be decomposed in that way, for a number of reasons, which I tried to describe in my presentation to the CF meeting in Reading, http://www.met.rdg.ac.uk/~jonathan/talks/CF180620.pdf, on slide 5. However, it would be possible to assign bundles of semantic units to standard_names, as part of their definition in the standard_name table, to indicate which properties (such as the concept of age, or the medium of sea ice) are relevant to each one. Maybe that sounds the same! What I mean is, you can't analyse a standard name to find out what it's made of, but you can provide information about the ingredients used to synthesise it. It's a one-way translation.

huard commented 4 years ago

Should the CF ontology be built on existing standard ontologies (e.g. PROV-O) to improve interoperability?

I think this could mesh with https://github.com/cf-convention/discuss/issues/33

neumannd commented 4 years ago

@ngalbraith @graybeal: We are testing a FAIRness metric which the FAIRsFAIR project is developing. Based on that metric, we try to improve our FAIRness. Without having a clear metric, we would not work on this topic (as you write, the FAIR principles themselves are quite non-specific). Even if a current/first CF ontology does not allow a semantic description of CF standard names (as @adamml points out), it is still an advance.

However, it would be possible to assign bundles of semantic units to standard_names, as part of their definition in the standard_name table, to indicate which properties (such as the concept of age, or the medium of sea ice) are relevant to each one. Maybe that sounds the same! What I mean is, you can't analyse a standard name to find out what it's made of, but you can provide information about the ingredients used to synthesise it. It's a one-way translation.

@JonathanGregory: That would be great. Wouldn't automatic extraction of certain keywords be possible to enrich the ontology? Such as "in_sea_water", "nitrogen", "nitrogen_dioxide", ...
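A very rough sketch of the kind of keyword extraction being asked about (the phrase list here is invented for the example; real enrichment would need a curated vocabulary of media, substances, processes, etc., as the discussion above suggests):

```python
import re

# Illustrative only: a handful of phrases one might look for in standard names.
PHRASES = ["in_sea_water", "in_air", "sea_ice", "nitrogen_dioxide", "nitrogen", "age_of"]

def extract_keywords(standard_name: str) -> list:
    """Return the known phrases occurring in a standard name, longest match first."""
    found = []
    for phrase in sorted(PHRASES, key=len, reverse=True):
        already_covered = any(phrase in longer for longer in found)
        if not already_covered and re.search(rf"(^|_){phrase}(_|$)", standard_name):
            found.append(phrase)
    return found

print(extract_keywords("mole_fraction_of_nitrogen_dioxide_in_air"))  # ['nitrogen_dioxide', 'in_air']
print(extract_keywords("age_of_sea_ice"))                            # ['sea_ice', 'age_of']
```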

graybeal commented 4 years ago

I think the discussion of tagging is fine, and the technical work has effectively been done in prior analyses, so it can be re-done with little pain. (There is a page that refers to these analyses somewhere on the CF site.) But those topics don't address or impact the value of the original request. That request was a simple request with significant value to semantic technology users, and its goals are not affected by whether CF standard names are decomposed or tagged (two different things, as described above).

Each standard name exists as a conceptual unit. What we would like is to have each conceptual unit recognized by a single globally unique identifier. (@alexrobin While you're at it, can we do all the other CF controlled vocabularies too? Oops, splitting the thread, ttyl!) This isn't hard, it isn't complicated (once a namespace is agreed on :->), and I don't think it should be controversial. Looking back quickly over the comments, I don't think it is controversial, it just raises interesting corollary topics.

Might I suggest, as a way forward, that we focus on identifying any concerns about the goal of establishing such an ontology? If there are no concerns about the fundamental goal, then a useful next step would be for Alex, perhaps consulting with a few volunteers, to assemble a straw proposal implementation of such an ontology, and those particulars can be discussed.

I believe that even whether/how CF would host the ontology can be deferred as an "implementation detail"—as long as CF supports its existence, we can go forward with implementation planning with confidence that the ontology can become the long-term standard once its implemented form is agreed upon.

If this is agreeable, the other topics raised here (decomposing, tagging, common concepts, other CF controlled vocabularies, etc.) can be discussed in separate tickets, where their value proposition can be more fully explored.

roy-lowry commented 4 years ago

Hi John,

There was a discussion on the subject of the ontology and getting Standard Names into the CF namespace during the breakout session in the CF meeting last week. There are complications due to the way in which the CF namespace has been integrated into GitHub, but @ethanrd has an idea of how to get around this, and it was agreed that a virtual meeting would be organised to work out a solution and take it forward. I think @japamment was going to call this.

rob-metalinkage commented 3 months ago

Plans to resolve this?

Copy of #217 comment:

Any progress on a resolution to this? There is a STAC JSON schema - https://github.com/stac-extensions/cf

We can connect this to an ontology using JSON-LD - if we had one :-)

This is part of a GeoDCAT requirements and development process where we can build profiles to match community standards:

https://ogcincubator.github.io/geodcat-ogcapi-records/

Early days, but open to collaboration. Communities can engage to test, refine, and promote these resources into formal standards as part of the OGC and ISO processes if appropriate.