geolexica / isotc211.geolexica.org

ISO/TC 211 online version of the Multi-Lingual Glossary of Terms
https://isotc211.geolexica.org

Creating linkages between concept content and related concepts #88

Open ronaldtse opened 5 years ago

ronaldtse commented 5 years ago

From @dr-shorthair :

https://www.geolexica.org/concepts/12/ For example, Note 1 and Note 2 each refer to other concepts from the lexicon. These should be hyperlinks. Missed opportunity …

We hope to improve on this, but because the TC 211 MLGT input is an Excel file, creating direct linkages by inferring them from the text is cumbersome.

For example, suppose we have the concepts “coordinate” and “coordinate reference system”. When automatic parsing encounters the text “coordinate reference system”, we cannot be sure whether “coordinate” there refers to the former concept or is simply part of the latter.

We will need to move away from the Excel file to do something like this. The same problem applies to handling math; Excel can’t cut it.
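The “coordinate” versus “coordinate reference system” ambiguity can be partly reduced with longest-match-first linking. The following is only a minimal sketch, not how Geolexica works: the `TERMS` map, its URLs, and the `link_terms` helper are all hypothetical, and it only handles exact, case-insensitive matches (no plurals or inflected forms).

```python
import re

# Hypothetical term-to-URL map; these paths are made up for illustration.
TERMS = {
    "coordinate": "/concepts/coordinate/",
    "coordinate reference system": "/concepts/coordinate-reference-system/",
}


def link_terms(text, terms):
    """Wrap known terms in links, preferring the longest matching term."""
    # Longer terms go first so the regex alternation tries them first.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        found = match.group(1)
        # Look up by lowercased text; keep the original casing in the link.
        return f'<a href="{terms[found.lower()]}">{found}</a>'

    return pattern.sub(replace, text)


linked = link_terms("A coordinate reference system maps each coordinate.", TERMS)
print(linked)
```

This resolves the overlap mechanically, but it cannot tell whether the author actually meant the shorter concept, so explicit references in the source data would still be more reliable.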

Any suggestions?

dr-shorthair commented 5 years ago

I definitely agree that Excel is not capable of serving as the point-of-truth for all this. My preference would be to move to a semantic platform: start with SKOS, which allows for skos:related and sub-properties of it. You'll probably find you need to define further sub-properties in due course.

But the initial transformation is likely to be painful. I had an initial go at it with Andrew Jones about 3 years ago, but didn't have funding to pursue it properly. There are some Excel-to-RDF and CSV-to-RDF pipelines available to get things started, but I'm sure there would be a big manual cleanup involved as well. Might be a good student project somewhere?

ronaldtse commented 5 years ago

@dr-shorthair Geolexica already transforms all the Excel data into a "term YAML" format; it's just not displayed or served under Geolexica because of the TMG's fear that someone will import that file.

It is now super easy to generate SKOS from the term YAML file (that is, as long as the TMG agrees that it's okay for people to bulk download the data).
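As a rough illustration of what such a conversion could look like, here is a minimal sketch that turns one already-loaded term record into SKOS Turtle. The record layout (`id`, `term`, `definition`, `related`), the `example.org` base URI, and the sample values are assumptions made for this example; they are not the actual term YAML schema or real MLGT data.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"


def _literal(text, lang="en"):
    """Format a language-tagged Turtle string literal with basic escaping."""
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}'


def concept_to_turtle(concept, base="https://example.org/concepts/"):
    """Serialize one concept record (hypothetical layout) as a SKOS concept."""
    props = [
        "a skos:Concept",
        f"skos:prefLabel {_literal(concept['term'])}",
        f"skos:definition {_literal(concept['definition'])}",
    ]
    # Each related concept ID becomes a skos:related link.
    props += [f"skos:related <{base}{rid}>" for rid in concept.get("related", [])]
    return f"<{base}{concept['id']}> " + " ;\n    ".join(props) + " .\n"


# Made-up record standing in for one parsed term YAML entry.
example = {
    "id": 1,
    "term": "example term",
    "definition": 'a placeholder "definition"',
    "related": [2],
}
turtle = SKOS_PREFIX + concept_to_turtle(example)
print(turtle)
```

A real export would more likely use an RDF library and cover multiple languages, notes, and sources; the point here is only that the mapping from a structured term record to SKOS is mechanical.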

In fact, Reese and I already cleaned up as much as we could of the MLGT/terminology repository (the source data) during the first import. Machine-readability is already a solved problem; what remains here is policy... :wink: