ESIPFed / sweet

Official repository for Semantic Web for Earth and Environmental Terminology (SWEET) Ontologies

Populate all sweet prefixes throughout ontology suite #163

Closed lewismc closed 4 years ago

lewismc commented 4 years ago

The SWEET prefixes generated as part of #155 need to be populated throughout the entire ontology suite. For example, the prefixes currently declared at https://github.com/ESIPFed/sweet/blob/master/src/human.ttl#L6 and https://github.com/ESIPFed/sweet/blob/master/src/human.ttl#L8 should be sorep and sohu, respectively.

lewismc commented 4 years ago

This is a pretty significant task but it is one which standardizes the way people will use SWEET prefixes.

graybeal commented 4 years ago

Do you want just the prefixes that are already used in a given ontology to be included? I'm thinking yes. (Not all 200+ prefixes in every ontology.)

Do you want any actual usages of the full ontology names to be replaced inline with the prefixed name?

I think this is doable without much pain. I'm not volunteering at this moment, but it is a project that interests me, so it could happen.

brandonnodnarb commented 4 years ago

I looked into this briefly after the SHACL ns mapping exercise (and @dr-shorthair's comment).

I assumed I could construct a glorified sed operation (via a Python script). But first I thought I might try to load sweetAll.ttl into an editor, change all the namespace declarations (or a few initially to test), and save it as a new file to see if the editor rewrites the files (as one would expect). I had some issues which, at the time, I thought were due to my Protege installation. However, using TBCFE (6.0.1), a namespace collision report (or some similar thing) was generated listing internal ns conflicts.

I did a fetch all and rebase to confirm this wasn't just an issue with my local copy.

I have attached/uploaded a zip file containing prettified tsv files for prefix collisions as well as ns collisions.
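For readers without TBCFE, a report of this kind can be approximated with a short script. The following is only a minimal sketch, not the tool's actual logic: it assumes a "prefix collision" means one prefix bound to more than one namespace IRI, and an "ns collision" means one namespace IRI bound to more than one prefix. The function name `find_collisions` and the regex are illustrative, and the regex only reads prefix declaration lines rather than parsing Turtle properly.

```python
import re
from collections import defaultdict

# Matches Turtle "@prefix label: <iri>" and SPARQL-style "PREFIX label: <iri>".
# The label is optional so the empty prefix ":" is also captured (as "").
PREFIX_RE = re.compile(
    r'^\s*@?prefix\s+([A-Za-z][\w.-]*)?:\s*<([^>]*)>',
    re.IGNORECASE | re.MULTILINE,
)

def find_collisions(documents):
    """Report prefix and namespace collisions across several Turtle texts.

    documents: dict mapping a file name to that file's Turtle text.
    Returns (prefix_collisions, ns_collisions) as plain dicts.
    """
    iris_by_prefix = defaultdict(set)
    prefixes_by_iri = defaultdict(set)
    for text in documents.values():
        for label, iri in PREFIX_RE.findall(text):
            iris_by_prefix[label].add(iri)
            prefixes_by_iri[iri].add(label)
    # A prefix bound to more than one namespace IRI.
    prefix_collisions = {
        p: sorted(iris) for p, iris in iris_by_prefix.items() if len(iris) > 1
    }
    # A namespace IRI bound to more than one prefix.
    ns_collisions = {
        i: sorted(ps) for i, ps in prefixes_by_iri.items() if len(ps) > 1
    }
    return prefix_collisions, ns_collisions
```

Feeding it the text of every .ttl file under src/ would give a rough list of the declarations that need attention before any bulk rename.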

This seems to be a fairly recent development. Despite the errors, sweetAll.ttl used to load in Protege and now it just hangs (and generates a 1.6 GB log file). Perhaps reverting to a previous instance is in order?

Might this be related to #124?

brandonnodnarb commented 4 years ago

The same thing appears to happen if loading SWEET via URL (http://sweetontology.net/sweetAll) instead of a local copy.

Which makes me wonder...how is COR resolving the errors/discrepancies?

graybeal commented 4 years ago

Is there any chance this relates to the nominally resolved issue with remote loading of SWEET (when ESIP switched to CloudFlare)? It feels rather familiar somehow.

brandonnodnarb commented 4 years ago

I don't see how that would have re-written ns prefixes. Unless I'm mistaken (which is certainly possible), COR just pulls/updates from this repo. It does not push to it.

lewismc commented 4 years ago

I’m sure there are numerous NS conflicts and also that NS are being used differently in different files. Now that we have a standard logic for defining NS we can fix that.

I’m pretty sure this code dealt with the initial mapping from OWL to TTL and that is where these issues were introduced.

I’m going to start going through file by file, using some sed/awk commands. I’ll post the commands here and open a PR maybe once every ten or so files so we can review.

Essentially what we’re doing here is catching up on the technical debt which mounted up when SWEET was neglected in JPL before we open sourced it to ESIP.

lewismc commented 4 years ago

Hi @brandonnodnarb with Protege master branch I am able to load sweetAll both from URL and from local file.

lewismc commented 4 years ago

I'm working on this right now, folks. I'll post a patch by end of day.

lewismc commented 4 years ago

@brandonnodnarb one aid I've been using is

find . -type f -name '*.ttl' -exec sed -i '' -e 's/ anim:/ soman:/g' {} +

There are many pitfalls however due to inconsistency with existing prefixes. It's really taking a file-by-file approach to doing it properly.

graybeal commented 4 years ago

So couldn't you use the patterns for existing full IRIs to find the existing associated prefix for each full IRI, then replace that prefix everywhere in that file with the correct prefix for the particular existing full IRI?

brandonnodnarb commented 4 years ago

I thought so as well @graybeal. I wrote a script which finds all "prefix" lines in a ttl file, uses a regex pattern to get the URI in that line, uses another regex pattern to get associated namespace (in that line), then looks up the new namespace in the SHACL file, and does a string replace for all old ns with new ns.

Unfortunately, I got tied up with other work and wasn't able to complete it before @lewismc went on the sed rampage. I am attempting to use that script, along with file diffs, as a way to verify PR #165.
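The steps described in this comment could look roughly like the sketch below. It is not the script mentioned above, which is not shown in the thread. The SHACL lookup is replaced by a plain dict (`standard_prefix_for`, a hypothetical stand-in mapping namespace IRI to standard prefix), and the function name `rewrite_prefixes` is illustrative. It works on text with regexes rather than a Turtle parser, so an "old:" token inside a string literal would also be rewritten.

```python
import re

# Captures the prefix label and namespace IRI from "@prefix label: <iri>"
# (or SPARQL-style "PREFIX label: <iri>") declaration lines.
PREFIX_RE = re.compile(
    r'^\s*@?prefix\s+([A-Za-z][\w.-]*)\s*:\s*<([^>]*)>',
    re.IGNORECASE | re.MULTILINE,
)

def rewrite_prefixes(ttl_text, standard_prefix_for):
    """Rename the prefixes used in one Turtle document.

    standard_prefix_for: dict of namespace IRI -> standard prefix
    (assumption: stands in for the lookup in the SHACL prefix file).
    """
    # Step 1: for each declared namespace, find the prefix this file uses
    # and the standard prefix it should use instead.
    renames = {}
    for old, iri in PREFIX_RE.findall(ttl_text):
        new = standard_prefix_for.get(iri)
        if new and new != old:
            renames[old] = new
    if not renames:
        return ttl_text

    # Step 2: replace every "old:" token in a single pass, so that
    # overlapping renames (a -> b while b -> c) cannot chain.
    # The lookbehind skips matches inside longer names or IRIs.
    token = re.compile(
        r'(?<![\w:.\-/#<])('
        + '|'.join(re.escape(p) for p in renames)
        + r')(?=:)'
    )
    return token.sub(lambda m: renames[m.group(1)], ttl_text)
```

Because the old prefix is read from each file's own declarations, the same call handles files that bind different prefixes to the same namespace, which is where a single global sed pattern runs into trouble.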