allysonlister / swo

The Software Ontology (SWO) is a resource for describing software tools, their types, tasks, versions, licences, provenance and associated data.
Creative Commons Attribution 4.0 International

Dupe ids and URI issue #10

Closed janelomax closed 5 years ago

janelomax commented 5 years ago

Hi Allyson - hope you are well, long time no see!

One of our pharma customers is using SWO and spotted a dupe id: SWO_0000075

In OLS they have slightly different URIs:

http://www.ebi.ac.uk/efo/swo/SWO_0000075
http://www.ebi.ac.uk/swo/SWO_0000075

(the http://www.ebi.ac.uk/efo/swo/ URIs don't resolve anywhere)

thanks

Jane

allysonlister commented 5 years ago

Thanks Jane - lovely to hear from you! I've added this to the current milestone for our imminent next release, so this is perfect timing. There are some big changes coming up, mostly in how we import EDAM and the general tidiness of the ontology.

Now, at a very quick count there are 712 classes that have the "http://www.ebi.ac.uk/efo/swo" prefix and 554 that have the standard "http://www.ebi.ac.uk/swo" prefix. I'm not sure why the efo prefix came in... @jamesmalone do you remember any reason from SWO history for these?
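
A count like this can be reproduced by grouping class IRIs by namespace. The following is a minimal Python sketch, not the method actually used in this thread. It assumes the class IRIs have already been extracted from the ontology into a list of strings; the function name `count_by_namespace` and the bucket names are hypothetical.

```python
from collections import Counter

# The two namespaces discussed in this issue.
EFO_SWO = "http://www.ebi.ac.uk/efo/swo/"
SWO = "http://www.ebi.ac.uk/swo/"


def count_by_namespace(iris):
    """Count IRIs per namespace; anything else lands in 'other'."""
    counts = Counter()
    for iri in iris:
        if iri.startswith(EFO_SWO):
            counts["efo/swo"] += 1
        elif iri.startswith(SWO):
            counts["swo"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "http://www.ebi.ac.uk/efo/swo/SWO_0000075",
        "http://www.ebi.ac.uk/swo/SWO_0000075",
        "http://www.ebi.ac.uk/swo/SWO_0000015",
    ]
    print(count_by_namespace(sample))
```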

I'm happy to reconcile and remove the http://www.ebi.ac.uk/efo/swo prefix and replace with http://www.ebi.ac.uk/swo but as you've pointed out Jane, this will require checking for duplicates and therefore needs to be done carefully.

If James also agrees that we should clean these IRIs, I'll try to do it this week one evening as I'd like to get the 1.7 release out very soon.

Jane - if you or your pharma customers would like a peek at the new SWO, your thoughts would be appreciated. It isn't published as a release, but the current working copy of the pre-release file can always be found at https://github.com/allysonlister/swo/blob/master/dev/ontology/swo-merged.owl

Thanks! Ally :)

drmarkreuter commented 5 years ago

Hi Ally, indeed, the URIs http://www.ebi.ac.uk/efo/swo/SWO_nnnnnnn don't resolve. It's frustrating because some of the software URIs are fine (http://www.ebi.ac.uk/swo/SWO_0000015 for Excel 2002), but most are dead. Happy to shoot over a list of the URIs I've tested and their status codes, if that's helpful. Many thanks, Mark.
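
A status check of this kind could be scripted roughly as follows. This is a sketch under assumptions, not the tooling Mark used: it sends a HEAD request with Python's standard `urllib`, and the helper name `http_status` and its `opener` parameter (there only so the function can be exercised without network access) are hypothetical. Some servers answer HEAD differently from GET, so results may need confirming with a full request.

```python
import urllib.error
import urllib.request


def http_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for `url`, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout, and similar.
        return None


if __name__ == "__main__":
    for uri in (
        "http://www.ebi.ac.uk/swo/SWO_0000015",
        "http://www.ebi.ac.uk/efo/swo/SWO_0000075",
    ):
        print(uri, http_status(uri))
```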

allysonlister commented 5 years ago

I've created a list of efo URIs from SWO (see attached). Happy to have your list, and compare them if that helps? Please note I'm working from the upcoming release file, which is currently available at https://github.com/allysonlister/swo/blob/master/dev/ontology/swo-merged.owl

Here's the list of 661 URIs that begin with http://www.ebi.ac.uk/efo/swo: efo-swo-1.txt

For each of these I'd need to:

  1. Check for a pre-existing "http://www.ebi.ac.uk/swo" IRI that matches it. If present, I'd need to generate a new SWO_ class number. Otherwise, just update the IRI and retain the class number.
  2. Transfer any annotation/axioms to the "new" class with the proper IRI.
  3. Deprecate the old "efo-swo" IRI, using appropriate replaced-by and owl deprecation flags.
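
The decision in step 1 above can be sketched in Python as below. This is an illustration only: it covers the choice between retaining and re-minting an ID, not the transfer of axioms (step 2) or the deprecation (step 3). The function `plan_renames` and the `next_free_number` starting value are hypothetical.

```python
EFO_SWO = "http://www.ebi.ac.uk/efo/swo/"
SWO = "http://www.ebi.ac.uk/swo/"


def plan_renames(efo_iris, existing_swo_ids, next_free_number):
    """Map each efo/swo IRI to a swo IRI, minting a new ID only on a clash.

    existing_swo_ids: local IDs (e.g. "SWO_0000075") already used in the
    swo namespace, whether active or obsolete.
    """
    taken = set(existing_swo_ids)
    mapping = {}
    for old_iri in sorted(efo_iris):
        local_id = old_iri[len(EFO_SWO):]
        if local_id in taken:
            # Clash: skip forward until an unused number is found.
            while f"SWO_{next_free_number:07d}" in taken:
                next_free_number += 1
            local_id = f"SWO_{next_free_number:07d}"
        taken.add(local_id)
        mapping[old_iri] = SWO + local_id
    return mapping


if __name__ == "__main__":
    plan = plan_renames(
        [EFO_SWO + "SWO_0000075", EFO_SWO + "SWO_0000500"],
        existing_swo_ids={"SWO_0000075"},
        next_free_number=9000001,  # hypothetical starting point
    )
    for old, new in plan.items():
        print(old, "->", new)
```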
drmarkreuter commented 5 years ago

Thanks! Here are the results of my testing this morning: SWOtesting1_20191001.zip

drmarkreuter commented 5 years ago

SWOtest2_20191001.zip I quickly tested the list of 661 URIs. All 404 :-(

allysonlister commented 5 years ago

That's OK - it makes sense that they're 404 - I expect there was a batch creation of SWO classes some time ago, and the "efo" URI was used instead of the standard one. I'd just like confirmation from James before I make any such large changes to the IRIs. Thanks!

drmarkreuter commented 5 years ago

oo, since you're working on SWO, could you add some new terms? URIs for BCBio and ShinyNGS would be good.

allysonlister commented 5 years ago

@drmarkreuter thanks for the suggestion - you'll find I've asked you to provide some basic info on each at #11 and #12 :-)

drmarkreuter commented 5 years ago

will do thanks. From the subset of SWO I'm working with, these are potential id clashes:

| swo URI | Label | efo/swo URI | Label |
| --- | --- | --- | --- |
| http://www.ebi.ac.uk/swo/SWO_0000005 | MATLAB | http://www.ebi.ac.uk/efo/swo/SWO_0000005 | ABarray |
| http://www.ebi.ac.uk/swo/SWO_0000007 | OmniOutliner | http://www.ebi.ac.uk/efo/swo/SWO_0000007 | Algorithms for Calculating Microarray Enrichment |
| http://www.ebi.ac.uk/swo/SWO_0000060 | MUSCLE | http://www.ebi.ac.uk/efo/swo/SWO_0000060 | BZScan |
| http://www.ebi.ac.uk/swo/SWO_0000075 | SPSS | http://www.ebi.ac.uk/efo/swo/SWO_0000075 | 'Biostrings' |
| http://www.ebi.ac.uk/swo/SWO_0000076 | SPSS 20.0 | http://www.ebi.ac.uk/efo/swo/SWO_0000076 | BlueFuse |
| http://www.ebi.ac.uk/swo/SWO_0000077 | Sequence Alignment and Modeling System | http://www.ebi.ac.uk/efo/swo/SWO_0000077 | 'BufferedMatrix' |
| http://www.ebi.ac.uk/swo/SWO_0000078 | SAM 3.5 | http://www.ebi.ac.uk/efo/swo/SWO_0000078 | 'BufferedMatrixMethods' |
| http://www.ebi.ac.uk/swo/SWO_0000079 | Cytoscape | http://www.ebi.ac.uk/efo/swo/SWO_0000079 | 'CALIB' |
| http://www.ebi.ac.uk/swo/SWO_0000152 | iBioSim | http://www.ebi.ac.uk/efo/swo/SWO_0000152 | GACK |
| http://www.ebi.ac.uk/swo/SWO_0000155 | COBRApy | http://www.ebi.ac.uk/efo/swo/SWO_0000155 | GEMTools 2.4 |
| http://www.ebi.ac.uk/swo/SWO_0000158 | libSBML | http://www.ebi.ac.uk/efo/swo/SWO_0000158 | GEOmetadb |
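
Clashes like those listed above are local IDs that occur in both namespaces, so they can be found with a set intersection. The sketch below assumes the class IRIs are available as plain strings; `find_id_clashes` is a hypothetical name, and this is not necessarily how the list above was produced.

```python
EFO_SWO = "http://www.ebi.ac.uk/efo/swo/"
SWO = "http://www.ebi.ac.uk/swo/"


def find_id_clashes(iris):
    """Return the sorted local IDs that appear under both namespaces."""
    efo_ids = {iri[len(EFO_SWO):] for iri in iris if iri.startswith(EFO_SWO)}
    swo_ids = {iri[len(SWO):] for iri in iris if iri.startswith(SWO)}
    return sorted(efo_ids & swo_ids)


if __name__ == "__main__":
    sample = [
        SWO + "SWO_0000005",      # MATLAB
        EFO_SWO + "SWO_0000005",  # ABarray
        SWO + "SWO_0000015",      # no efo/swo counterpart in this sample
    ]
    print(find_id_clashes(sample))
```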
allysonlister commented 5 years ago

I've spoken with @TheOntologist @jamesmalone and Helen Parkinson. James' thought is that, because SWO was created under the auspices of EFO, some of the IRIs have that structure. However, as they don't resolve, I'm going to go ahead and make the changes. This will be a multi-step process, and the code used to generate the changes together with mappings will be stored in its own directory in the dev/ folder.

Broadly, here's what will happen:

  1. Refactoring of all "efo" IRIs as per the mapping file (formatted for ROBOT rename) at https://github.com/allysonlister/swo/blob/master/dev/IriRefactor/refactor-efo-swo-mappings.csv . This step will transfer all of the axioms / annotations etc to their new IRIs. There are two types of mappings in this csv file. The first 115 mappings will have new SWO IRIs as otherwise the ID used would clash with an existing ID in the "swo" namespace (either active or obsolete). The rest will not clash, and therefore will be able to retain their previous ID.

  2. Adding back all the original IRIs as obsolete terms with appropriate deprecated annotation. The list of 638 obsolete IRIs is in https://github.com/allysonlister/swo/blob/master/dev/IriRefactor/efo-swo-1.txt and the sparql that will be used to insert the annotation is at https://github.com/allysonlister/swo/blob/master/dev/IriRefactor/deprecation-annotation.ru
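
For illustration, the two artefacts described in the steps above might be generated along these lines. This is a hedged sketch, not the contents of the files in dev/IriRefactor. It assumes that the ROBOT rename mappings file is a two-column table (old IRI, new IRI) with a header row, and that the deprecation uses `owl:deprecated` together with IAO's "term replaced by" property (`obo:IAO_0100001`). The header labels and both function names are hypothetical.

```python
import csv
import io


def rename_csv(mapping):
    """Render an old-IRI to new-IRI mapping as two-column CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Old IRI", "New IRI"])  # header labels are an assumption
    for old_iri, new_iri in sorted(mapping.items()):
        writer.writerow([old_iri, new_iri])
    return buffer.getvalue()


def deprecation_update(old_iri, new_iri):
    """Build a SPARQL Update that re-adds old_iri as a deprecated class."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "PREFIX obo: <http://purl.obolibrary.org/obo/>\n"
        "INSERT DATA {\n"
        f"  <{old_iri}> a owl:Class ;\n"
        "    owl:deprecated true ;\n"
        f"    obo:IAO_0100001 <{new_iri}> .\n"
        "}\n"
    )


if __name__ == "__main__":
    mapping = {
        "http://www.ebi.ac.uk/efo/swo/SWO_0000075":
            "http://www.ebi.ac.uk/swo/SWO_1100048",
    }
    print(rename_csv(mapping))
    for old, new in mapping.items():
        print(deprecation_update(old, new))
```

The CSV text would then be saved to a file and passed to ROBOT's `rename` command via its `--mappings` option, and the update could be applied with a SPARQL update tool such as ROBOT's `query --update`. Check both against the ROBOT documentation for the version in use.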

The file for step 1 is ready, and I'm just working on step 2 now. I'll let you know when I've made the change.

allysonlister commented 5 years ago

After Step 1 above, there remained 14 clashes in the SWO namespace:

In all cases, the updated IRI as suggested above was manually added to https://github.com/allysonlister/swo/blob/master/dev/IriRefactor/refactor-efo-swo-mappings.csv and the mappings were re-run. Equally, we sorted out #18 #16 and #17 in this mapping file.

allysonlister commented 5 years ago

This is closed now, but I would really appreciate @janelomax and @drmarkreuter checking the resulting SWO pre-release file. As you requested the IRI change, please can you download https://github.com/allysonlister/swo/blob/master/dev/ontology/swo-merged.owl and see if you are happy with these changes? Many IRIs were modified.

If you notice any issues, please reopen this ticket.

Please note that, as this is still a pre-release, the new IRIs will not be resolvable in OLS etc, as they won't be loaded/indexed by those sites until the actual release happens.

Thanks very much!

janelomax commented 5 years ago

I loaded it into Protege and checked a few of the ids we have been discussing and all looks good to me!

drmarkreuter commented 5 years ago

Checking now. I'm no ontology expert, but I'll spend some time on this this morning. Thanks for adding those terms!

allysonlister commented 5 years ago

Thanks guys, I appreciate it. It's important to have a few people look at such a big change; I don't want to miss anything. :-)

drmarkreuter commented 5 years ago

Potential issue with SWO_0000199 (GenePix Pro 4.0). ROBOT couldn't convert from OWL to OBO... [image] However, I can convert to TTL; checking this now...

drmarkreuter commented 5 years ago

I don't see any issues...?? [image]

allysonlister commented 5 years ago

Thanks @drmarkreuter - I'm not familiar with OBO conversions, but the error message seems to indicate that there is more than one "name" in a class. If "name" in OBO corresponds to rdfs:labels, then I do think that there may be a few cases in SWO where more than one label has crept in over time. However, http://www.ebi.ac.uk/swo/SWO_0000199 is not one of these; it has a single label. Could it be that the obo converter is seeing the old obsolete IRI http://www.ebi.ac.uk/efo/swo/SWO_0000199 as "the same class"?

In any case, the output you're looking at after a ttl conversion looks correct. If you can figure out exactly what the robot converter is upset about in that class, I can look into changing things, but the OWL itself is sound as far as I can tell.

Thanks - I think we're good with the refactoring!
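
The multiple-label hypothesis raised above could be checked with something like the following sketch. It assumes the `rdfs:label` values have already been pulled out of the ontology as (IRI, label) pairs; `classes_with_multiple_labels` is a hypothetical helper. Whether duplicate labels are what the OBO conversion actually objected to is not established in this thread.

```python
from collections import defaultdict


def classes_with_multiple_labels(label_pairs):
    """Return {iri: sorted labels} for IRIs carrying more than one distinct label."""
    labels = defaultdict(set)
    for iri, label in label_pairs:
        labels[iri].add(label)
    return {
        iri: sorted(found) for iri, found in labels.items() if len(found) > 1
    }


if __name__ == "__main__":
    pairs = [
        ("http://www.ebi.ac.uk/swo/SWO_0000199", "GenePix Pro 4.0"),
        # Hypothetical class with two labels, for demonstration only.
        ("http://example.org/demo/Class_1", "label one"),
        ("http://example.org/demo/Class_1", "label two"),
    ]
    print(classes_with_multiple_labels(pairs))
```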

drmarkreuter commented 5 years ago

I agree, it all looks fine to me. I've manually checked some of the previous clashes, and happy to see that they have shiny new URIs.

| Deprecated URI (efo) | Label | New URI |
| --- | --- | --- |
| http://www.ebi.ac.uk/efo/swo/SWO_0000005 | ABarray | http://www.ebi.ac.uk/swo/SWO_1100002 |
| http://www.ebi.ac.uk/efo/swo/SWO_0000007 | Algorithms for Calculating Microarray Enrichment | http://www.ebi.ac.uk/swo/SWO_1100004 |
| http://www.ebi.ac.uk/efo/swo/SWO_0000060 | BZScan | http://www.ebi.ac.uk/swo/SWO_1100038 |
| http://www.ebi.ac.uk/efo/swo/SWO_0000075 | 'Biostrings' | http://www.ebi.ac.uk/swo/SWO_1100048 |
| http://www.ebi.ac.uk/efo/swo/SWO_0000076 | BlueFuse | http://www.ebi.ac.uk/swo/SWO_1100049 |
| http://www.ebi.ac.uk/efo/swo/SWO_0000077 | 'BufferedMatrix' | http://www.ebi.ac.uk/swo/SWO_1100050 |
| http://www.ebi.ac.uk/efo/swo/SWO_0000078 | 'BufferedMatrixMethods' | http://www.ebi.ac.uk/swo/SWO_1100051 |
| http://www.ebi.ac.uk/efo/swo/SWO_0000079 | 'CALIB' | http://www.ebi.ac.uk/swo/SWO_1100052 |
| http://www.ebi.ac.uk/efo/swo/SWO_0000152 | GACK | http://www.ebi.ac.uk/swo/SWO_1100106 |
| http://www.ebi.ac.uk/efo/swo/SWO_0000155 | GEMTools 2.4 | http://www.ebi.ac.uk/swo/SWO_1100109 |
| http://www.ebi.ac.uk/efo/swo/SWO_0000158 | GEOmetadb | http://www.ebi.ac.uk/swo/SWO_1100111 |