EBISPOT / hancestro

https://ebispot.github.io/hancestro/
Creative Commons Attribution 4.0 International

Ancestro has some ontology ids with a typo in URI #1

Closed Public-Health-Bioinformatics closed 5 years ago

Public-Health-Bioinformatics commented 7 years ago

I really appreciate your comprehensive nationality/ethnicity vocabulary! I was just doing some queries and ran into this: most URIs look like rdf:about="http://www.ebi.ac.uk/ancestro/...", but some have a ".owl" suffix in the base:

<!-- http://www.ebi.ac.uk/ancestro.owl/ancestro_0329 -->

<owl:ObjectProperty rdf:about="http://www.ebi.ac.uk/ancestro.owl/ancestro_0329">
    <owl:inverseOf rdf:resource="http://www.ebi.ac.uk/ancestro.owl/ancestro_0330"/>
    <rdfs:label xml:lang="en">hasDemonym</rdfs:label>
</owl:ObjectProperty>

Can these be normalized?
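
A minimal sketch of what such a normalization could look like, assuming the form without the ".owl" suffix is the intended one (the thread does not confirm which form the maintainers would keep). The function names are hypothetical, and a plain string replacement is used on the serialized file rather than any ontology API:

```python
# Assumption: the base without ".owl" is the canonical one.
SUFFIXED_BASE = "http://www.ebi.ac.uk/ancestro.owl/"
CANONICAL_BASE = "http://www.ebi.ac.uk/ancestro/"


def count_suffixed(owl_text: str) -> int:
    """Count occurrences of the '.owl'-suffixed base in the serialized ontology."""
    return owl_text.count(SUFFIXED_BASE)


def normalize_uris(owl_text: str) -> str:
    """Rewrite every '.owl'-suffixed base to the canonical base.

    This touches rdf:about, rdf:resource and XML comments alike,
    because it operates on the raw text.
    """
    return owl_text.replace(SUFFIXED_BASE, CANONICAL_BASE)


if __name__ == "__main__":
    sample = (
        '<owl:ObjectProperty '
        'rdf:about="http://www.ebi.ac.uk/ancestro.owl/ancestro_0329">'
    )
    print(count_suffixed(sample), "suffixed URI(s) found")
    print(normalize_uris(sample))
```

Counting before and after the rewrite gives a quick check that nothing with the suffixed base is left behind.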

Also, is there a chance this could become an OBOFoundry ontology, with URIs like http://purl.obolibrary.org/obo/ ...?

Regards,

Damion

daniwelter commented 7 years ago

Hi Damion, thanks for flagging this. We have plans to submit ancestro to the OBO Foundry, as the current URIs don't actually resolve. I will review our timeline on this and then decide whether to fix the current URIs or just release a new version with persistent, resolvable URIs.

Public-Health-Bioinformatics commented 7 years ago

Good to hear about the OBO Foundry plans. One other question arises for me. I like the "plain English" country name list, but there are a few established geography ontologies out there, GAZETTEER for example. Any desire to replace the ancestro country list ids with Gazetteer ids, or those of your favourite other geo ontology? I've been trying to normalize geo data as much as possible. (I realize a side issue is the treatment of countries as classes, which ancestro does and which I like, vs instances, which in the ancestro case may add the complication of having to change logical relations.)

daniwelter commented 7 years ago

We looked at Gazetteer when we started to develop ancestro and decided very quickly that GAZ was simply too bulky and didn't meet our needs. One of our primary aims was being able to link wider geographical regions with specific countries, and countries with ancestral groups, to be able to satisfy a range of queries based on country of recruitment or origin and ancestry of GWAS samples. Additionally there is, as you mention, the issue of countries often being represented as instances of a country class, which again didn't meet our needs. I would be happy to look into reusing a good geo ontology though, as long as it meets our criteria. Any suggestions?

Public-Health-Bioinformatics commented 7 years ago

I wish I had something pure to offer vis-à-vis geo ontologies. I'm stuck with Gazetteer since I'm trying to stay within OBOFoundry. What I've gone ahead and done is create a version of ancestro that references Gazetteer for 99% of its country names, attached; I literally swapped Gazetteer ids in for ancestro country ids. Let's just say the country names are "punned" in the OWL way, which seems an accepted practice. That and the attached gazetteer file together cover just about everything, though I'm sure there are some quirks, which I hope to spot in the next few weeks. I have the OntoFox specification file for regenerating the gazetteer part if you are interested. I also have a script file that could simply annotate your ancestro file with country dbxrefs to gazetteer ids, if interested.
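
The script itself is not shown in the thread, so the following is only a sketch of how a dbxref annotation pass might work. It assumes the ontology is serialized as RDF/XML, that the cross-references are recorded with the oboInOwl hasDbXref annotation property (the thread only says "dbxrefs"), and that a lookup table from ancestro class URI to Gazetteer id is available. The function name and the ids in the usage example are made up for illustration:

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
}
# Keep readable prefixes when the tree is serialized again.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

RDF_ABOUT = "{%s}about" % NS["rdf"]
OWL_CLASS = "{%s}Class" % NS["owl"]
DBXREF = "{%s}hasDbXref" % NS["oboInOwl"]


def add_gaz_dbxrefs(owl_xml: str, lookup: dict) -> str:
    """Add a hasDbXref child to every owl:Class whose URI is in `lookup`.

    `lookup` maps an ancestro class URI to a Gazetteer id string.
    Classes that already carry the same xref are left unchanged,
    so running the pass twice does not duplicate annotations.
    """
    root = ET.fromstring(owl_xml)
    for cls in root.iter(OWL_CLASS):
        gaz_id = lookup.get(cls.get(RDF_ABOUT))
        if gaz_id is None:
            continue
        existing = {e.text for e in cls.findall(DBXREF)}
        if gaz_id not in existing:
            ET.SubElement(cls, DBXREF).text = gaz_id
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical ids, for illustration only.
    doc = (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:owl="http://www.w3.org/2002/07/owl#">'
        '<owl:Class rdf:about="http://www.ebi.ac.uk/ancestro/ancestro_9001"/>'
        '</rdf:RDF>'
    )
    table = {"http://www.ebi.ac.uk/ancestro/ancestro_9001": "GAZ:99999901"}
    print(add_gaz_dbxrefs(doc, table))
```

Annotating rather than swapping ids leaves the existing ancestro URIs and logical relations untouched, which may be the lower-risk option while the URI scheme is still being decided.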

I'm thinking of creating a version of Gazetteer that drops instances in favour of a class per country. Gazetteer has such a large scope that it seems to work well with our global GenEpiO genomic epidemiology ontology aims. (We're creating user interface elements for selecting things like country / state / province / territory, with the capability of having dynamic lookup via EBI's OLS for further granularity. Hence I'm willing to put a bit more work into this. Eventually we'll have geo polygon data too.)

I really appreciate the cultural / ancestral reference ontology. It is being brought into our FOODON food ontology too, as a way to relate foods to their culture.

Regards,

Damion

Public-Health-Bioinformatics commented 7 years ago

In talking with an EFO-connected ontologist, it came up that definitions are pretty key to getting into OBO. Do you have near-future plans for entering definitions for the relations (like hasMajorityEthnicity) and upper-level terms? I suppose the meaning of the lower-level terms is straightforward, but I think OBO wants to ensure that any term can be looked up on its own, such that its definition is apparent outside of its ancestro ontology context.

daniwelter commented 7 years ago

You wouldn't happen to be talking to Melanie, would you? She's 2 desks down from me :)

Latest plan is to replace country URIs with GAZ URIs and re-use GAZ definitions but keep the ancestro hierarchy. I hope to request OBO PURLs asap and will add definitions as I go along. It might take a little while for the ancestral groups, as these definitions need review by the GWAS Catalog curators, who are extremely busy at the moment.

Public-Health-Bioinformatics commented 7 years ago

Heh, not Melanie, but nice to hear you work with her. It was Chris Stoeckert over at the University of Pennsylvania, who is pushing EFO ahead with more ethnicity entries as time goes by.

So did you receive that ancestro.owl file attachment I sent you? I actually did replace all the country URIs with what looked like their best Gazetteer equivalent, so there's a possibility of saving yourself some work. I'll send it again if needed. I can send you the script that does the replacement, and a simple lookup table too if you want. I should say there were some problem cases where Gazetteer had both a country and portions of Oceania island groups under the same name.
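
For illustration, a replacement driven by a lookup table could be sketched roughly as below. This is not the script mentioned above: the function names, labels and ids are all hypothetical, and the matching is assumed to be by case-insensitive label. The first helper sets aside the problem cases where one name maps to more than one Gazetteer entry, so they can be reviewed by hand instead of being swapped blindly:

```python
from collections import defaultdict


def build_lookup(candidates):
    """Split (label, gaz_uri) pairs into unambiguous and ambiguous labels.

    Returns (unique, ambiguous): `unique` maps a normalized label to its
    single Gazetteer URI; `ambiguous` maps a label to the sorted list of
    competing URIs that need manual review.
    """
    by_label = defaultdict(set)
    for label, uri in candidates:
        by_label[label.strip().lower()].add(uri)
    unique = {lbl: next(iter(uris)) for lbl, uris in by_label.items() if len(uris) == 1}
    ambiguous = {lbl: sorted(uris) for lbl, uris in by_label.items() if len(uris) > 1}
    return unique, ambiguous


def swap_uris(owl_text: str, uri_map: dict) -> str:
    """Replace quoted ancestro URIs with their Gazetteer equivalents.

    Matching includes the surrounding double quotes, so only attribute
    values such as rdf:about="..." are rewritten and an id that is a
    prefix of a longer id is not touched by accident.
    """
    for old, new in uri_map.items():
        owl_text = owl_text.replace(f'"{old}"', f'"{new}"')
    return owl_text


if __name__ == "__main__":
    # Made-up labels and ids, for illustration only.
    gaz = "http://purl.obolibrary.org/obo/"
    unique, ambiguous = build_lookup([
        ("Sampleland", gaz + "GAZ_99999901"),
        ("Examplia", gaz + "GAZ_99999902"),
        ("Examplia", gaz + "GAZ_99999903"),
    ])
    print("needs review:", ambiguous)
    text = '<owl:Class rdf:about="http://www.ebi.ac.uk/ancestro/ancestro_9001"/>'
    print(swap_uris(text, {
        "http://www.ebi.ac.uk/ancestro/ancestro_9001": unique["sampleland"],
    }))
```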

daniwelter commented 7 years ago

No, I'm afraid I didn't get the file. I suspect GitHub stripped it. Could you please send it directly to dwelter@ebi.ac.uk? Cheers

Public-Health-Bioinformatics commented 7 years ago

ok, will do.

Public-Health-Bioinformatics commented 7 years ago

I think you should have it now. Is it helpful? I could re-run the script on the latest ancestro to get the complete CHEBI-updated ancestro if you want.