EnvironmentOntology / gaz

An open source gazetteer constructed on ontological principles

Upload all of GAZ to wikidata #3

Open cmungall opened 6 years ago

cmungall commented 6 years ago

We don't have resources to update gaz.obo. Unless we can find a volunteer it may make most sense to upload to wikidata and have people update on wikidata (having a way to export a gazetteer in obo or owl from wikidata will be easy).

If people are in favor I can look into getting some tips on how best to do this.

pbuttigieg commented 6 years ago

Given the amount of country and regional data in the wikidata system, I think this is a viable option, especially if we can somehow link the existing GAZ IDs to the WD entries. It may hurt the semantic rigor of GAZ a bit, but perhaps this can be mitigated by linking the GAZ/WD instances to ENVO classes.

We should consider circulating a link to this issue via some general mailing lists, just to be sure we have input.

cmungall commented 6 years ago

Yes, we should definitely link the WD instances to their types. As a first step I got an ENVO ID property registered in WD. Next is to upload the core ENVO graph (facilitated by https://github.com/EnvironmentOntology/envo/issues/600).

I think WD would be open to including a lot of the GAZ relationship types. There are already some cognates of the RO relations in there, e.g. tributary. For anything else that does not fit, we can maintain our own axiom layer.

Mailing lists: I contacted the obo-discuss list.

When we're ready we can engage wikidata. A lot of the technical parts should be quite straightforward with the infrastructure Andrew Su's group has put into place, but we will need to make a case for inclusion in WD, and show that what we have is trustworthy.

lschriml commented 6 years ago

Have you contacted Michael Ashburner about GAZ? I developed it with him. If he is not able, or does not wish to, I will volunteer to take care of GAZ, and I can coordinate other volunteers to work on it.

And I can work with Andrew Su/Wikidata for integration options, as we are already working together.

Cheers, Lynn

cmungall commented 6 years ago

Unfortunately Michael isn't able to develop it any further.

It would be great to have you as the caretaker, and coordinate with Andrew and others on wikidata integration, thanks!

For short term management of the gaz edit file this old thread may be useful:

http://gmod.827538.n3.nabble.com/cv-relationships-td4039290i20.html#a4042423

mjy commented 6 years ago

We spent a lot of time integrating just a couple of gazetteers with GIS layers (GADM, Natural Earth, TDWG hierarchy) in a relational DB format. The "normalization" is seriously non-trivial, and it never ends.

I too question whether an ontology is the right solution for these data. Wikidata, assuming it can handle the shape data, may be a good solution, but even there it will likely fail unless the concepts of time and synonym at different levels are carefully worked out (e.g. same name [language specific], same time, different shape representation, same source of shape representation).

The most important issue to me is to represent your GIS data as shapes, so that you can compute. To my knowledge there is no means of reasoning across shapes in OWL, so again this suggests that GAZ is not particularly the best representation for maintaining these data.

lschriml commented 6 years ago

Sounds good Chris.

Have you and Suzi been in touch with Michael? I am happy to work on this; it's great to keep this moving forward.

Cheers, Lynn


Public-Health-Bioinformatics commented 6 years ago

Happy to hear GAZ could be synchronized with an updated resource like Wikidata.

I would put in a plug for linking countries like Yugoslavia as historic/archaic, perhaps using "instance of" "historical country" as wikidata does (so I can avoid including them in selection lists). Also would be great to take in GeoNames ids via wikidata as that is the other comprehensive open source resource I've seen.

@mjy GAZ could perform a useful role in terms of reasoning via "located in", and "shares border with" type relations without expecting it to have reasoning power on GIS lat/long.
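As a rough illustration of the "located in" style of reasoning described here, the sketch below computes the transitive closure of curated `located_in` assertions in plain Python. The function name and the place labels are hypothetical and stand in for GAZ identifiers; this is not how GAZ or any OWL reasoner implements it.

```python
from collections import defaultdict


def located_in_closure(direct_edges):
    """Map each place to every place it is directly or transitively located in.

    direct_edges is an iterable of (child, parent) pairs, i.e. curated
    "child located_in parent" assertions.
    """
    parents = defaultdict(set)
    for child, parent in direct_edges:
        parents[child].add(parent)

    closure = {}
    for place in list(parents):
        seen = set()
        stack = list(parents[place])
        while stack:
            current = stack.pop()
            if current in seen:  # also stops us looping on cyclic assertions
                continue
            seen.add(current)
            stack.extend(parents.get(current, ()))
        closure[place] = seen
    return closure


# Illustrative labels only; a real run would use GAZ IDs.
EDGES = [
    ("Salvador", "Bahia"),
    ("Bahia", "Brazil"),
    ("Brazil", "South America"),
]
CLOSURE = located_in_closure(EDGES)
print(sorted(CLOSURE["Salvador"]))
```

The result is only as complete as the curated assertions it starts from, which is the trade-off under discussion in this thread.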

Right now the municipality lists in GAZ don't seem to be complete, and I was hoping a wikidata integration could resolve this. For example, I could find only the GAZ term "populated place, Brazil" (http://purl.obolibrary.org/obo/GAZ_00002831), compared with the list at https://en.wikipedia.org/wiki/Municipalities_of_Brazil. I see that this information exists in wikidata as instances of the "municipality of Brazil" class (https://www.wikidata.org/wiki/Q3184121). (We have some dynamic lookup that tries to fetch the municipality for a user's biosample location.)
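A lookup like the one described could be phrased as a SPARQL query against the Wikidata Query Service. The sketch below only builds the query text; it does not send it. `P31` ("instance of") and `P1566` (GeoNames ID) are existing Wikidata properties, while the function name and the `limit` parameter are assumptions made for illustration.

```python
import re


def municipality_query(class_qid="Q3184121", limit=100):
    """Build a SPARQL query listing instances of a Wikidata class.

    The default class is "municipality of Brazil" (Q3184121). The query
    relies on the wd:, wdt:, wikibase: and bd: prefixes that the Wikidata
    Query Service predefines.
    """
    if not re.fullmatch(r"Q\d+", class_qid):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    return "\n".join([
        "SELECT ?item ?itemLabel ?geonames WHERE {",
        f"  ?item wdt:P31 wd:{class_qid} .",
        "  OPTIONAL { ?item wdt:P1566 ?geonames . }",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }',
        "}",
        f"LIMIT {int(limit)}",
    ])


print(municipality_query())
```

The printed text can be pasted into the query service at https://query.wikidata.org/ to inspect the results by hand.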

Lynn, much appreciated that you can organize the GAZ v2!

mjy commented 6 years ago

@Public-Health-Bioinformatics I agree that kind of reasoning might be useful, but your examples are getting exactly to my point.

All ontologies come with a certain learning curve: what are the concepts, how are they organized, are they complete, when were they updated, etc. While there is certainly a fairly steep learning curve behind the kinds of ways we can represent a GIS layer/shape, once we have a shape/point in place we can largely ignore all this type of baggage, i.e. shapes will "just work" with respect to queries like intersection, nearness, containing, border sharing etc. I don't have to worry that someone curated a "located_in" assertion, or did a synchronization, etc.

So, playing the devil's advocate (in fact I actually wanted to use something like GAZ several years back when we came up with our GIS models): why try to replicate all of this functionality with human-made assertions that must be continually curated and understood, when you can depend on a parallel system that is specifically designed to address these questions (and address them very quickly, and with much higher precision)?
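For illustration, a "containing" query computed from a shape rather than from a curated assertion can be as small as the ray-casting test below. This is a planar sketch with hypothetical names and toy coordinates; it ignores holes, multipolygons and spherical geometry, all of which a real GIS system handles.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting containment test on a simple planar polygon.

    polygon is a list of (lon, lat) vertices in order. A horizontal ray is
    cast from the point towards increasing lon; an odd number of edge
    crossings means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Toy square region, not real geographic data.
SQUARE = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon(2.0, 2.0, SQUARE))
print(point_in_polygon(5.0, 2.0, SQUARE))
```

No "located_in" statement is needed for either answer; the shape alone determines it.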

Public-Health-Bioinformatics commented 6 years ago

I get that - I wouldn't look to OWL logic + GAZ to do what GIS queries do even if GAZ had lat/lon. For biosample descriptions though, it would be great to have comprehensive ontology identifiers for municipal and other levels of govt. such that they map over exactly to an updated GIS database of such things. For those users needing to enter lat/lon (e.g. for NCBI biosample data requirements), this could be looked up reliably - and immediately if in GAZ directly. So if GAZ can be updated with this information comprehensively via script from wikidata on a periodic basis, then I like that. Or perhaps sourced from GeoNames instead? In this vision, geo entity names and located_in relations are actually curated in wikidata or elsewhere. Only works if source database is satisfactory though.

cmungall commented 5 years ago

Just an update on this: I have processed 4k of the 6k+ GAZ entries.

High confidence matches here:

https://github.com/cmungall/environments2wikidata/blob/master/matches/align-high-confidence-gaz.tsv

cmungall commented 4 years ago

All 6k entries are now processed!

Around 167k fairly high-confidence mappings. Note these can act as a seed to get more high-confidence ones.

https://github.com/cmungall/environments2wikidata/blob/master/matches/align-high-confidence-gaz.tsv

The complete subset of Wikidata in ttl plus all hypothetical matches are stored here: https://osf.io/unga9/ (upload in progress)

Note we now also have a property in wikidata: https://www.wikidata.org/wiki/Property:P6778
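With that property in place, the export idea raised at the top of this thread could look roughly like the sketch below: a query for items carrying the property, plus a formatter that turns one result row into an OBO stanza. Everything here is an assumption for illustration: that P6778 holds the numeric part of the GAZ ID, that a `[Term]` stanza with a `wikidata:` xref is the desired output, and the function names and placeholder values themselves.

```python
import re

GAZ_PROPERTY = "P6778"  # property linked above; assumed to hold the GAZ ID


def gaz_export_query(limit=1000):
    """SPARQL text selecting Wikidata items that carry a GAZ ID."""
    return "\n".join([
        "SELECT ?item ?itemLabel ?gaz WHERE {",
        f"  ?item wdt:{GAZ_PROPERTY} ?gaz .",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }',
        "}",
        f"LIMIT {int(limit)}",
    ])


def to_obo_stanza(gaz_local_id, label, qid):
    """Format one mapping as an OBO [Term] stanza (no escaping of the label)."""
    if not re.fullmatch(r"\d+", gaz_local_id):
        raise ValueError(f"expected numeric GAZ local ID, got {gaz_local_id!r}")
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    lines = [
        "[Term]",
        f"id: GAZ:{gaz_local_id}",
        f"name: {label}",
        f"xref: wikidata:{qid}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values only, not a real GAZ/Wikidata mapping.
print(to_obo_stanza("00000000", "example place", "Q0"))
```

A real export would also need relationships, synonyms and label escaping, which this sketch leaves out.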