EnvironmentOntology / gaz

An open source gazetteer constructed on ontological principles

Transitioning from GAZ to Wikidata - is there a comprehensive query? #37

Closed: ddooley closed this issue 3 years ago

ddooley commented 3 years ago

As a transition effort from GAZ to Wikidata, I will try to figure out a comprehensive Wikidata query that can retrieve all continents and countries of the world, their states/provinces/territories, and their cities/towns/villages. But before I start: has any GAZ curator already done this? I don't need the map from GAZ to Wikidata IDs, just the Wikidata hierarchy of things. [EDIT: this would be used to create a dynamic menu system to aid data entry, providing picklists for users. It's not good UI just to dump users over in Wikidata to search for a town etc.]
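A minimal Python sketch of the kind of query being asked about. It assumes the public Wikidata Query Service endpoint (https://query.wikidata.org/sparql) and one reading of Wikidata's modelling: countries as instances of Q6256 with a P30 continent, and first-level subdivisions (Q10864048) linked to their country by P131. Those identifiers are assumptions to verify, not something settled in this thread, and the sketch stops at states/provinces; cities/towns/villages would be one more P131 level.

```python
# Sketch only: cascading continent -> country -> first-level subdivision
# picklists from the public Wikidata Query Service. The Q/P identifiers are
# assumptions about Wikidata's modelling, not anything taken from GAZ.
import json
import urllib.parse
import urllib.request
from collections import defaultdict

ENDPOINT = "https://query.wikidata.org/sparql"

# Q6256 = country, P30 = continent, Q10864048 = first-level administrative
# country subdivision, P131 = located in the administrative territorial entity.
QUERY = """
SELECT ?continentLabel ?countryLabel ?regionLabel WHERE {
  ?country wdt:P31 wd:Q6256 ;
           wdt:P30 ?continent .
  OPTIONAL {
    ?region wdt:P31/wdt:P279* wd:Q10864048 ;
            wdt:P131 ?country .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def fetch_rows(query: str = QUERY) -> list:
    """Run the query against WDQS and return the JSON result bindings."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "gaz-picklist-sketch/0.1",  # WDQS asks clients to identify themselves
    })
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)["results"]["bindings"]


def build_picklists(rows: list) -> dict:
    """Nest flat bindings as {continent: {country: [sorted region labels]}}."""
    tree = defaultdict(lambda: defaultdict(set))
    for row in rows:
        continent = row["continentLabel"]["value"]
        country = row["countryLabel"]["value"]
        regions = tree[continent][country]  # creates the entry even if no regions
        if "regionLabel" in row:
            regions.add(row["regionLabel"]["value"])
    return {
        cont: {ctry: sorted(names) for ctry, names in sorted(countries.items())}
        for cont, countries in sorted(tree.items())
    }
```

Calling `build_picklists(fetch_rows())` gives a nested dict a menu system could page through. A country with more than one P30 value shows up under each of its continents, and a full city-level pull will probably need to be split per country to stay within the service's query time limit.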

lschriml commented 3 years ago

Hello @ddooley, @matentzn has another ticket regarding this issue. Ideally, I would like a small GAZ that is functional for all of our purposes. The information we need is already in GAZ; we all just need a much smaller GAZ.

Using SPARQL, couldn't we just pull the country info out of GAZ and create a GAZ_country file?

Cheers, Lynn
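A rough Python sketch of the GAZ_country idea: a SPARQL query to run over gaz.owl with any SPARQL engine (ROBOT's `query` command is one option), plus a helper that writes the results as a two-column TSV. `COUNTRY_CLASS` is a hypothetical placeholder, and the plain `rdfs:subClassOf` pattern is an assumption about how GAZ links places to "country"; check gaz.owl before relying on it.

```python
# Sketch of a "GAZ_country file": a SPARQL query plus a TSV writer.
# COUNTRY_CLASS is a hypothetical placeholder, and the rdfs:subClassOf
# pattern is an assumption about GAZ's modelling; verify against gaz.owl.
import csv
import io

COUNTRY_CLASS = "http://purl.obolibrary.org/obo/GAZ_XXXXXXXX"  # hypothetical placeholder


def country_query(country_class: str = COUNTRY_CLASS) -> str:
    """Return a SPARQL SELECT for every labelled term directly under country_class."""
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?term ?label WHERE {{
  ?term rdfs:subClassOf <{country_class}> ;
        rdfs:label ?label .
}}
ORDER BY ?label
"""


def to_gaz_country_tsv(rows) -> str:
    """Write (term IRI, label) result pairs as a TSV with CURIE-style IDs."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "label"])
    for iri, label in rows:
        # e.g. http://purl.obolibrary.org/obo/GAZ_<digits> -> GAZ:<digits>
        curie = iri.rsplit("/", 1)[-1].replace("_", ":", 1)
        writer.writerow([curie, label])
    return out.getvalue()
```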

ddooley commented 3 years ago

I have a GAZ file specific to the countries of the world, and one specific to the NCBI https://www.ncbi.nlm.nih.gov/genbank/collab/country/ list. But the query I'm looking for is about preparing to drop GAZ in favour of a more commonly used geopolitical vocabulary. I had been thinking of GeoNames, but Wikidata entities sound fine too. Thoughts?

lschriml commented 3 years ago

Hello Damion, can you share your country GAZ file? If you don't mind, I can post it to the GAZ GitHub. Do you think aligning the NCBI list with GAZ is needed?

GeoNames is a good choice: http://download.geonames.org/export/

Cheers, Lynn
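For the GeoNames route, a minimal Python sketch of turning a dump file into country → first-level division → place picklists. It assumes the 19-column, tab-separated main-table layout used by the files under http://download.geonames.org/export/dump/ (allCountries.txt and the per-country files); `min_population` is an illustrative filter, not a GeoNames option.

```python
# Sketch: country -> admin1 -> place-name picklists from a GeoNames dump file
# (main-table layout). min_population is our own illustrative filter.
import csv
from collections import defaultdict

# 0-based columns of the 19-column, tab-separated GeoNames main table.
NAME, FEATURE_CLASS, COUNTRY_CODE, ADMIN1_CODE, POPULATION = 1, 6, 8, 10, 14


def populated_places(lines, min_population=0):
    """Return {country_code: {admin1_code: [sorted place names]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    # QUOTE_NONE: GeoNames fields are not quoted, and names may contain quotes.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 19 or row[FEATURE_CLASS] != "P":
            continue  # feature class P = populated places (cities, towns, villages)
        if int(row[POPULATION] or 0) < min_population:
            continue
        tree[row[COUNTRY_CODE]][row[ADMIN1_CODE]].append(row[NAME])
    return {
        cc: {a1: sorted(names) for a1, names in admin1.items()}
        for cc, admin1 in tree.items()
    }
```

Passing an open UTF-8 file handle (opened with `newline=""`) for a per-country file yields raw country and admin1 codes; the companion admin1CodesASCII.txt file in the same directory maps those codes to readable division names for display.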

cmungall commented 3 years ago

Here are some examples of geo queries using a framework I developed: https://github.com/cmungall/sparqlprog_wikidata

It would be very easy to do the kinds of queries you mention.

ddooley commented 3 years ago

Ok, great, I'll check that out Chris.

Lynn: There's this GAZ to INSDC country/georegion/ocean mapping spreadsheet we did a while back for GenEpiO: https://github.com/GenEpiO/genepio/blob/master/src/ontology/imports/INSDC%20to%20Gaz.xlsx

There's also this GAZ OntoFox config file, which includes all the INSDC geography names as well as US states, Canadian provinces, and a few other tidbits. It's easy to chop down to just national entities, and the OWL output file is in the same folder: https://github.com/GenEpiO/genepio/blob/master/src/ontology/imports/gazetteer_ontofox.txt

Lastly, there's this additional include file, which adds hasDbXref annotations (e.g. INSDC:country:cambodia) to the above GAZ file if desired: https://github.com/GenEpiO/genepio/blob/master/src/ontology/imports/gaz_insdc_mapping.owl

Ciao, d.
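A small Python sketch of generating hasDbXref annotations of the kind gaz_insdc_mapping.owl adds, written as Turtle for brevity. This is not the GenEpiO build process; the `INSDC:country:<key>` pattern follows the single cambodia example above, and any GAZ IDs fed to it must come from a real mapping such as the INSDC-to-GAZ spreadsheet (IDs in tests or examples are placeholders).

```python
# Sketch: emit oboInOwl:hasDbXref annotations tying GAZ terms to INSDC
# country keys, as Turtle. The xref pattern follows the INSDC:country:cambodia
# example; real GAZ IDs must come from an actual GAZ-to-INSDC mapping.
OBO = "http://purl.obolibrary.org/obo/"
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix oboInOwl: <http://www.geneontology.org/formats/oboInOwl#> .\n"
    "\n"
    "oboInOwl:hasDbXref a owl:AnnotationProperty .\n"
)


def xref_turtle(mapping: dict) -> str:
    """mapping: GAZ CURIE (e.g. 'GAZ:...') -> INSDC country key (e.g. 'cambodia')."""
    lines = [PREFIXES]
    for curie, key in sorted(mapping.items()):
        iri = OBO + curie.replace(":", "_", 1)
        # Escape backslashes and double quotes for a Turtle string literal.
        xref = f"INSDC:country:{key}".replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{iri}> oboInOwl:hasDbXref "{xref}" .')
    return "\n".join(lines) + "\n"
```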

ddooley commented 3 years ago

P.S. The INSDC geography names listed by NCBI were created to fill the NCBI BioSample /country field.

lschriml commented 3 years ago

Thank you Damion !!

-- 
Lynn M. Schriml, Ph.D.
Associate Professor
Institute for Genome Sciences
University of Maryland School of Medicine
Department of Epidemiology and Public Health
670 W. Baltimore St., HSFIII, Room 3061
Baltimore, MD 21201
P: 410-706-6776 | F: 410-706-6756
lschriml@som.umaryland.edu