EnvironmentOntology / gaz

An open source gazetteer constructed on ontological principles

Determine who is using GAZ and how it is being used #22

Open cmungall opened 5 years ago

cmungall commented 5 years ago

Anyone using GAZ, please add comments to this ticket!

Please also say whether you use the OBO or OWL file, whether you download it or go through an API, and so on. Note anything specific you would like to see at a general level (or link to a ticket).

jamesaoverton commented 5 years ago

IEDB uses GAZ to build a list of countries that we care about and organize them under a shallow hierarchy. The code is here: https://github.com/IEDB/GAZ

ddooley commented 5 years ago

We're using GAZ in a few ontology projects as a standard vocabulary for 1st-order, 2nd-order, etc. units of government: countries, provinces, states, territories, regions, and municipalities. This is used in our Genomic Epidemiology Ontology (GenEpiO) to provide pick-lists for reporting outbreak cases; to describe patient nationalities, country of birth, and travel patterns; and eventually for food traceability. In FoodOn, we have reused a 'hasCountryOfOrigin' annotation pointing to a country, but would look to GAZ to describe region of origin as well.

We developed a browser-based lookup function to navigate 'located in' hierarchies, but have discovered that the connection between 2nd/3rd order govt and municipalities is pretty spotty. Here's an example of a control that provides both 2nd and 3rd order branches separately. It comes with a lookup function so that a given application ontology doesn't have to list all of GAZ. Instead a user can select an app-provided GAZ entry, then press "lookup choices" to get further sub-class or 'located in' related items.
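A minimal sketch of the "lookup choices" idea, assuming an in-memory table of 'located in' pairs stands in for the remote lookup. The `EX:` identifiers, the place labels, and the function names `build_index` and `lookup_choices` are hypothetical placeholders for illustration, not real GAZ IDs or the actual control's code.

```python
from collections import defaultdict

# Hypothetical (child, parent) 'located in' pairs. The EX: identifiers are
# placeholders, not real GAZ IDs.
LOCATED_IN = [
    ("EX:2", "EX:1"),  # British Columbia located in Canada
    ("EX:3", "EX:1"),  # Ontario located in Canada
    ("EX:4", "EX:2"),  # Vancouver located in British Columbia
]
LABELS = {
    "EX:1": "Canada",
    "EX:2": "British Columbia",
    "EX:3": "Ontario",
    "EX:4": "Vancouver",
}


def build_index(pairs):
    """Map each parent ID to the sorted list of IDs directly located in it."""
    index = defaultdict(list)
    for child, parent in pairs:
        index[parent].append(child)
    return {parent: sorted(children) for parent, children in index.items()}


def lookup_choices(index, selected_id):
    """Return the direct subordinate places of the selected entry (empty if none)."""
    return index.get(selected_id, [])


index = build_index(LOCATED_IN)
print([LABELS[i] for i in lookup_choices(index, "EX:1")])
```

An entry with no recorded 'located in' children, such as `EX:3` here, simply yields an empty list, which is how a spotty link between levels would surface to the user.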

I should also mention that GAZ is being proposed (with GeoNames as an alternative) for locations related to Biosample collection in an ISO TC 34/SC 9 working group 25 repository for draft ontology-driven specifications, referenced in the draft working document ISO/TC 34/SC 9 N 000 "Microbiology of the Food Chain — Whole Genome Sequencing, Typing and Genomic Characterization of Foodborne Bacteria", visible at http://genepio.org/geem/form.html#GENEPIO:0002083

image

image

Pressing "lookup choices" fetches subordinate choices from OLS.

image

image

The final selection might not be contained in the local app ontology:

image

Having GAZ refreshed in tandem with Wikidata or GeoNames edits would be wonderful.

A quick peek at municipality list:

image

lschriml commented 5 years ago

Action Item: Chris Mungall will send an email to the OBO Foundry.

Action Item: Lynn will examine BioSample usage of GAZ.

cmungall commented 5 years ago

@lschriml any update on details of whether/how GAZ is used in BioSample (EBI or NCBI)?

lschriml commented 5 years ago

Yes, GAZ has been used in BioSample since it was created. GAZ is a core term in the MIxS standard: country field, geographic location. It is also used in QIIME, Qiita, MG-RAST, etc., across GSC-associated projects.

cmungall commented 3 years ago

Working with @turbomam and @wdduncan, we now have a normalized, tidied version of the INSDC sample database, so we can do some analysis of GAZ usage in mixs:geo_loc_name

77159 distinct geo_loc_name values

Only 8 distinct values in the whole database contain a GAZ ID (218 records in total):

$ grep -E 'GAZ:[0-9]' target/distinct-geo_loc_name.tsv
100     GAZ:00116363
53      GAZ:00116380
42      Azerbaijan:Caspian Sea (GAZ:00008076)
7       GAZ:00322747
4       GAZ:00315575
4       GAZ:00313293
4       GAZ:00322749
4       GAZ:00322744

The vast majority of usages are simple strings, so it's not clear that GAZ is being meaningfully used or that it adds value beyond other gazetteers.
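A minimal Python sketch of the same tally, assuming each row of the TSV is a record count followed by whitespace and the geo_loc_name value, as in the output above. The function name `gaz_usage` is illustrative, and the sample rows are copied from this thread rather than read from the real file.

```python
import re

# A GAZ identifier: the prefix followed by at least one digit.
GAZ_ID = re.compile(r"GAZ:\d+")


def gaz_usage(lines):
    """Return (distinct_values, total_records) for rows whose value contains a GAZ ID.

    Each row is assumed to look like '<count><whitespace><value>'.
    """
    distinct = 0
    total = 0
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed rows
        count, value = parts
        if GAZ_ID.search(value):
            distinct += 1
            total += int(count)
    return distinct, total


sample = [
    "100     GAZ:00116363",
    "42      Azerbaijan:Caspian Sea (GAZ:00008076)",
    "32928   USA: GAZ",
    "513500  USA",
]
print(gaz_usage(sample))  # (2, 142)
```

Note that a value such as `USA: GAZ` mentions GAZ but carries no identifier, so it is not counted.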

Top uses:

513500  USA
214152  missing
163139  not applicable
123076  United Kingdom
119337  China
74845   United Kingdom: United Kingdom
53333   Germany
47282   not collected
45201   Canada
44491   Australia
38338   Denmark
34607   NA
32928   USA: GAZ
31881   Netherlands
31818   Spain
31669   France
29070   Japan
28866   Sweden
27420   Finland
25698   Italy
23134   USA: California
22048   Switzerland
18143   USA:New York
18106   Brazil
16387   USA:CA:San Diego
16376   China: Beijing
15780   India
15663   USA:CA
15063   Pacific Ocean
14737   South Africa
14626   China:Beijing
12208   USA:Boston
12122   Norway
12008   Israel
11742   Chile
11725   South Korea
11333   Denmark: Copenhagen
11124   USA: Michigan
11086   Kenya
10982   Mexico
10789   USA: Oregon
10560   Malawi
10506   Singapore
10337   China:Hangzhou
10336   USA: Massachusetts
9102    USA: New York
9051    USA:MD
8808    China:Shanghai
8684    USA: Minnesota
8663    Austria
8602    Bangladesh
8532    USA:TX
8413    Russia
8396    Ireland
8396    Atlantic Ocean
8188    USA:NY
8097    Uganda
7971    New Zealand
7832    USA:GA
7765    China: Shanghai
7657    USA: Texas
7410    Belgium
7334    USA:NC
6996    Czech Republic
6934    USA:MN
6678    Not applicable
6638    Hong Kong
6538    United Kingdom: Oxford
6506    Missing
6503    Thailand
6336    USA: North Carolina
6121    China:Nanjing
6075    USA: Florida
6018    USA:PA
5961    Netherlands: western part
5807    N/A
5799    USA:CO:Boulder
5740    USA:WA
5670    USA:California
5550    Poland
5458    not provided
5374    Tanzania
5295    Australia: NSW
5271    Baltic Sea
5261    Portugal
5179    Canada: British Columbia
5160    USA:Michigan
5142    USA:IA
5055    United Kingdom: London
5005    Peru
4975    Taiwan
4906    Canada: Quebec
4836    Australia: Brisbane
4797    Canada: Saskatoon
4528    USA:South Fork Eel River, CA
4500    USA: Boston
4481    China: Hangzhou
4437    Canada: Ontario
4415    USA: Ohio
4397    USA:WI