WolfgangFahl / ProceedingsTitleParser

Shallow Semantic Parser to extract metadata from scientific proceedings titles
Apache License 2.0

Derive country, region and city from location info in wikicfp, crossref and CEURWS #50

Open WolfgangFahl opened 3 years ago

WolfgangFahl commented 3 years ago

This is partly a Natural Language Processing (NLP) / Named Entity Recognition (NER) task. See https://stackoverflow.com/questions/tagged/geograpy for some library options.

select count(locality) from event_wikicfp where locality is not null
73214
select count(distinct(locality)) from event_wikicfp
15673
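
A per-locality frequency table can be obtained with a GROUP BY query. The following is a minimal sketch using Python's built-in sqlite3 module. It assumes the table event_wikicfp and the column locality from the queries above; the helper name locality_counts and the in-memory sample rows are made up for illustration and are not taken from the real database.

```python
import sqlite3


def locality_counts(conn, limit=10):
    """Return (count, locality) rows, most frequent locality first."""
    sql = """
        SELECT COUNT(*) AS n, locality
        FROM event_wikicfp
        WHERE locality IS NOT NULL
        GROUP BY locality
        ORDER BY n DESC, locality
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


# Illustrative in-memory data only (assumption, not real wikicfp rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_wikicfp (locality TEXT)")
conn.executemany(
    "INSERT INTO event_wikicfp VALUES (?)",
    [("Singapore",), ("Singapore",), ("Paris, France",), (None,)],
)

for count, locality in locality_counts(conn):
    print(count, locality)
```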

The most common localities in wikicfp, followed by 5 rare examples with different spellings and some further rare variants:

count   locality
1372    Singapore
749     Beijing, China
704     Paris, France
649     Barcelona, Spain
625     Rome, Italy
616     Hong Kong
575     Bangkok, Thailand
502     Vienna, Austria
497     Athens, Greece
...
3       Montreal, QC, CANADA
3       Montreal, Quebec, Canada
3       Montréal, Québec, Canada
3       Moscow, Russian Federation
3       Moscow,Russia
...
1       Aachen (Germany)
1       Aachen / Germany
1       Aachen, Germany
1       Aachen,Germany
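
Many of the spelling variants above differ only in separators, whitespace, letter case or country aliases, so a rule-based pre-normalization step might already collapse a good share of them before any NER library is involved. The following is a rough sketch of that idea, not the project's implementation. The names parse_locality, split_locality and strip_accents, and the tiny COUNTRY_ALIASES table, are hypothetical; a real alias table would have to come from a gazetteer or a library such as geograpy.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny alias table (assumption for illustration)
COUNTRY_ALIASES = {
    "russia": "Russia",
    "russian federation": "Russia",
    "canada": "Canada",
    "germany": "Germany",
}


def strip_accents(text):
    """Remove combining marks, e.g. 'Québec' -> 'Quebec'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def split_locality(locality):
    """Split a raw locality on ',', '/', '(' and ')' and trim each part."""
    parts = re.split(r"[,/()]", locality)
    return [p.strip() for p in parts if p.strip()]


def parse_locality(locality):
    """Guess city, region and country from the position of each part."""
    parts = split_locality(locality)
    result = {"city": None, "region": None, "country": None}
    if not parts:
        return result
    result["city"] = parts[0]
    if len(parts) >= 2:
        # Last part is assumed to be the country; map known aliases
        key = strip_accents(parts[-1]).lower()
        result["country"] = COUNTRY_ALIASES.get(key, parts[-1])
    if len(parts) >= 3:
        # Second part is assumed to be the region
        result["region"] = parts[1]
    return result


for raw in ["Aachen (Germany) ", "Aachen / Germany", "Moscow,Russia"]:
    print(repr(raw), "->", parse_locality(raw))
```

This positional heuristic has clear limits: single-part entries such as "Singapore" or "Hong Kong" yield only a city, "QC" and "Quebec" are not unified, and "Montreal" and "Montréal" remain distinct cities. Resolving those cases needs a lookup against real geographic data.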