Closed simonb83 closed 7 years ago
spaCy's named entity recognizer might be of use here. There's a nice web-based demo as well.
Yes, I am using spaCy to extract the named entities and then attempting to identify the relevant countries. I've done some more work on this, to include:
It will be interesting to see if we have any issues with spellings or utf-8 characters.
The current fail cases (based on 290 articles from article_contents.csv) are: 'Amur Oblast', 'Balaghat', 'Betul', 'Bobonong', 'Bodoland', 'Burhanpur', 'Daily News', 'Dandane', 'Gatore', 'Gogrial East', 'Haikota', 'Harda', 'Hashenkit', 'Hulu Terengganu', 'Karubaga Village', 'Kirehe', 'Luangphabang', 'Mabumahibudu', 'Matshekge', 'Mayenrol', 'Mekong River', 'Mosweu', 'Naitasiri', 'Odisha', 'Rasetimela', 'Scribd', 'Tolikara', 'Viti Levu', 'Warrap State'
Would it be possible to use the Google Maps API to find the country? Most of these work:
https://maps.googleapis.com/maps/api/place/textsearch/json?query=Matshekge&key={ }
Returns:
"formatted_address" : "Bobonong, Botswana",
Hey @MrTones, that's a nice idea. We could wrap a call to the API in a function and use it as a fallback when the other methods fail to identify the country.
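A rough sketch of what such a wrapper might look like, using only the standard library. The function name `country_from_place`, the injectable `fetch` parameter, and the rule "take the last comma-separated part of `formatted_address`" are all assumptions for illustration, not code from the project. The last-part rule matches the example response above but may break for addresses that end in something other than the country.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the example request earlier in this thread.
TEXTSEARCH_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"


def _fetch_json(url):
    """Fetch a URL and decode the JSON body (default network fetcher)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def country_from_place(place, api_key, fetch=_fetch_json):
    """Hypothetical helper: return a country name for `place`, or None.

    Assumption: the country is the last comma-separated component of the
    first result's "formatted_address", e.g. "Bobonong, Botswana".
    """
    query = urllib.parse.urlencode({"query": place, "key": api_key})
    data = fetch(f"{TEXTSEARCH_URL}?{query}")
    results = data.get("results") or []
    if not results:
        return None
    address = results[0].get("formatted_address", "")
    parts = [part.strip() for part in address.split(",") if part.strip()]
    return parts[-1] if parts else None
```

Note that this yields a country name rather than a code, so the result would still need to go through the existing name-to-code lookup. Passing `fetch` in makes it possible to test the parsing without hitting the network or spending API quota.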
Yeah, I was thinking your current function could keep something like a dictionary of place:country pairs, and then, if a noun still isn't recognized after spaCy + the dictionary, make a Maps call as a last resort. Otherwise, with enough calls, the project would eventually have to start paying for the API.
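Something along these lines could implement that ordering: check the local dictionary first, then a cache of earlier remote answers, and only then call the remote API. The names `make_country_resolver`, `known_places`, and `remote_lookup` are made up for this sketch; `remote_lookup` stands for whatever function ends up wrapping the Maps (or Mapzen) call.

```python
def make_country_resolver(known_places, remote_lookup):
    """Build a resolver that only calls `remote_lookup` as a last resort.

    known_places: dict mapping lower-cased place names to countries.
    remote_lookup: callable taking a place name, returning a country or None.
    """
    cache = {}  # remembers remote answers, including misses (None)

    def resolve(place):
        key = place.strip().lower()
        # 1. Local dictionary: free and instant.
        if key in known_places:
            return known_places[key]
        # 2. Remote call, made at most once per distinct place name.
        if key not in cache:
            cache[key] = remote_lookup(place)
        return cache[key]

    return resolve
```

Caching misses as well as hits means a place the API cannot resolve is not paid for again on every article that mentions it. The cache here lives in memory only; persisting it between runs would be a separate decision.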
If you need a free place-lookup API, you can use Mapzen: https://mapzen.com/products/search/
Enhance the `country_code` function in `interpreter.py` in order to more reliably recognize countries. For example, it currently fails for 'the United States' vs 'United States'. It would also be good to try to detect countries even when the name is not explicitly mentioned, i.e. from city names etc.
The Mordecai library may be an option, however it requires its own NLP parsing and I was wondering if there was a simpler way to do this without using two NLP libraries + trained models.
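For the 'the United States' vs 'United States' case specifically, a small normalization step before the lookup might be enough. The sketch below is an assumption about how that could work, not the current `country_code` implementation: `normalize_country_name`, `lookup_country_code`, and the three-entry `EXAMPLE_CODES` table are hypothetical and only there to make the example runnable.

```python
import re

# Matches a leading "the" followed by whitespace, in any letter case.
_LEADING_ARTICLE = re.compile(r"^the\s+", re.IGNORECASE)

# Tiny illustrative table keyed by normalized name (ISO 3166-1 alpha-2 codes).
EXAMPLE_CODES = {
    "united states": "US",
    "netherlands": "NL",
    "botswana": "BW",
}


def normalize_country_name(name):
    """Collapse whitespace, drop a leading 'the', and case-fold."""
    collapsed = " ".join(name.split())
    return _LEADING_ARTICLE.sub("", collapsed).casefold()


def lookup_country_code(name):
    """Return the code for `name`, or None if it is not in the table."""
    return EXAMPLE_CODES.get(normalize_country_name(name))
```

With this, 'the United States', 'The United States', and 'United States' all normalize to the same key. The table keys have to be stored in the same normalized form, so a country whose usual name starts with "The" would be keyed without it.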