UUDigitalHumanitieslab / placenamedisambiguation

A pipeline of scripts that enables disambiguation of place names in a given corpus
MIT License

Set up a local multiNER environment #4

Closed alexhebing closed 5 years ago

alexhebing commented 5 years ago

Base this environment on the one from KB.

For the purpose of demo'ing NER (see below), it would be really neat if the language used (i.e. the language models used by the NER packages) could be switched fairly easily between Dutch, English and Italian (where available).

Goal

For this first instance, the goal is twofold:

1) Discover what it takes to set up a NER environment with multiple tools (and a script that combines their output). This will come in handy when setting up a 'real' environment.

2) Show the client the reality of NER'ing: closely review the output with them, and make it very clear that fetching coordinates for NEs is a separate task that will need to be implemented and/or tested separately.

alexhebing commented 5 years ago

OK, here are some notes on setting up the various tools in the multiNER package (on my Ubuntu 18.04 OS):

Stanford NER

DBPedia

` java --add-modules java.se.ee -jar dbpedia-spotlight-0.7.1.jar models/it http://localhost:2222/rest `
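To check that the Spotlight server started by the command above actually responds, something like the following could be used. This is a minimal sketch, not part of multiNER: it assumes the server listens on `http://localhost:2222/rest` as in the command, and that the `annotate` endpoint accepts `text` and `confidence` query parameters and returns JSON when asked to. The function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same base URL as passed to the Spotlight jar above.
SPOTLIGHT_URL = "http://localhost:2222/rest"


def build_annotate_request(text, confidence=0.5, base_url=SPOTLIGHT_URL):
    """Build (but do not send) a GET request for the annotate endpoint."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        f"{base_url}/annotate?{query}",
        headers={"Accept": "application/json"},
    )


def annotate(text, confidence=0.5):
    """Send the request to a running server and return the decoded JSON."""
    request = build_annotate_request(text, confidence)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `annotate("Roma è la capitale d'Italia")` only works while the server is running; `build_annotate_request` can be inspected without it.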

Polyglot

English:

`polyglot download embeddings2.en`

`polyglot download ner2.en`
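Since the language should be easy to switch, the per-language names could be derived from a single language code. A rough sketch, under two assumptions that still need checking: that the Dutch and Italian polyglot packages follow the same `embeddings2.<lang>` / `ner2.<lang>` naming as the English ones, and that each Spotlight model lives in `models/<lang>` as in the Italian command. The helper names are hypothetical.

```python
SUPPORTED_LANGUAGES = ("nl", "en", "it")


def check_language(lang):
    """Reject language codes outside the three we want to demo."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")


def polyglot_packages(lang):
    """Return the polyglot package names to download for one language."""
    check_language(lang)
    # Assumption: every language uses the same naming pattern as English.
    return [f"embeddings2.{lang}", f"ner2.{lang}"]


def spotlight_command(lang, jar="dbpedia-spotlight-0.7.1.jar",
                      url="http://localhost:2222/rest"):
    """Return the Spotlight start command as an argument list."""
    check_language(lang)
    # Assumption: the model for each language is unpacked into models/<lang>.
    return ["java", "--add-modules", "java.se.ee",
            "-jar", jar, f"models/{lang}", url]
```

The argument list from `spotlight_command("nl")` can be passed straight to `subprocess.Popen`, so switching language comes down to changing one string.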

MultiNER

I have this running now. Getting Stanford to work took some debugging: there seem to be some inconsistencies in the way the results are parsed and then merged into one result. For example, I had to change line 532 from `new_result[ne]["type"] = ne_type[0]` to `new_result[ne]["type"] = {ne_type[0]: 1}`, because the code later on (in the `max_class` method) expects a dictionary and couldn't handle a string.
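To illustrate why the change on line 532 matters, here is a small stand-alone reproduction. The `max_class` below is a hypothetical stand-in, not the actual multiNER code: it only assumes that the real method picks the entity type with the highest count from a dictionary of type to count.

```python
def max_class(type_counts):
    """Hypothetical stand-in: return the type with the highest count."""
    return max(type_counts.items(), key=lambda item: item[1])[0]


ne = "Utrecht"
ne_type = ["LOCATION"]
new_result = {ne: {}}

# Original assignment: stores a bare string, which has no .items().
new_result[ne]["type"] = ne_type[0]
try:
    max_class(new_result[ne]["type"])
except AttributeError as error:
    print("string input fails:", error)

# Patched assignment: stores a one-entry dictionary of type to count.
new_result[ne]["type"] = {ne_type[0]: 1}
print("dict input works:", max_class(new_result[ne]["type"]))
```

With the patched assignment the Stanford result has the same shape as the counts coming from the other taggers, so the merging step can treat them all alike.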

alexhebing commented 5 years ago

Downloading DBpedia Spotlight:

wget https://downloads.sourceforge.net/project/dbpedia-spotlight/spotlight/dbpedia-spotlight-1.0.0.jar

Dutch model:

wget https://sourceforge.net/projects/dbpedia-spotlight/files/2016-10/nl/model/nl.tar.gz/download