dbpedia-spotlight / dbpedia-spotlight-model

DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text. This repository implements the approach described in "Improving Efficiency and Accuracy in Multilingual Entity Extraction".
http://www.dbpedia-spotlight.org
Apache License 2.0

How to load the spotters #17

Open darioloetscher opened 7 years ago

darioloetscher commented 7 years ago

Hey,

In the source code there are a bunch of spotters available, but I cannot figure out how to get them loaded. Can someone give me a short tutorial on setting up the different spotters? It would be awesome if you could also link me to the models they need.

I wanted to try out the different spotters on the demo server http://model.dbpedia-spotlight.org/en/spot .

That's the code snippet I used:

        # Join the two translated ticket fields, separated by a blank line.
        text = ticket["ticket_short_description_translation"] + \
               "\n\n" + ticket["ticket_description_translation"]
        # spotterName is the parameter being varied between requests.
        payload = {"text": text, "spotterName": spotter}
        headers = {'Accept': 'application/json'}
        result = requests.get('http://model.dbpedia-spotlight.org/en/spot', params=payload, headers=headers)
        results.append(result.json())

With the following spotters:

    spotters = ["LingPipeSpotter", "AtLeastOneNounSelector", "CoOccurrenceBasedSelector",
                "NESpotter", "KeyphraseSpotter", "OpenNLPChunkerSpotter", "WikiMarkupSpotter",
                "SpotXmlParser", "AhoCorasickSpotter", "Default"]

Unfortunately, I get exactly the same results for every one of them, so it doesn't look like the spotter actually changed. The API even accepted a "bliblablubspotter", which suggests that only the default spotter ever runs.
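One way to make that check explicit is to group the spotter names by the response they produced. The following is only a minimal sketch: the function name `group_identical_responses` and the sample data are made up for illustration, and the sample dictionaries are placeholders, not real endpoint output.

```python
import json
from collections import defaultdict


def group_identical_responses(responses_by_spotter):
    """Group spotter names whose JSON responses are identical.

    `responses_by_spotter` maps a spotter name to the parsed JSON
    (e.g. the value of `result.json()`) returned for that name.
    """
    groups = defaultdict(list)
    for name, response in responses_by_spotter.items():
        # Serialize with sorted keys so key order does not affect equality.
        key = json.dumps(response, sort_keys=True)
        groups[key].append(name)
    # Return a deterministic list of name groups.
    return sorted(sorted(names) for names in groups.values())


# Placeholder data (hypothetical, NOT real endpoint output).
example = {
    "Default": {"spots": ["Berlin"]},
    "NESpotter": {"spots": ["Berlin"]},
    "bliblablubspotter": {"spots": ["Berlin"]},
}
groups = group_identical_responses(example)
```

If every name, including a nonsense one, lands in a single group, the parameter most likely has no effect on the server side.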

What am I getting wrong?

sandroacoelho commented 7 years ago

Hi @darioloetscher, these spotters are part of the Lucene implementation, which is not available on our endpoint.

Under [api/model].dbpedia-spotlight.org, we use the approach described in "Improving Efficiency and Accuracy in Multilingual Entity Extraction", which has just one spotter.
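Since there is only one spotter, a request to this endpoint does not need a `spotterName` at all. As a rough sketch using only the standard library: the URL, the `text` parameter and the `Accept` header are taken from the snippet earlier in this thread, while the helper name `build_spot_request` is hypothetical. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL as quoted earlier in this thread.
SPOT_URL = "http://model.dbpedia-spotlight.org/en/spot"


def build_spot_request(text):
    """Build (but do not send) a GET request for the /spot endpoint.

    No spotterName is included, on the assumption that the endpoint
    only runs its single built-in spotter.
    """
    query = urlencode({"text": text})
    return Request(f"{SPOT_URL}?{query}", headers={"Accept": "application/json"})


req = build_spot_request("Berlin is the capital of Germany")
```

Passing `req` to `urllib.request.urlopen` would send it, assuming the server is still reachable at that address.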

Best,

darioloetscher commented 7 years ago

Thanks for your fast answer.

Where can I find the Lucene implementation? The "main" repository https://github.com/dbpedia-spotlight/dbpedia-spotlight looks like it could be that, but I'm not really sure. Its docs say:

We will keep this repository just to historical references. Every issue opened should be closed and reopened in their respective repositories.

It also does not build, which is obviously a problem :)

In your downloads section there is a Lucene model of nearly 40 GB or so, while the models for this code are only about 1.5 GB. As a rule of thumb, much bigger models usually give better results. Is that the case here?

Where can I find documentation on how to deploy the Lucene version locally with the spotters enabled?

I hope to use Spotlight as a dependency of my project for the next few years. Is the Lucene version actively developed, or are you dropping it? If you are dropping it, I would be better off coding spotters for this Spotlight version. For now, though, I just need some test results with different spotters, in short with Lucene. If that is a success, I will code them for this project and push them upstream (in a few weeks ^^).

Thanks