MartinoMensio / spacy-dbpedia-spotlight

A spaCy wrapper for DBpedia Spotlight
MIT License

interesting features to add #4

Closed angelosalatino closed 2 years ago

angelosalatino commented 3 years ago

Hi Martino,

It would be nice to be able to set the confidence level as a parameter.

MartinoMensio commented 3 years ago

Hi Angelo, thank you for the suggestion! With version 0.2.1 you can control the confidence level. You can set it when you create the pipeline stage, or change it afterwards.

import spacy
# load your spacy pipeline
nlp = spacy.blank('en')
# add the pipeline stage with the configuration options:
nlp.add_pipe('dbpedia_spotlight', config={'confidence': 0.4})
# use it
text = 'US bought a lot of vaccines'
doc = nlp(text)
# this will print: [(US, 'http://dbpedia.org/resource/United_States')]
print([(ent, ent.kb_id_) for ent in doc.ents])

# you can change the confidence if you have already instantiated the pipeline stage
nlp.get_pipe('dbpedia_spotlight').confidence = 0.5
# now recompute the document
doc = nlp(text)
# this now won't have any results
print([(ent, ent.kb_id_) for ent in doc.ents])

The other parameters of the REST API can also be changed: confidence, support, types, sparql and policy. As with the confidence, you can set them in the config dict or by accessing the corresponding attribute of the pipe stage. I will expand the documentation a bit, as at the moment this is only detailed in #3.
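As a rough sketch of setting several of these parameters together, assuming the config keys match the parameter names listed above: the helper `spotlight_config` is hypothetical (it is not part of the library), and the example values for `support`, `types` and `policy` are assumptions based on the DBpedia Spotlight REST API conventions. The `add_pipe` call is left as a comment so the block runs without spaCy installed.

```python
# Hypothetical helper, not part of spacy-dbpedia-spotlight.
# The allowed keys are the five REST parameters named in this thread.
ALLOWED_KEYS = ('confidence', 'support', 'types', 'sparql', 'policy')


def spotlight_config(**params):
    """Build a config dict that contains only the parameters that were set."""
    unknown = set(params) - set(ALLOWED_KEYS)
    if unknown:
        raise ValueError(f'unknown parameters: {sorted(unknown)}')
    # Drop None values so unset parameters keep the pipeline defaults
    return {key: value for key, value in params.items() if value is not None}


# Example values are assumptions, chosen only for illustration
config = spotlight_config(
    confidence=0.4,
    support=20,
    types='DBpedia:Place',
    policy='whitelist',
)
print(config)

# Assumed usage, mirroring the snippet earlier in this thread:
# nlp.add_pipe('dbpedia_spotlight', config=config)
```

Validating the keys up front catches a misspelled parameter before any request is made.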

Currently, you cannot change the configuration (e.g. confidence) for single docs; you have to do it at the nlp level.
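One possible workaround for per-document settings, sketched under assumptions: since the stage attribute can be changed at any time, a small context manager can set it for one call and restore it afterwards. `temporary_attr` is a hypothetical helper, not part of the library, and a `SimpleNamespace` stands in for `nlp.get_pipe('dbpedia_spotlight')` so the block runs without spaCy.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to value inside the with-block, then restore the old value."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Restore even if processing the document raises
        setattr(obj, name, previous)


# Stand-in for nlp.get_pipe('dbpedia_spotlight'); an assumption for illustration
stage = SimpleNamespace(confidence=0.4)

with temporary_attr(stage, 'confidence', 0.5):
    inside = stage.confidence  # a doc = nlp(text) call here would see 0.5
after = stage.confidence       # restored to the original 0.4

print(inside, after)
```

This still changes the setting at the nlp level for the duration of the block, so it is not safe when several threads share the same pipeline.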

Let me know if this works for you.

Best, Martino

angelosalatino commented 3 years ago

Very nice, Angelo