PolMine / dbpedia

R Wrapper for Corpus Annotation with DBpedia Spotlight
Stopwords used by DBpedia Spotlight may result in missing corpus positions of named entities #11

Closed by ablaette 1 year ago

ablaette commented 1 year ago

The DBpedia Spotlight container applies a list of stopwords before running the actual disambiguation. If a CWB corpus is the point of departure and a named entity starts or ends with a stop word, the result may deviate from the input.

Solution: reconstruct the original region from the corpus positions.
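
To illustrate the idea, here is a minimal, language-neutral sketch in Python. It is not the package's implementation. It assumes the named-entity regions of the corpus are available as sorted, non-overlapping pairs of corpus positions, and that the span returned by Spotlight has already been mapped to corpus positions. The function name `restore_region`, the sample tokens and the assumption that "The" is dropped as a stop word are all hypothetical.

```python
from bisect import bisect_right


def restore_region(regions, span):
    """Return the original (left, right) region enclosing `span`.

    `regions` must be sorted by left corpus position and non-overlapping.
    If no region encloses `span`, the span is returned unchanged.
    """
    lefts = [left for left, _ in regions]
    # Index of the last region that starts at or before the span's left edge.
    i = bisect_right(lefts, span[0]) - 1
    if i >= 0:
        left, right = regions[i]
        if left <= span[0] and span[1] <= right:
            return (left, right)
    return span


# Hypothetical data: corpus positions 0..4, one annotated region "The Hague".
tokens = ["Visit", "The", "Hague", "in", "spring"]
ne_regions = [(1, 2)]

# Assumed Spotlight result: the leading stop word was dropped, so only
# corpus position 2 ("Hague") comes back.
spotlight_span = (2, 2)

restored = restore_region(ne_regions, spotlight_span)
surface = " ".join(tokens[restored[0]:restored[1] + 1])
```

With these inputs, the trimmed span is widened back to the annotated region, so the entity is attached to the full set of corpus positions rather than only to the part that survived stop-word removal.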

ablaette commented 1 year ago

Fixed. Note: the latest version of polmineR also addresses this issue, because as.AnnotatedPlainTextDocument() now takes stop words into account, so reconstructing NE regions may no longer be necessary.