PolMine / dbpedia

R Wrapper for Corpus Annotation with DBpedia Spotlight

Keep entity types returned by DBpedia Spotlight #24

Closed ChristophLeonhardt closed 6 months ago

ChristophLeonhardt commented 7 months ago

DBpedia Spotlight returns not only URIs for entities but also entity types. get_dbpedia_uris() currently omits these values from its output. If kept, the entity types could be used to classify entities without additional SPARQL queries, for example when the textual data does not contain pre-annotated named entities.
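To illustrate the idea, here is a minimal sketch of how a type string could be turned into something usable for classification. It assumes that Spotlight reports types as a single comma-separated string of `Prefix:Name` entries; the sample string and the helper `parse_types()` are hypothetical and not part of the dbpedia package.

```r
# Hypothetical example of the comma-separated type string attached to an
# annotated resource (values are illustrative, not taken from a real response).
raw_types <- "Wikidata:Q515,Schema:City,Schema:Place,DBpedia:City,DBpedia:Place"

# Hypothetical helper (not part of the dbpedia package): split the string
# into individual entries and group the type names by their source prefix.
parse_types <- function(x) {
  if (is.na(x) || !nzchar(x)) return(list())
  parts <- strsplit(x, ",", fixed = TRUE)[[1]]
  prefix <- sub(":.*$", "", parts)    # text before the first colon
  name <- sub("^[^:]*:", "", parts)   # text after the first colon
  split(name, prefix)
}

types <- parse_types(raw_types)

# Classify the entity from its types alone, without a SPARQL query
is_place <- "Place" %in% types[["DBpedia"]]
```

Grouping by prefix keeps the DBpedia ontology classes separate from Schema.org and Wikidata identifiers, so a check such as `is_place` only looks at one vocabulary.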

ablaette commented 7 months ago

This will be absolutely useful. To keep the number of arguments minimal, I do not think we need an additional argument. We should return types by default and leave it to users to delete the column if it is not wanted.

What we should do, however, is document the columns of the return value so that we know what we get.

ablaette commented 6 months ago

This is a minimal example to convey what is implemented.

library(dbpedia)

# Annotate the Labour text from quanteda's UK immigration manifesto sample data
uris <- get_dbpedia_uris(
  quanteda::data_char_ukimmig2010[["Labour"]],
  language = "en",
  api = "http://api.dbpedia-spotlight.org/en/annotate"
)

The output table now has a list column 'types' that holds the parsed result for each entity. This may not yet be the ideal data representation, but I would leave further discussion to: https://github.com/PolMine/dbpedia/issues/27
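For readers unfamiliar with list columns, the following sketch shows how such a column might be filtered on or dropped. It uses a mock data.frame as a stand-in for the real return value; the column names other than 'types', the inner structure of each 'types' element, and the helper `has_type()` are assumptions for illustration only.

```r
# Mock stand-in for the table returned by get_dbpedia_uris(). Column names
# other than 'types' and all values are assumptions, not real output.
uris_mock <- data.frame(
  text = c("Labour", "Britain"),
  dbpedia_uri = c(
    "http://dbpedia.org/resource/Labour_Party_(UK)",
    "http://dbpedia.org/resource/United_Kingdom"
  ),
  stringsAsFactors = FALSE
)

# One list element per row; the inner layout (types grouped by source) is assumed.
uris_mock$types <- list(
  list(DBpedia = c("Organisation", "PoliticalParty")),
  list(DBpedia = c("Place", "Country"))
)

# Hypothetical helper: TRUE for rows whose DBpedia types include `type`
has_type <- function(types, type) {
  vapply(types, function(x) type %in% x[["DBpedia"]], logical(1))
}

# Keep only entities typed as places
places <- uris_mock[has_type(uris_mock$types, "Place"), ]

# Users who do not want the types can simply drop the column
uris_no_types <- uris_mock
uris_no_types$types <- NULL
```

The same pattern should carry over to a data.table, where the column could be removed by reference instead of by copying.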