Closed ChristophLeonhardt closed 6 months ago
This will be very useful. To keep the number of arguments minimal, I do not think we need the additional argument. We should return types by default and leave it to users to delete the column if it is not wanted.
What we should do, however, is document the columns of the return value so that users know what they get.
This is a minimal example to convey what is implemented.
library(dbpedia)
uris <- dbpedia_uris <- get_dbpedia_uris(
  quanteda::data_char_ukimmig2010[["Labour"]],
  language = "en",
  api = "http://api.dbpedia-spotlight.org/en/annotate"
)
The output table now has a list column 'types' that holds the parsed result. This may not yet be the ideal data representation, but I would leave further discussion to: https://github.com/PolMine/dbpedia/issues/27
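To illustrate how users could handle the new column, here is a minimal sketch in base R that needs no API call. The toy table only stands in for the real output: the column names `text` and `dbpedia_uri`, the nested shape of the `types` entries, and all values are assumptions made up for illustration, not the documented return value of get_dbpedia_uris().

```r
# Toy stand-in for the table returned by get_dbpedia_uris().
# Column names, the shape of the 'types' entries and all values
# are assumptions for illustration only.
tab <- data.frame(
  text = c("Labour", "NHS"),
  dbpedia_uri = c(
    "http://dbpedia.org/resource/Labour_Party_(UK)",
    "http://dbpedia.org/resource/National_Health_Service"
  ),
  stringsAsFactors = FALSE
)
tab$types <- list(
  list(DBpedia = c("Organisation", "PoliticalParty")),
  list(DBpedia = "Organisation")
)

# Users who do not want the types can simply drop the column.
tab_no_types <- tab
tab_no_types$types <- NULL

# Users who do want them can flatten one source into a character column.
tab$dbpedia_types <- vapply(
  tab$types,
  function(x) paste(x[["DBpedia"]], collapse = "|"),
  character(1)
)
```

Dropping the column is a one-liner, which supports returning types by default rather than adding another argument.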
DBpedia Spotlight returns not only URIs for entities but also entity types. In get_dbpedia_uris(), these values are currently omitted from the output. If kept, these entity types could be used to classify entities without additional SPARQL queries, for example if the textual data does not contain pre-annotated named entities.
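As a rough sketch of how such types could be used for classification, the snippet below splits a types string into one character vector per source. It assumes that Spotlight delivers the types of an entity as a single comma-separated string of `prefix:value` pairs. The helper `parse_types()` and the example string are hypothetical and not part of the dbpedia package.

```r
# Hypothetical helper: split a comma-separated string of "prefix:value"
# pairs into a named list with one character vector per prefix.
parse_types <- function(types_string) {
  if (is.na(types_string) || !nzchar(types_string)) return(list())
  parts <- strsplit(types_string, ",", fixed = TRUE)[[1]]
  prefix <- sub(":.*$", "", parts)     # text before the first colon
  value <- sub("^[^:]*:", "", parts)   # text after the first colon
  split(value, prefix)
}

# Made-up example string in the assumed format.
raw <- "Wikidata:Q43229,Schema:Organization,DBpedia:Organisation,DBpedia:Agent"
types <- parse_types(raw)

# Classify the entity without an additional SPARQL query.
is_organisation <- "Organisation" %in% types[["DBpedia"]]
```

Grouping by prefix keeps the ontologies apart, so a user can check against DBpedia classes only and ignore Wikidata or Schema.org identifiers.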