ablaette opened 1 year ago
The Docker images are not available for all architectures, in particular not for Apple M1 (arm64). We therefore need to build the image from the Dockerfile ourselves:
git clone https://github.com/dbpedia-spotlight/spotlight-docker.git
cd spotlight-docker
docker build -t dbpedia/dbpedia-spotlight:latest .
Then start a container from the image as follows. The final argument to spotlight.sh (de) selects the German model:
docker run -tid --restart unless-stopped --name dbpedia-spotlight.de --mount source=spotlight-model,target=/opt/spotlight -p 2222:80 dbpedia/dbpedia-spotlight spotlight.sh de
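After docker run, the container may need a while before the REST endpoint answers (for example while the language model is loaded). A minimal R sketch for waiting until it responds; spotlight_ready() is a hypothetical helper, and treating any successful GET on /rest/annotate as "ready" is an assumption:

```r
# Hypothetical helper: poll the Spotlight REST endpoint until it answers or the timeout expires.
# Assumes the port mapping 2222 from the docker run call and that a successful GET on
# /rest/annotate means the service is ready.
spotlight_ready <- function(base_url = "http://localhost:2222",
                            timeout = 300, interval = 5) {
  probe <- paste0(base_url, "/rest/annotate?text=Berlin&confidence=0.35")
  deadline <- Sys.time() + timeout
  repeat {
    # readLines() opens the URL itself; any error or warning counts as "not ready yet"
    ok <- tryCatch(
      length(readLines(probe, warn = FALSE)) > 0,
      error = function(e) FALSE,
      warning = function(w) FALSE
    )
    if (isTRUE(ok)) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}
```

Usage: if (!spotlight_ready()) stop("DBpedia Spotlight is not responding on port 2222")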
This is a snippet to pass data to DBpedia Spotlight using classes from the NLP package. Two alerts:
library(polmineR)
library(magrittr) # %>%
use("polmineR") # activate the corpora shipped with polmineR (incl. GERMAPARLMINI)
merkel_speeches <- corpus("GERMAPARLMINI") %>%
  subset(speaker == "Angela Dorothea Merkel") %>%
  as.speeches(s_attribute_name = "speaker", s_attribute_date = "date")
doc <- as(merkel_speeches[[2]], "AnnotatedPlainTextDocument")
# POST the fields as a URL-encoded form, which is what curl's --data options do
y <- httr::POST(
  url = "http://localhost:2222/rest/annotate",
  body = list(
    text = doc[["content"]],
    confidence = 0.35
  ),
  encode = "form",
  httr::accept_json()
)
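For reference, curl's --data-urlencode and --data options are command-line flags rather than field names: curl turns them into a URL-encoded form body (key=value pairs, percent-encoded and joined by "&") and sends it with POST. A base-R sketch of that encoding; form_encode() is a hypothetical helper, not part of httr (with httr, encode = "form" takes care of this):

```r
# Hypothetical helper: build an application/x-www-form-urlencoded body from a named list.
form_encode <- function(fields) {
  pairs <- vapply(names(fields), function(key) {
    value <- as.character(fields[[key]])
    # reserved = TRUE also escapes "&", "=", "/" etc.; repeated = TRUE escapes a literal "%"
    paste0(URLencode(key, reserved = TRUE, repeated = TRUE), "=",
           URLencode(value, reserved = TRUE, repeated = TRUE))
  }, character(1), USE.NAMES = FALSE)
  paste(pairs, collapse = "&")
}

form_encode(list(text = "Angela Merkel & Berlin", confidence = 0.35))
# "text=Angela%20Merkel%20%26%20Berlin&confidence=0.35"
```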
I had hoped that the character offsets of input and output would correspond, but that does not seem to be the case:
library(polmineR)
library(magrittr) # %>%
library(dplyr)    # as_tibble(), mutate(), select()
library(purrr)    # pluck()
library(jsonlite)
use("polmineR")
merkel_speeches <- corpus("GERMAPARLMINI") %>%
  subset(speaker == "Angela Dorothea Merkel") %>%
  as.speeches(s_attribute_name = "speaker", s_attribute_date = "date")
doc <- as(merkel_speeches[[2]], "AnnotatedPlainTextDocument")
# GET sends the text in the URL query string; here only the first 990 characters
request <- httr::GET(
  url = "http://localhost:2222/rest/annotate",
  query = list(
    text = substr(doc[["content"]], 1, 990),
    confidence = 0.35
  ),
  httr::add_headers('Accept' = 'application/json')
)
# Output: surface forms and offsets reported by Spotlight
httr::content(request, as = "text") %>%
  jsonlite::fromJSON() %>%
  pluck("Resources") %>%
  head() %>%
  .[, c("@surfaceForm", "@offset")]
# Input: token annotation of the document
as.data.frame(doc[["annotation"]]) %>%
  as_tibble() %>%
  mutate(word = sapply(features, `[[`, "word")) %>%
  mutate(pos = sapply(features, `[[`, "pos")) %>%
  select(-features) %>%
  head()
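Truncating to 990 characters presumably keeps the GET URL short, but it drops the rest of the speech. A base-R sketch of an alternative: split the text into pieces of at most max_chars characters, breaking after whitespace where possible, and record where each piece starts, so that offsets returned for a piece can be mapped back to the full text. chunk_text() is a hypothetical helper; the 990 default only mirrors the substr() call above:

```r
# Hypothetical helper: split `text` into chunks of at most `max_chars` characters,
# preferably breaking after whitespace, and record the 1-based start of each chunk.
chunk_text <- function(text, max_chars = 990L) {
  max_chars <- as.integer(max_chars)
  n <- nchar(text)
  starts <- integer(0)
  chunks <- character(0)
  pos <- 1L
  while (pos <= n) {
    end <- min(pos + max_chars - 1L, n)
    if (end < n) {
      # break after the last whitespace character inside the window, if there is one
      ws <- gregexpr("[[:space:]]", substr(text, pos, end))[[1]]
      if (ws[1] > 0) end <- pos + max(ws) - 1L
    }
    starts <- c(starts, pos)
    chunks <- c(chunks, substr(text, pos, end))
    pos <- end + 1L
  }
  data.frame(start = starts, chunk = chunks, stringsAsFactors = FALSE)
}
```

Each chunk could then be sent in its own request. Assuming Spotlight's @offset is a 0-based offset into the chunk that was sent, the 1-based position in the full text would be start + offset.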
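One way to narrow down the mismatch is to look up each surface form at the position Spotlight reports. The sketch below uses a hypothetical helper, check_offsets(). It assumes the parsed "Resources" table has the columns @surfaceForm and @offset (as selected above) and that the token table has start and word columns. The base argument encodes an assumption about Spotlight's offsets (0 = 0-based, the default here, which still needs verifying; 1 = 1-based), while substr() and NLP spans count from 1:

```r
# Hypothetical diagnostic: check whether each surface form occurs at the position implied
# by "@offset", and which token (if any) starts there. `base` is the assumed offset origin.
check_offsets <- function(resources, text, tokens, base = 0L) {
  surface <- as.character(resources[["@surfaceForm"]])
  # shift to the 1-based positions used by substr()
  start <- as.integer(as.character(resources[["@offset"]])) + (1L - as.integer(base))
  found <- substr(text, start, start + nchar(surface) - 1L)
  data.frame(
    surface_form = surface,
    start = start,
    found = found,
    matches = found == surface,
    # token whose span begins at that position (NA if none)
    token = as.character(tokens$word)[match(start, tokens$start)],
    stringsAsFactors = FALSE
  )
}
```

With the objects above, something like check_offsets(resources, substr(doc[["content"]], 1, 990), tokens) should work, where resources and tokens stand for the full output and input tables (without head()). Running it with base = 0L and base = 1L shows which convention, if either, lines up; if neither does, the cause lies elsewhere.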
See this as an entry point: https://github.com/dbpedia-spotlight/spotlight-docker
Alternative (without Docker): https://opentapioca.org/