Closed redadmiral closed 5 years ago
Hi @redadmiral, could you paste your agdistis.properties file here?
Best
Hi @DiegoMoussallem, thanks for the quick answer!
#path to decompressed lucene 4.4 index
index=index/de
index_bycontext=index/de/context
#used to prune edges
nodeType=http://dbpedia.org/resource/
edgeType=http://dbpedia.org/ontology/
baseURI =http://dbpedia.org
#SPARQL endpoint to retrieve domain and range information
endpoint=http://dbpedia.org/sparql
#this is the trigram distance between words, default = 3
ngramDistance=3
#exploration depth of semantic disambiguation graph
maxDepth=2
#threshold for cutting of similar strings
threshholdTrigram=0.87
#heuristicExpansionOn explains whether simple coocurence resolution is done or not, e.g., Barack => Barack Obama if both are in the same text
heuristicExpansionOn=true
#list of entity domains and corporationAffixes
whiteList=/config/whiteList.txt
corporationAffixes=/config/corporationAffixes.txt
#Active popularity
popularity=false
#Choose an graph-based algorithm "hits" or "pagerank"
algorithm=hits
#Enable search by context
context=false
#Enable search by acronym
acronym=false
#Enable to find common entities
commonEntities=false
# IMPORTANT for creating an own index
folderWithTTLFiles=data/en
surfaceFormTSV=data/en/surface/en_surface_forms.tsv
Hi @redadmiral,
Please change the following settings in this file and see if it helps.
from:
nodeType=http://dbpedia.org/resource/
edgeType=http://dbpedia.org/ontology/
baseURI =http://dbpedia.org
#SPARQL endpoint to retrieve domain and range information
endpoint=http://dbpedia.org/sparql
to:
nodeType=http://de.dbpedia.org/resource/
edgeType=http://dbpedia.org/ontology/
baseURI =http://de.dbpedia.org
#SPARQL endpoint to retrieve domain and range information
endpoint=http://de.dbpedia.org/sparql
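The change above amounts to pointing nodeType, baseURI and endpoint at the DBpedia chapter that matches the index language, while edgeType stays on the main dbpedia.org ontology. A minimal sketch of a sanity check for this, assuming the language code is passed in by hand (for example "de" for index/de); `parse_properties` and `mismatched_keys` are hypothetical helper names and are not part of AGDISTIS:

```python
from urllib.parse import urlparse


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Keys such as "baseURI =" carry stray whitespace, so strip both sides.
            props[key.strip()] = value.strip()
    return props


def mismatched_keys(props, lang):
    """Return the keys whose URL host is not the expected DBpedia chapter.

    Assumption: English uses dbpedia.org and every other language uses
    <lang>.dbpedia.org, as in the fix suggested in this thread.
    """
    expected = "dbpedia.org" if lang == "en" else f"{lang}.dbpedia.org"
    checked = ("nodeType", "baseURI", "endpoint")
    return [k for k in checked if urlparse(props.get(k, "")).netloc != expected]
```

Calling `mismatched_keys(parse_properties(open("agdistis.properties").read()), "de")` on the original file should list all three keys, and return an empty list once they point at de.dbpedia.org.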
Thanks a lot, this did the trick! The unit tests are still failing but the queries are correctly disambiguated. Thanks a lot for the help!
Nice :)
The tests currently cover only English; we are considering creating tests for all languages.
I've experienced some problems using the German index data.
The first eight unit tests fail if I run Maven with the clean package arguments:
If I start the web service without unit testing and query
curl --data-urlencode "text='Die <entity>Freie Universität Berlin</entity> in <entity>Newcastle</entity>.'" -d type=agdistis localhost:8080/AGDISTIS
AGDISTIS returns only identifiers with the notInWiki prefix:
[{"disambiguatedURL":"http:\/\/aksw.org\/notInWiki\/FreieUniversitätBerlin","offset":24,"namedEntity":"Freie Universität Berlin","start":5}, "disambiguatedURL":"http:\/\/aksw.org\/notInWiki\/Newcastle","offset":9,"namedEntity":"Newcastle","start":33}]
With the English index data everything works like a charm and entities are resolved to the correct DBpedia entries. The output of the failing unit tests can be found here.
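For anyone wanting to spot this symptom automatically, here is a rough sketch that lists the entities AGDISTIS could not resolve. It assumes the service returns a well-formed JSON array of objects with the disambiguatedURL and namedEntity fields seen in the output quoted above, and that unresolved entities carry the http://aksw.org/notInWiki/ prefix; `unresolved_entities` is a hypothetical helper name, not part of AGDISTIS.

```python
import json

# Prefix taken from the response quoted in this thread.
NOT_IN_WIKI_PREFIX = "http://aksw.org/notInWiki/"


def unresolved_entities(response_text):
    """Return the namedEntity of every result that was not linked to the knowledge base."""
    results = json.loads(response_text)
    return [
        r.get("namedEntity")
        for r in results
        # Treat a missing or null URL as an empty string so startswith is safe.
        if (r.get("disambiguatedURL") or "").startswith(NOT_IN_WIKI_PREFIX)
    ]
```

If every entity in a query comes back in this list, as in the German example here, a mismatch between the index and the configured DBpedia chapter is one thing worth checking.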