dice-group / AGDISTIS

AGDISTIS - Agnostic Named Entity Disambiguation
http://aksw.org/Projects/AGDISTIS.html
GNU Affero General Public License v3.0

Chinese index not working #63

Closed RicardoUsbeck closed 6 years ago

RicardoUsbeck commented 6 years ago

When I use the Chinese index and run the command curl --data-urlencode "text='北京上海.'" -d type='agdistis', I get the following result:

[{"disambiguatedURL":"http:\/\/aksw.org\/notInWiki\/上海","offset":2,"namedEntity":"上海","start":10},
  {"disambiguatedURL":"http:\/\/aksw.org\/notInWiki\/北京","offset":2,"namedEntity":"北京","start":5}].
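For anyone reproducing this, a minimal Python sketch for spotting the failure in a response like the one above. It assumes that a `disambiguatedURL` under `http://aksw.org/notInWiki/` means no candidate was found in the index, which is how the output above reads. The function name `unresolved_entities` is hypothetical and not part of AGDISTIS.

```python
import json

# Assumption: URLs in this namespace mark entities that could not be linked.
NOT_FOUND_PREFIX = "http://aksw.org/notInWiki/"


def unresolved_entities(response_text):
    """Return the surface forms whose URL falls in the notInWiki namespace."""
    entities = json.loads(response_text)
    return [
        entity["namedEntity"]
        for entity in entities
        if entity["disambiguatedURL"].startswith(NOT_FOUND_PREFIX)
    ]


# Response body as reported in this issue (JSON escapes "/" as "\/").
sample = r'''[{"disambiguatedURL":"http:\/\/aksw.org\/notInWiki\/上海","offset":2,"namedEntity":"上海","start":10},
 {"disambiguatedURL":"http:\/\/aksw.org\/notInWiki\/北京","offset":2,"namedEntity":"北京","start":5}]'''

print(unresolved_entities(sample))
```

If every entity in a test sentence comes back in this list, the index or its configuration is the likely suspect rather than the input text.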

When I replace the index with the English version and run the command curl --data-urlencode "text='The University of Leipzig in Barack Obama.'" -d type='agdistis', the result is correct.

RicardoUsbeck commented 6 years ago

The user deployed AGDISTIS with Maven. In the online demo, this test input works.

RicardoUsbeck commented 6 years ago

The agdistis.properties file was wrong. Here is a correct one:

index=indexdbpedia_zh_2014

index_bycontext=index_bycontext

#used to prune edges
nodeType=http://zh.dbpedia.org/resource/
edgeType=http://zh.dbpedia.org/ontology/
baseURI =http://zh.dbpedia.org
#SPARQL endpoint to retrieve domain and range information
endpoint=http://dbpedia.org/sparql
#this is the trigram distance between words, default = 3
ngramDistance=3
#exploration depth of semantic disambiguation graph
maxDepth=2
#threshold for cutting off similar strings
threshholdTrigram=0.87
#heuristicExpansionOn controls whether simple co-occurrence resolution is performed, e.g., Barack => Barack Obama if both appear in the same text
heuristicExpansionOn=true
#list of entity domains and corporationAffixes
whiteList=/config/whiteList.txt
corporationAffixes=/config/corporationAffixes.txt

#Activate popularity
popularity=false

#Choose a graph-based algorithm: "hits" or "pagerank"
algorithm=hits

#Enable search by context
context=false

#Enable search by acronym
acronym=false

#Enable to find common entities
commonEntities=false

# IMPORTANT for creating your own index
folderWithTTLFiles=data/en
surfaceFormTSV=data/en/surface/en_surface_forms.tsv
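The thread does not say which setting was wrong in the original file. One plausible failure mode (an assumption, not confirmed here) is that the index was switched to Chinese while `nodeType`, `edgeType`, or `baseURI` still pointed at a different DBpedia language edition. A minimal Python sketch of a sanity check for that case follows. The helper names are hypothetical, and the parser handles only simple `key=value` lines.

```python
from urllib.parse import urlparse


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def hosts_of_uri_settings(props, keys=("nodeType", "edgeType", "baseURI")):
    """Map each URI-valued setting that is present to its host name."""
    return {key: urlparse(props[key]).netloc for key in keys if key in props}


def uri_settings_consistent(props):
    """True if all URI-valued settings point at the same host."""
    return len(set(hosts_of_uri_settings(props).values())) <= 1


# Excerpt of the properties file posted in this issue.
sample = """
index=indexdbpedia_zh_2014
#used to prune edges
nodeType=http://zh.dbpedia.org/resource/
edgeType=http://zh.dbpedia.org/ontology/
baseURI =http://zh.dbpedia.org
"""

props = parse_properties(sample)
print(hosts_of_uri_settings(props))
print(uri_settings_consistent(props))
```

The check is deliberately narrow: it only compares hosts and says nothing about whether the index directory itself exists or matches the language.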