Closed: RicardoUsbeck closed this issue 6 years ago
@RicardoUsbeck Thank you for logging this bug.
I've experimented a little more. I set the following properties in agdistis.properties to point to the Chinese DBpedia:
nodeType=http://zh.dbpedia.org/resource/
edgeType=http://zh.dbpedia.org/ontology/
baseURI=http://zh.dbpedia.org
If I run the example from the Wiki:
curl --data-urlencode "text='The
the webservice returns the expected results:
[{"disambiguatedURL":"http:\/\/zh.dbpedia.org\/resource\/Shanghai","offset":8,"namedEntity":"shanghai","start":5},{"disambiguatedURL":"http:\/\/zh.dbpedia.org\/resource\/北京市","offset":3,"namedEntity":"北京市","start":17}]
However, when I run examples that contain only punctuation:
@RicardoUsbeck @lguillou This error comes from our stemming step (see https://github.com/dice-group/AGDISTIS/blob/master/src/main/java/org/aksw/agdistis/util/Stemming.java#L91). Our preprocessing can actually deal with punctuation, but when AGDISTIS cannot find any candidates in the main search, it looks for more using the surface form search. If nothing is found again, it tries to stem the label, and besides stemming the label, this step also removes all punctuation.
I guess we will write a unit test for it and then try to find the bug. Thanks for the additional test.
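The fallback order described above (main search, then surface form search, then stemming with punctuation removal) could be sketched roughly as follows. This is only an illustration, not the actual AGDISTIS code: the class name, `normalizeLabel`, `findCandidates` and the two search functions are hypothetical stand-ins, and no real stemming is performed. It also assumes that a label consisting only of punctuation becomes empty after this step, so the sketch guards against searching with an empty label.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the candidate-search fallback; not the AGDISTIS implementation.
class CandidateFallbackSketch {

    // Stand-in for the stemming step: it only removes punctuation (no real stemming).
    // \p{P} matches any Unicode punctuation character, including "?", "." and "。".
    static String normalizeLabel(String label) {
        return label.replaceAll("\\p{P}+", " ").trim();
    }

    // Tries the main search, then the surface form search, then the main search
    // again with the normalized label. The search functions are assumptions
    // passed in by the caller.
    static List<String> findCandidates(String label,
                                       Function<String, List<String>> mainSearch,
                                       Function<String, List<String>> surfaceFormSearch) {
        List<String> candidates = mainSearch.apply(label);
        if (!candidates.isEmpty()) {
            return candidates;
        }
        candidates = surfaceFormSearch.apply(label);
        if (!candidates.isEmpty()) {
            return candidates;
        }
        String normalized = normalizeLabel(label);
        if (normalized.isEmpty()) {
            // Punctuation-only label: nothing is left to search for.
            return Collections.emptyList();
        }
        return mainSearch.apply(normalized);
    }
}
```

A unit test for the punctuation-only case could then call `findCandidates("???", ...)` with search functions that reject empty labels and check that an empty candidate list comes back instead of an exception.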
When I run a Chinese webservice using Java and query it with:
curl --data-urlencode "text='<entity>???</entity>.'" -d type='agdistis' http://localhost:8080/AGDISTIS
I get: [{"disambiguatedURL":"http:\/\/aksw.org\/notInWiki\/???","offset":3,"namedEntity":"???","start":1}]
And in the terminal window where the webservice is running I see an error: