AskNowQA / AutoSPARQL

Warning: Not working at the moment. Maintainer on parental leave. AutoSPARQL allows users to create SPARQL queries over RDF knowledge bases from natural language with low effort.
http://aksw.org/Projects/AutoSPARQL.html
GNU General Public License v3.0

Exception when using Umlauts #20

Closed by KonradHoeffner 10 years ago

KonradHoeffner commented 10 years ago
Question: Give me all books written by Lorenz Bühmann.
Running template generation...
Tagged input: Give/VB me/PRP all/DT books/NNS written/VBN by/IN Lorenz/NNP Bühmann/NNP
Preprocessed: Give/VB me/PRP all/DT books/NNS written/PASSPART Lorenz/NNP Bühmann/NNP
Exception in thread "Thread-20" org.aksw.autosparql.tbsl.algorithm.ltag.reader.TokenMgrError: Lexical error at line 1, column 13.  Encountered: "\u00fc" (252), after : ""
    at org.aksw.autosparql.tbsl.algorithm.ltag.reader.LTAGTreeParserTokenManager.getNextToken(LTAGTreeParserTokenManager.java:321)
    at org.aksw.autosparql.tbsl.algorithm.ltag.reader.LTAGTreeParser.jj_scan_token(LTAGTreeParser.java:533)
    at org.aksw.autosparql.tbsl.algorithm.ltag.reader.LTAGTreeParser.jj_3R_2(LTAGTreeParser.java:396)
    at org.aksw.autosparql.tbsl.algorithm.ltag.reader.LTAGTreeParser.jj_3_11(LTAGTreeParser.java:325)
    at org.aksw.autosparql.tbsl.algorithm.ltag.reader.LTAGTreeParser.jj_3R_2(LTAGTreeParser.java:399)
    at org.aksw.autosparql.tbsl.algorithm.ltag.reader.LTAGTreeParser.jj_3_4(LTAGTreeParser.java:337)
    at org.aksw.autosparql.tbsl.algorithm.ltag.reader.LTAGTreeParser.jj_3_10(LTAGTreeParser.java:365)
    at org.aksw.autosparql.tbsl.algorithm.ltag.reader.LTAGTreeParser.jj_2_10(LTAGTreeParser.java:249)
    at org.aksw.autosparql.tbsl.algorithm.ltag.reader.LTAGTreeParser.Tree(LTAGTreeParser.java:89)
    at org.aksw.autosparql.tbsl.algorithm.ltag.reader.LTAGTreeParser.TreeList(LTAGTreeParser.java:135)
    at org.aksw.autosparql.tbsl.algorithm.ltag.reader.LTAGTreeParser.Tree(LTAGTreeParser.java:69)
    at org.aksw.autosparql.tbsl.algorithm.ltag.reader.LTAGTreeParser.TreeList(LTAGTreeParser.java:135)
    at org.aksw.autosparql.tbsl.algorithm.ltag.reader.LTAGTreeParser.Tree(LTAGTreeParser.java:69)
    at org.aksw.autosparql.tbsl.algorithm.ltag.data.LTAG_Tree_Constructor.construct(LTAG_Tree_Constructor.java:16)
    at org.aksw.autosparql.tbsl.algorithm.ltag.parser.GrammarFilter.filter(GrammarFilter.java:272)
    at org.aksw.autosparql.tbsl.algorithm.ltag.parser.Parser.parse(Parser.java:68)
    at org.aksw.autosparql.tbsl.algorithm.templator.Templator.buildTemplates(Templator.java:179)
    at org.aksw.autosparql.tbsl.algorithm.learning.TBSL.answerQuestion(TBSL.java:158)
    at org.aksw.autosparql.tbsl.algorithm.learning.TBSL.answerQuestion(TBSL.java:150)
    at org.aksw.autosparql.tbsl.gui.vaadin.TBSLManager.answerQuestion(TBSLManager.java:339)
    at org.aksw.autosparql.tbsl.gui.vaadin.view.MainView$11.run(MainView.java:504)
    at java.lang.Thread.run(Thread.java:724)
KonradHoeffner commented 10 years ago

This is semi-solved so feel free to reopen if necessary.

The parsers have now been updated to handle umlauts and regenerated as Java code with JavaCC.

TOKEN: {<WORD: (["a"-"z"]|["ä"]|["ö"]|["ü"]|["ß"]|["Ä"]|["Ö"]|["Ü"]|["À"]|["à"]|["Â"]|["â"]|["Æ"]|["æ"]|["Ç"]|["ç"]|["È"]|["è"]|["É"]|["é"]|["Ê"]|["ê"]|["Ë"]|["ë"]|["Î"]|["î"]|["Ï"]|["ï"]|["Ô"]|["ô"]|["Œ"]|["œ"]|["Ù"]|["ù"]|["Û"]|["û"]|["Ÿ"]|["ÿ"]|["0"-"9"]|["?"]|["-"]|["_"]|["!"]|[","]|[";"]|["."]|[":"]|["/"])+>} 

However, this doesn't seem to work, so Christina added a workaround: the parser now receives normalized input without special characters, while the full text still goes to the NER and the tagger beforehand.
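
For readers who want to reproduce a normalization step like this, here is a minimal sketch. It is not the actual workaround from the repository: the class name `ParserInputNormalizer` and the method `normalize` are hypothetical, and mapping "ü" to "u" (rather than "ue") is an assumption. It uses the standard `java.text.Normalizer` to decompose accented letters and then drops the combining marks, with explicit replacements for letters such as "ß", "æ" and "œ" that have no canonical decomposition.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of the AutoSPARQL code base.
final class ParserInputNormalizer {

    private ParserInputNormalizer() {}

    /** Returns an approximation of the text without diacritics or ligatures. */
    static String normalize(String text) {
        // These letters have no canonical decomposition, so replace them explicitly:
        // U+00DF (sharp s), U+00C6/U+00E6 (AE ligature), U+0152/U+0153 (OE ligature).
        String expanded = text
                .replace("\u00DF", "ss")
                .replace("\u00C6", "AE").replace("\u00E6", "ae")
                .replace("\u0152", "OE").replace("\u0153", "oe");
        // NFD splits a letter such as U+00FC into "u" plus a combining diaeresis (U+0308).
        String decomposed = Normalizer.normalize(expanded, Normalizer.Form.NFD);
        // Remove all combining marks so that only the base letters remain.
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

With this sketch, only the string handed to the LTAG parser would be passed through `ParserInputNormalizer.normalize`, so "Bühmann" reaches the parser as "Buhmann" while the NER and the tagger still see the original spelling.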