fusepoolP3 / p3-dictionary-matcher-transformer

Dictionary Matcher is a P3 transformer for SKOS-based entity extraction.
Apache License 2.0

Only RDF/XML taxonomies supported? #9

Open ktk opened 7 years ago

ktk commented 7 years ago

If I have a taxonomy in Turtle I can't get the transformer to work:

[qtp933699219-15] ERROR com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler - http://relative-uri.fake/(line 1 column 1): Content is not allowed in prolog.
[qtp933699219-15] WARN org.eclipse.jetty.servlet.ServletHandler - /
java.lang.RuntimeException: com.hp.hpl.jena.shared.JenaException: org.xml.sax.SAXParseException; systemId: http://relative-uri.fake/; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at eu.fusepool.p3.dictionarymatcher.Reader.readDictionary(Reader.java:92)
    at eu.fusepool.p3.transformer.dictionarymatcher.DictionaryMatcherTransformer.<init>(DictionaryMatcherTransformer.java:124)
    at eu.fusepool.p3.transformer.dictionarymatcher.Main$1.getTransformer(Main.java:46)
    at eu.fusepool.p3.transformer.server.handler.TransformerFactoryServlet.getTransformer(TransformerFactoryServlet.java:126)
    at eu.fusepool.p3.transformer.server.handler.TransformerFactoryServlet.handlePost(TransformerFactoryServlet.java:71)
    at eu.fusepool.p3.transformer.server.handler.TransformerFactoryServlet.service(TransformerFactoryServlet.java:60)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:517)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:98)
    at org.eclipse.jetty.server.Server.handle(Server.java:461)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:284)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.hp.hpl.jena.shared.JenaException: org.xml.sax.SAXParseException; systemId: http://relative-uri.fake/; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler.fatalError(RDFDefaultErrorHandler.java:58)
    at com.hp.hpl.jena.rdf.arp.impl.ARPSaxErrorHandler.fatalError(ARPSaxErrorHandler.java:48)
    at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:209)
    at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.fatalError(XMLHandler.java:239)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
    at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.hp.hpl.jena.rdf.arp.impl.RDFXMLParser.parse(RDFXMLParser.java:151)
    at com.hp.hpl.jena.rdf.arp.JenaReader.read(JenaReader.java:168)
    at com.hp.hpl.jena.rdf.arp.JenaReader.read(JenaReader.java:155)
    at com.hp.hpl.jena.rdf.arp.JenaReader.read(JenaReader.java:226)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:274)
    at org.apache.clerezza.rdf.jena.parser.JenaParserProvider.parse(JenaParserProvider.java:68)
    at org.apache.clerezza.rdf.core.serializedform.Parser.parse(Parser.java:240)
    at org.apache.clerezza.rdf.core.serializedform.Parser.parse(Parser.java:193)
    at eu.fusepool.p3.dictionarymatcher.Reader.readDictionary(Reader.java:36)
    ... 18 more
Caused by: org.xml.sax.SAXParseException; systemId: http://relative-uri.fake/; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    ... 38 more

ktk commented 7 years ago

It does work once I convert the taxonomy to RDF/XML.
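
The trace above shows the dictionary going through the RDF/XML reader (`RDFXMLParser`, `JenaReader`), which is why a Turtle file fails at line 1, column 1. One possible way to cope with both serializations is to guess the media type before choosing a parser. The sketch below is not the transformer's actual code: the class `RdfFormatSniffer` and its method `guessMediaType` are hypothetical names, and the heuristic is deliberately crude (for instance, RDF/XML that uses a default namespace instead of the `rdf:` prefix would be misclassified). The returned media type would then be handed to whichever parser API is in use.

```java
// Hypothetical helper, not part of the transformer: guesses whether a
// taxonomy document is RDF/XML or Turtle from the start of its content.
final class RdfFormatSniffer {
    static final String RDF_XML = "application/rdf+xml";
    static final String TURTLE = "text/turtle";

    private RdfFormatSniffer() {}

    static String guessMediaType(String content) {
        String head = content;
        // Drop a leading byte-order mark, which trim() does not remove.
        if (head.startsWith("\uFEFF")) {
            head = head.substring(1);
        }
        head = head.trim();
        // Assumption: RDF/XML starts with an XML declaration, a DOCTYPE,
        // an XML comment, or the conventional <rdf:RDF root element.
        // A bare '<' is not enough, because Turtle may start with an
        // IRI such as <http://example.org/a>.
        boolean looksLikeXml = head.startsWith("<?xml")
                || head.startsWith("<rdf:RDF")
                || head.startsWith("<!DOCTYPE")
                || head.startsWith("<!--");
        return looksLikeXml ? RDF_XML : TURTLE;
    }

    public static void main(String[] args) {
        System.out.println(guessMediaType("<?xml version=\"1.0\"?><rdf:RDF/>"));
        System.out.println(guessMediaType(
                "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."));
    }
}
```

Where the taxonomy is fetched over HTTP, the `Content-Type` response header is likely a more reliable signal than content sniffing, with a heuristic like this kept only as a fallback.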