eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1

Feature request: Have Saxon resolve doc() and collection() from xmldb #351

Open wshager opened 10 years ago

wshager commented 10 years ago

Saxon or other XSLT engines could benefit from resolving docs and collections from eXist's storage directly.

adamretter commented 10 years ago

We would need to create a Resolver that can access the eXist database and configure Saxon to use it.
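For illustration, the kind of resolver described above could look roughly like the following. This is only a sketch under stated assumptions: `XmldbUriResolver` is a hypothetical name, and an in-memory map stands in for the eXist database, since the real broker calls are not shown in this thread. Only the standard JAXP `URIResolver` interface is used.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical resolver for xmldb: URIs; not eXist's actual implementation. */
final class XmldbUriResolver implements URIResolver {
    private static final String SCHEME = "xmldb://";

    // Assumption: a map of database path -> serialized XML stands in for the database.
    private final Map<String, String> store;

    XmldbUriResolver(Map<String, String> store) {
        this.store = store;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        // Returning null tells the XSLT processor to fall back to its default resolution.
        if (href == null || !href.startsWith(SCHEME)) {
            return null;
        }
        // "xmldb:///db/test/a.xml" -> "/db/test/a.xml"
        String path = href.substring(SCHEME.length());
        String xml = store.get(path);
        if (xml == null) {
            throw new TransformerException("No such document: " + href);
        }
        // The system ID is kept so relative references inside the document can be resolved later.
        return new StreamSource(new StringReader(xml), href);
    }
}
```

Such a resolver would be registered through the standard JAXP hooks, e.g. `TransformerFactory.setURIResolver(...)` or `Transformer.setURIResolver(...)`. The sketch ignores the `base` argument, so relative hrefs are not handled.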

adamretter commented 10 years ago

I have assigned this to eXist-3.0 because (I am guessing, without having investigated) it could introduce backwards-compatibility issues for those already using various URIs in the fn:doc and fn:collection calls of XSLT executed with Saxon in eXist.

dizzzz commented 10 years ago

I am pretty sure such a resolver already eXists; I coded it myself :-) I'll try to track it down :-)

shabanovd commented 9 years ago

This should work. @wshager can you confirm that this actually does not work in the latest develop branch?

wshager commented 9 years ago

I'll check, but this issue was opened at @adamretter's request, so perhaps he had different intentions for the resolver, i.e. replacing the current one.

adamretter commented 9 years ago

@wshager Do you have tests that show this doesn't work?

wshager commented 9 years ago

AFAIK doc() does work, but collection() doesn't. Here's the stack trace for <xsl:for-each select="collection('xmldb:///db/test/collection')"><xsl:copy-of select="."/></xsl:for-each>

2015-04-08 22:03:53,211 [eXistThread-233] WARN (Transform.java [fatalError]:816) - XSL transform reports fatal error: Error reported by XML parser ; Line#: -1; Column#: -1
net.sf.saxon.trans.XPathException: Error reported by XML parser
    at net.sf.saxon.lib.StandardErrorHandler.reportError(StandardErrorHandler.java:95)
    at net.sf.saxon.lib.StandardErrorHandler.fatalError(StandardErrorHandler.java:80)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:405)
    at net.sf.saxon.event.Sender.send(Sender.java:178)
    at net.sf.saxon.Configuration.buildDocument(Configuration.java:3516)
    at net.sf.saxon.lib.StandardCollectionURIResolver.catalogContents(StandardCollectionURIResolver.java:236)
    at net.sf.saxon.lib.StandardCollectionURIResolver.resolve(StandardCollectionURIResolver.java:122)
    at net.sf.saxon.functions.Collection.iterate(Collection.java:106)
    at net.sf.saxon.expr.instruct.ForEach.processLeavingTail(ForEach.java:414)
    at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212)
    at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1034)
    at net.sf.saxon.Controller.transformDocument(Controller.java:1959)
    at net.sf.saxon.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:148)
    at org.exist.util.serializer.ReceiverToSAX.endDocument(ReceiverToSAX.java:85)
    at org.exist.storage.serializers.XIncludeFilter.endDocument(XIncludeFilter.java:165)
    at org.exist.storage.serializers.Serializer.toSAX(Serializer.java:931)
    at org.exist.xquery.functions.transform.Transform.eval(Transform.java:266)
    at org.exist.xquery.BasicFunction.eval(BasicFunction.java:70)
    at org.exist.xquery.InternalFunctionCall.eval(InternalFunctionCall.java:56)
    at org.exist.xquery.AbstractExpression.eval(AbstractExpression.java:71)
    at org.exist.xquery.PathExpr.eval(PathExpr.java:264)
    at org.exist.xquery.AbstractExpression.eval(AbstractExpression.java:71)
    at org.exist.xquery.XQuery.execute(XQuery.java:297)
    at org.exist.xquery.XQuery.execute(XQuery.java:217)
    at org.exist.http.servlets.XQueryServlet.process(XQueryServlet.java:491)
    at org.exist.http.servlets.XQueryServlet.doPost(XQueryServlet.java:197)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:669)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:457)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:575)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.Dispatcher.forward(Dispatcher.java:229)
    at org.eclipse.jetty.server.Dispatcher.forward(Dispatcher.java:103)
    at org.exist.http.urlrewrite.Forward.doRewrite(Forward.java:50)
    at org.exist.http.urlrewrite.XQueryURLRewrite.doRewrite(XQueryURLRewrite.java:556)
    at org.exist.http.urlrewrite.XQueryURLRewrite.service(XQueryURLRewrite.java:356)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:669)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1448)
    at de.betterform.agent.web.filter.XFormsFilter.doFilter(XFormsFilter.java:164)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:488)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:943)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.xml.sax.SAXParseException: Premature end of file.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    ... 79 more

dizzzz commented 9 years ago

collection() will never work, right? We can stream out XML data, but we can't give Saxon access to the eXist-db internals. Different products, different domains. It is a fundamental thingy ...

wshager commented 9 years ago

@dizzzz I'd probably never need collection() ...

adamretter commented 9 years ago

@dizzzz I don't see why we could not do collection(). It just resolves a URI to n nodes

dizzzz commented 9 years ago

I just gave it another thought. It will require changes in Saxon.

shabanovd commented 9 years ago

It's simple to do; here are some links:

dizzzz commented 9 years ago

Well, there you have it already :-) I had not seen it before. But how do we deal with a collection like /db? That would load the whole content of the database into Saxon! That sounds very wrong to me; good luck to those who will have to answer the resulting problems over and over again on the mailing list.

No, a 'per document' approach sounds good to me; we should not simply support every wish from our users.

If we were to add the collection() thing, I'd like it to be configurable and switched off by default.

IMO these are two separate worlds; we should keep a healthy distance.
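The "configurable, and switched off by default" idea above could be sketched as follows. Everything here is an assumption made for illustration: the class `CollectionResolverConfig` and the property name `exist.saxon.collection.enabled` are hypothetical and are not existing eXist-db options.

```java
import java.util.Properties;

/** Hypothetical switch for collection() support in XSLT; off unless explicitly enabled. */
final class CollectionResolverConfig {
    /** Hypothetical property name, chosen for this sketch only. */
    static final String ENABLED_PROPERTY = "exist.saxon.collection.enabled";

    private final boolean enabled;

    private CollectionResolverConfig(boolean enabled) {
        this.enabled = enabled;
    }

    /** Reads the flag; a missing or non-"true" value leaves the feature disabled. */
    static CollectionResolverConfig from(Properties props) {
        // Boolean.parseBoolean(null) is false, so the default is "off".
        return new CollectionResolverConfig(
                Boolean.parseBoolean(props.getProperty(ENABLED_PROPERTY)));
    }

    boolean isEnabled() {
        return enabled;
    }

    /** Guard to call before resolving a collection URI on behalf of the XSLT processor. */
    void check(String collectionUri) {
        if (!enabled) {
            throw new IllegalStateException(
                    "collection() is disabled; refusing to resolve " + collectionUri);
        }
    }
}
```

With a guard like this, a stylesheet calling collection('xmldb:///db') fails fast with a clear message instead of silently pulling a large part of the database into the transformer.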

awagner-mainz commented 9 years ago

Just for the record, here is a use case: http://stackoverflow.com/questions/26276705/access-to-filesystem-from-exist-xslt-find-html-myid-html-with-collecti

Second, my lay knowledge/reading stumbled upon a passage in the CollectionURIResolver interface description that Dimitri pointed to, which (perhaps) said that the resolver could return a sequence of URIs to documents and need not necessarily "load the whole content of the database" - but I might have gotten this wrong or it may not mean any simplification after all...

dizzzz commented 9 years ago

Ah, no: the resolver returns a sequence of Saxon Items, one document node per document; each Item shall contain all the data of the referenced document.

shabanovd commented 9 years ago

@dizzzz I think you are wrong twice here. First, the items returned by this iterator must be instances either of xs:anyURI, or of node() (specifically, NodeInfo), so it can be a sequence of URLs. Second, if people do stupid things it is not possible to stop them; even now there are plenty of options for that. But if someone understands all the effects and knows/wants to use the feature to make their code simpler, why get in that person's way?

eXist decided not to develop its own XSL transformer and to deliver/use Saxon instead; if so, the integration must be complete.

PS The huge number of Java features that can be used wrongly drives me crazy :-)
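The point above, that a collection resolver may hand back URIs rather than fully loaded nodes, can be sketched like this. It is a minimal illustration, not Saxon's or eXist's API: `CollectionUriLister` is a hypothetical class, and a map of collection path to document names stands in for the database catalogue. A real integration would wrap each URI in whatever item type the Saxon version in use expects.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: maps a collection URI to the URIs of its documents. */
final class CollectionUriLister {
    private static final String SCHEME = "xmldb://";

    // Assumption: collection path -> document names; stands in for the database catalogue.
    private final Map<String, List<String>> collections;

    CollectionUriLister(Map<String, List<String>> collections) {
        this.collections = collections;
    }

    /** Returns one URI per document; no document content is read here. */
    List<URI> resolve(String collectionUri) {
        if (collectionUri == null || !collectionUri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("Not an xmldb URI: " + collectionUri);
        }
        String path = collectionUri.substring(SCHEME.length());
        // Tolerate a single trailing slash, e.g. "xmldb:///db/test/".
        if (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        List<String> names = collections.get(path);
        if (names == null) {
            throw new IllegalArgumentException("Unknown collection: " + collectionUri);
        }
        List<URI> uris = new ArrayList<>();
        for (String name : names) {
            uris.add(URI.create(SCHEME + path + "/" + name));
        }
        return uris;
    }
}
```

Each returned URI can then be dereferenced one document at a time, so the cost of listing a collection is separate from the cost of loading its documents.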

dizzzz commented 9 years ago

@shabanovd Good to know about the URI sequence; that makes it a bit better.

It still does not convince me: it blurs the distinction between eXist-db and Saxon, and why enlarge the 'attack vector' for crashing eXist-db?

dizzzz commented 9 years ago

And... the very next question on exist-open will be: why are Saxon collection() operations soooo slow? Then the first bug report will appear (since all subsequent operations are again done on XML byte streams).

adamretter commented 9 years ago

With a little bit of work they don't need to be done on byte streams. We could implement NodeInfo with lazy evaluation.
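The lazy-evaluation idea can be illustrated with a much smaller stand-in than a full NodeInfo implementation. In this sketch, `LazyDocument` is a hypothetical class and the `Supplier` represents whatever call would fetch the document from the database; the content is fetched only on first access and then reused.

```java
import java.util.function.Supplier;

/** Hypothetical lazily loaded document handle; not a Saxon NodeInfo. */
final class LazyDocument {
    private final String uri;
    private final Supplier<String> loader; // assumption: fetches the serialized document
    private String content;                // unset until first access
    private boolean loaded;

    LazyDocument(String uri, Supplier<String> loader) {
        this.uri = uri;
        this.loader = loader;
    }

    /** Cheap: available without touching the database. */
    String getUri() {
        return uri;
    }

    /** Loads the content on first call, then returns the cached value. */
    synchronized String getContent() {
        if (!loaded) {
            content = loader.get();
            loaded = true;
        }
        return content;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }
}
```

A collection of such handles can be built without reading any document, which is the property that would keep a collection() over a large collection from loading everything up front.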
