BaseXdb / basex

BaseX Main Repository.
http://basex.org
BSD 3-Clause "New" or "Revised" License

Database Paths: compliance with resolve-uri(), base-uri(), … #1172

Open drmacro opened 9 years ago

drmacro commented 9 years ago

Given a database named "foo^bar" and this document at /docs/doc01.xml:

<doc xml:base="foo/bar/">
    <link href="doc2.xml"/>
</doc>

This query:

let $uri := 'foo^bar/docs/doc01.xml'
let $doc1 := collection('foo^bar/docs/doc01.xml')
let $doc2 := doc($uri)
let $doc3 := doc('foo^bar/docs/doc01.xml')
let $link := $doc3/*/link
let $baseURIdoc := base-uri($doc1)
let $baseURILink := base-uri($link)
let $resolvedURI := () (: resolve-uri($baseURI, string($link/@href)) :)

return
<result>
  <doc1>{ $doc1 }</doc1>
  <doc2>{ $doc2 }</doc2>
  <doc3>{ $doc3 }</doc3>
  <link-elem base-uri-doc="{$baseURIdoc}"
             base-uri-link="{$baseURILink}"
             resolved-uri="{$resolvedURI}">
    { $link }
  </link-elem>
</result>

Fails with the message:

URI argument is invalid: Illegal character in path at index 3: foo^bar/docs/doc01.xml.

The failure comes from the $baseURILink binding: if you replace its value with "()" (removing the call to base-uri($link)), the query succeeds. In particular, the call to base-uri() on the document node itself succeeds.

Thus there is an issue with getting the base-uri() of an element within the document.
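
The index in the error message points at the "^" character, which is not allowed unescaped in a URI path. The following is a minimal sketch, using only standard functions, that locates the character and shows the percent-encoded form of the database name. Whether BaseX would accept the encoded name in doc() or collection() is not established in this thread, so treat this purely as an illustration of the character problem.

```xquery
(: Sketch: locate the offending character and percent-encode the name.
   'foo^bar' is the database name from the report above. :)
let $name := 'foo^bar'
return (
  (: zero-based offset of '^', matching "index 3" in the error message :)
  string-length(substring-before($name, '^')),
  (: '^' is not an unreserved URI character, so it becomes %5E :)
  encode-for-uri($name)
)
(: result: 3, "foo%5Ebar" :)
```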

ChristianGruen commented 9 years ago

You are completely right, the BaseX extensions for handling databases don't go 100% hand in hand with standard XQuery functions, and I am not sure if it's that easy to bring them together. We usually suggest using the helper functions of our Database Module (db:name, db:path).
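
A workaround along these lines might look like the sketch below. It uses db:name and db:path from the Database Module to build an absolute URI for a node and then applies the standard resolve-uri function. The helper local:absolute-uri and the "basex" scheme are hypothetical names chosen for illustration; BaseX does not resolve such URIs, and the result is only useful as input to generic URI arithmetic. The database and document names are taken from the examples in this thread.

```xquery
(: Hypothetical helper: builds an absolute URI for a database node.
   The "basex" scheme is an illustrative placeholder, not something
   BaseX itself understands. :)
declare function local:absolute-uri($node as node()) as xs:string {
  let $segments := (db:name($node), tokenize(db:path($node), '/'))
  (: percent-encode each segment so characters such as '^' become valid :)
  return 'basex://' || string-join($segments ! encode-for-uri(.), '/')
};

let $doc  := doc('uri-test/docs/doc02.xml')
let $link := $doc/*/link
(: nearest xml:base in scope; nested xml:base values are not combined here :)
let $xml-base := string(($link/ancestor-or-self::*/@xml:base)[last()])
(: fn:resolve-uri takes the relative reference first, the base second :)
let $base := resolve-uri($xml-base, local:absolute-uri($link))
return resolve-uri(string($link/@href), $base)
```

The vendor-specific part is confined to one function, so a package that targets several XQuery databases would only need to swap that helper.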

drmacro commented 9 years ago

While I can work around the issue, I think it's a potentially serious problem for BaseX, because it means that certain processing cannot be implemented using generic XQuery functions.

If I were maintaining a common DITA support package, for example, that would be a problem, and I would be tempted not to support BaseX at all, simply because I don't have the bandwidth to maintain two different versions of the same code.

For my DITA for Small Teams project I'm committed to using BaseX because it offers significant advantages otherwise (light weight, ease of installation, good support for DTD-based processing, etc.) and I can't change now.

But at some point the base code I'm implementing will need to be generalized for use in any XQuery database, and at that point the issue will come to the fore. For DITA in particular, URI resolution is a large part of what the code does (because DITA is all about linking).

I understand the challenge in modifying or extending the way BaseX constructs and uses URIs at this point, but I think it's something you must plan for because things are definitely broken as they stand today.

ChristianGruen commented 9 years ago

> But at some point the base code I'm implementing will need to be generalized for use in any XQuery database, and at that point the issue will come to the fore. For DITA in particular, URI resolution is a large part of what the code does (because DITA is all about linking).

I guess that, at least today, there is no chance of doing it all without vendor-specific code. Historically, this is partly because XQuery is completely database-agnostic; as a consequence, every XML database has integrated retrieving, storing and updating XML at the document, collection or database level in a completely different way.

But of course I agree that it would be nice to be able to rely completely on the standards at some point in the future.

ChristianGruen commented 9 years ago

Copied from #1171:

Given this document in a repo named "uri-test" at the location "/docs/doc02.xml":

<doc xml:base="foo/bar/">
    <link href="doc2.xml"/>
</doc>

and this query run from the admin panel:

let $uri := 'uri-test/docs/doc-01.xml'
let $doc1 := collection('uri-test/docs/doc-01.xml')
let $doc2 := doc($uri)
let $doc3 := doc('uri-test/docs/doc02.xml')
let $link := $doc3/*/link
let $baseURI := base-uri($link)
let $resolvedURI := resolve-uri($baseURI, string($link/@href))

return
<result>
  <doc1>{ $doc1 }</doc1>
  <doc2>{ $doc2 }</doc2>
  <doc3>{ $doc3 }</doc3>
  <link-elem base-uri="{$baseURI}"
             resolved-uri="{$resolvedURI}">
    { $link }
  </link-elem>
</result>

I get this failure from resolve-uri():

Base URI is not absolute: "doc2.xml".

The problem is that document-uri() returns "uri-test/docs/doc02.xml", which is not an absolute URI.

But it needs to be one for the normal XPath functions that expect absolute URIs to work.

Either the built-in implementation of document-uri() needs to recognize this URI as being absolute (because it starts with the name of a repository) or, better, BaseX needs to provide a URL scheme that can be used in this case, e.g. "basex://uri-test/docs/doc02.xml".

Without this, there's no way to use normal URI-manipulation functions.

For example, I'm trying to use @xml:base to determine the effective URL for a relative reference made within the scope of the @xml:base, but that fails within BaseX because of this issue.
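
To illustrate what the proposal would enable, here is a minimal sketch of the same resolution done with an absolute document URI. The "basex://uri-test/docs/doc02.xml" value is the hypothetical URI suggested above, hard-coded as a string; it is not something BaseX returns from document-uri() or base-uri(). Note that fn:resolve-uri takes the relative reference as its first argument and the base URI as its second.

```xquery
(: Sketch: xml:base-aware resolution against an absolute document URI.
   The "basex" URI is the hypothetical form proposed above. :)
let $doc-uri  := 'basex://uri-test/docs/doc02.xml'
let $xml-base := 'foo/bar/'   (: value of @xml:base on <doc> :)
let $href     := 'doc2.xml'   (: value of link/@href :)
(: first resolve xml:base against the document URI ... :)
let $base := resolve-uri($xml-base, $doc-uri)
(: ... then resolve the link target against that effective base :)
return resolve-uri($href, $base)
(: per RFC 3986 resolution: basex://uri-test/docs/foo/bar/doc2.xml :)
```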

ChristianGruen commented 8 years ago

I spent some time working through our database URI handling to facilitate the use of standard XQuery functions. This is the new status quo:

A new snapshot is online. @drmacro: If you have time for it, your feedback will be welcome. The handling of invalid URI characters (such as ^) hasn’t changed so far, as it might introduce some incompatibilities with existing databases.

ChristianGruen commented 7 years ago

The issue applies to various other characters that are not valid URI characters (see #1464).

drmacro commented 7 years ago

Somehow I didn't see the May 4 update for this issue. I'll see if I can evaluate this fix in the context of my current D4ST code.

ChristianGruen commented 4 years ago

Related feedback from the mailing list: https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg12563.html

ChristianGruen commented 2 years ago

Related: https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg13809.html

Could possibly be tackled with BaseX 10.

ChristianGruen commented 2 years ago

We came across too many implications that would need to be tackled, so we’ll postpone this to a later version.