BaseXdb / basex

BaseX Main Repository.
http://basex.org
BSD 3-Clause "New" or "Revised" License
661 stars 267 forks source link

Minor behavior change #2228

Closed brandes-pq closed 12 months ago

brandes-pq commented 12 months ago

Description of the Problem

This is probably not all that important, but it surprised me and forced me to fix some scripts nontheless:

From commit 1e14f796a5ffdb798249c3ef3c62ff57b510a424 onwards, the following command line fails with an error:

> basex 'csv:parse("a,b")'
Illegal character in opaque part at index 10: csv:parse("a,b")

I think this contradicts the documentation, but it is not entirely clear what "The input string may point to an existing file" entails exactly.

Expected Behavior

Before 1e14f796a5ffdb798249c3ef3c62ff57b510a424 the same command worked as expected:

> basex 'csv:parse("a,b")'
<csv><record><entry>a</entry><entry>b</entry></record></csv>

Steps to Reproduce the Behavior

See the description above.

Do you have an idea how to solve the issue?

By using the option -q to disambiguate the argument to be an XQuery expression, the current version BaseX yields the expected result:

> basex -q 'csv:parse("a,b")'
<csv><record><entry>a</entry><entry>b</entry></record></csv>

What is your configuration?

Self-compiled BaseX, also verified with release 10.6

ChristianGruen commented 12 months ago

@GuntherRademacher I’m passing this on to you… And a guess: A vast variety of URLs exists in practice, but for our purpose, it may be sufficient to refine the URL validity check to a) accept jar:, but b) reject other URL schemes without trailing slash.

GuntherRademacher commented 12 months ago

@brandes-pq Thank you for reporting this!

@ChristianGruen I agree, the least invasive change would be to allow URLs without a slash following the scheme ("opaque" URLs) just for jar:. This would solve the problem at hand, and it would not appear again, unless there was a predefined namespace prefix jar, for jar:some-function().

Alternatively we could do more extensive URL validatation. This could be accomplished by using java.net.URL, which accepts only supported schemes, and java.net.URI, doing a syntax check:

  static boolean isValid(final String url) {
    try {
      return new URL(url).toURI().isAbsolute();
    }
    catch (@SuppressWarnings("unused") final URISyntaxException | MalformedURLException e) {
      return false;
    }
  }

This however would cause further changes of behavior, e.g. for tests in FetchModuleTest that use httttp://x - they will fail differently, or even succeed, because this used to be a valid URL, but then would not be any longer because of the unsupported scheme. Also it still leaves room for ambiguity.

I will thus prepare a PR with the first suggested change.

ChristianGruen commented 12 months ago

@GuntherRademacher Thanks. I’d go with the first suggestion. I agree it's not as general as it could be, but a clean validation would raise many new questions (it’s still an unsolved challenge even in the XPath/XQuery specs how a proper validation needs to look like…).

GuntherRademacher commented 12 months ago

Fixed by #2229.