StefH / XPath2.Net

Lightweight XPath2 for .NET
Microsoft Public License

Default namespace not handled in XPath query #31

Closed EricGriffith closed 3 years ago

EricGriffith commented 4 years ago

I have an XML document like this that has a default namespace with no prefix:

<wells xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/terms/" version="1.4.1.1" xmlns="http://www.witsml.org/schemas/1series">
  <well uid="w-255312">
    <name>25/1-9</name>
    <timeZone>+00:00</timeZone>
    <commonData>
      <dTimCreation>2019-11-29T16:44:05.3733225+00:00</dTimCreation>
      <dTimLastChange>2019-11-29T16:44:05.3733225+00:00</dTimLastChange>
    </commonData>
  </well>
</wells>

I would like to be able to do a query like this: /wells/well

However, I get no results when I use XPath2SelectElements with this query, regardless of whether I pass an XmlNamespaceManager with the default namespace registered.

I know this was not supported in XPath 1.0, but my reading of the XPath 2.0 specification is that it should be supported there:

A QName in a name test is resolved into an expanded QName using the statically known namespaces in the expression context. It is a static error [err:XPST0081] if the QName has a prefix that does not correspond to any statically known namespace. An unprefixed QName, when used as a name test on an axis whose principal node kind is element, has the namespace URI of the default element/type namespace in the expression context; otherwise, it has no namespace URI.

I can see that in XPath.y, when processing a name test, the empty string is passed instead of context.NamespaceManager.DefaultNamespace when parsing the QName:

NameTest
   : QName
   {
      XmlQualifiedName qualifiedName = QNameParser.Parse((String)$1, 
        context.NamespaceManager, "", context.NameTable);
      $$ = XmlQualifiedNameTest.New(qualifiedName.Name, qualifiedName.Namespace);
   }
   | Wildcard
   ;

Am I misunderstanding something or is this a bug?

StefH commented 4 years ago

@EricGriffith I have not had time to look into your question yet. I'll try to do so in the next few days.

martin-honnen commented 3 years ago

This is still open. I think it is an important advantage of XPath 2 compared to XPath 1 that you can declare a default element namespace for XPath selection, see https://www.w3.org/TR/xpath20/#static_context saying:

[Definition: Default element/type namespace. This is a namespace URI or "none". The namespace URI, if present, is used for any unprefixed QName appearing in a position where an element or type name is expected.]

I think the .NET API provided by Microsoft allows that as https://docs.microsoft.com/en-us/dotnet/api/system.xml.xmlnamespacemanager.addnamespace?view=net-5.0 says about the prefix argument to AddNamespace:

String The prefix to associate with the namespace being added. Use String.Empty to add a default namespace.

So while in XPath 1.0 a selection such as //p always tries to select elements with local name p in no namespace, in XPath 2 it should try to select elements with local name p in the default element namespace if one is set.
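For comparison, the XPath 1.0 behaviour described above can be reproduced with System.Xml alone. This is a minimal sketch; the class and method names are hypothetical and the prefix `x` is an arbitrary choice. It shows that XmlNamespaceManager records the default namespace, but the built-in XPath 1.0 engine does not apply it to unprefixed name tests.

```csharp
using System.Xml;

// Hypothetical helper for illustration only; not part of XPath2.Net.
public static class XPath1DefaultNamespaceDemo
{
    private const string Xhtml = "http://www.w3.org/1999/xhtml";

    // Builds a namespace manager that binds both the empty prefix and "x" to XHTML.
    public static XmlNamespaceManager CreateManager(XmlNameTable nameTable)
    {
        var manager = new XmlNamespaceManager(nameTable);
        manager.AddNamespace(string.Empty, Xhtml); // default namespace
        manager.AddNamespace("x", Xhtml);          // explicit prefix for XPath 1.0
        return manager;
    }

    // Counts the nodes selected by the built-in XPath 1.0 engine.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<html xmlns='" + Xhtml + "'>" +
            "<body><p>One</p><p>Two</p></body></html>");

        return doc.SelectNodes(xpath, CreateManager(doc.NameTable)).Count;
    }
}
```

Here `Count("//p")` returns 0 because XPath 1.0 treats the unprefixed `p` as being in no namespace, while `Count("//x:p")` returns 2.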

Based on that, with XPath2 the following test case

        [Fact]
        public void XPath2SelectNodesWithDefaultNamespace()
        {
            var namespaceManager = new XmlNamespaceManager(new NameTable());
            namespaceManager.AddNamespace(string.Empty, "http://www.w3.org/1999/xhtml");

            var nodeList = GetXHTMLSampleDoc().XPath2SelectNodes("//p", namespaceManager);

            Assert.Equal(2, nodeList.Count);
        }

should work, I think, but currently fails.

The sample document is set up as

        private XmlDocument GetXHTMLSampleDoc()
        {
            var xhtml = @"<html xmlns='http://www.w3.org/1999/xhtml' lang='en' xml:lang='en'>
<head>
  <title>Example</title>
</head>
<body>
  <h1>Example</h1>
  <p>This is paragraph 1.</p>
  <p>This is paragraph 2.</p>
</body>
</html>";
            var doc = new XmlDocument();
            doc.LoadXml(xhtml);
            return doc;
        }

It is also important to note that the default element (and type) namespace is different from the default function namespace. In that regard XPath2 currently also looks flawed, as the test case

        [Fact]
        public void BindingEmptyPrefixShouldNotBreakFunctionLookup()
        {
            var todoList = GetTodoListDoc();

            var namespaceManager = new XmlNamespaceManager(new NameTable());
            namespaceManager.AddNamespace(string.Empty, "http://example.com/ns1");

            var result = todoList.XPath2Evaluate("count(//todo-item)", namespaceManager);

            Assert.Equal(3, result);
        }

fails with Wmhelp.XPath2.XPath2Exception : The function 'count'/1 was not found in namespace 'http://example.com/ns1'. It looks as if https://github.com/StefH/XPath2.Net/pull/39 is an attempt to fix that flaw. So far I have not been able to work out why it did not resolve the problem or which regressions it caused.

martin-honnen commented 3 years ago

I have tried to make functions use the XPath 2.0 function namespace if no prefix is used. That did allow me to use a default namespace for element selection, but it caused some regressions in the test suite for position() and/or last() related test cases.

In the end I settled on the odd but, in XPath2, prevalent XmlReservedNs.NsXQueryFunc as the default function namespace (DefaultFunctionNamespace = XmlReservedNs.NsXQueryFunc), and with that I got no more regressions. I think function calls and element selection now behave as in other XPath 2 implementations: for elements and types you can set a default element namespace, which in the XPath2 API is done by passing in an XmlNamespaceManager where the empty string has been bound to the desired default namespace.
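The split between the two defaults can be sketched roughly as follows. This is an illustrative sketch only, not the code in the branch: the class, the helper methods and the FunctionNamespace constant are hypothetical, and the constant uses the function namespace from the XPath 2.0 Recommendation, which may differ from the value of XmlReservedNs.NsXQueryFunc inside XPath2.Net.

```csharp
using System.Xml;

// Hypothetical sketch of QName resolution; not the XPath2.Net implementation.
public static class NameResolutionSketch
{
    // Assumption: the function namespace of the XPath 2.0 Recommendation.
    public const string FunctionNamespace = "http://www.w3.org/2005/xpath-functions";

    // Resolves a lexical QName, using defaultNs when there is no prefix.
    private static XmlQualifiedName Resolve(
        string qname, IXmlNamespaceResolver resolver, string defaultNs)
    {
        int colon = qname.IndexOf(':');
        if (colon < 0)
        {
            return new XmlQualifiedName(qname, defaultNs);
        }

        string prefix = qname.Substring(0, colon);
        string ns = resolver.LookupNamespace(prefix);
        if (ns == null)
        {
            // Corresponds to the static error err:XPST0081 in the specification.
            throw new XmlException("XPST0081: undeclared prefix '" + prefix + "'");
        }

        return new XmlQualifiedName(qname.Substring(colon + 1), ns);
    }

    // Unprefixed element names take the default element namespace from the manager.
    public static XmlQualifiedName ResolveElementName(string qname, XmlNamespaceManager manager)
    {
        return Resolve(qname, manager, manager.DefaultNamespace);
    }

    // Unprefixed function names ignore the manager's default namespace.
    public static XmlQualifiedName ResolveFunctionName(string qname, XmlNamespaceManager manager)
    {
        return Resolve(qname, manager, FunctionNamespace);
    }
}
```

With a manager that binds the empty prefix to http://example.com/ns1, `ResolveElementName("todo-item", manager)` yields a name in that namespace, while `ResolveFunctionName("count", manager)` stays in the function namespace, so binding the empty prefix no longer affects function lookup.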

The result is at https://github.com/martin-honnen/XPath2.Net/tree/DefaultNamespace, or see the commits in https://github.com/martin-honnen/XPath2.Net/commits/DefaultNamespace. The commit named "Local .NET framework version adaption" is not related to the fix in XPath2; I needed it to build and run everything on a Windows 10 system that has .NET 4.6.1 installed instead of the originally targeted 4.6.2.

StefH commented 3 years ago

@martin-honnen

Thank you for the research and code fix proposal.

Can you create a PR from a branch which only contains the fix?

And if you need .NET 4.6.1 I can also add support for that (or did you have to make a lot of changes?).