StefH / XPath2.Net

Lightweight XPath2 for .NET
Microsoft Public License
36 stars 14 forks source link

XPathNodeIterator.Count returns one less that the actual count w/ XPath2SelectNodes #59

Open Todo18 opened 2 years ago

Todo18 commented 2 years ago

We've swapped MS's built-in XPath 1.0 processor with the XPath2.Net library and are conducting a couple of (unit) tests, and have noticed something weird about the Count property (of the XPathNodeIterator). Logically, and according to MSDN (and the reference implementation by MS), Count returns the index/position of the last node in the set. Positions start counting at 1 according to the XPath specs, so this interpretation of Count makes sense when you have for example 3 nodes with indices 1 to 3.

However, when swapping libs, Count suddenly returns 2 for a set with 3 nodes, 0 for a set with 1 node, and -1 for the empty set. Could it be that indices start at 0 in the implementation, or are we missing something obvious?

Regardless, thanks for this amazing library!

StefH commented 2 years ago

@Todo18 Can you provide a unit-test for this?

Todo18 commented 2 years ago

Sure. Here's a sample test:

using System.Diagnostics;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Xml.XPath;
using Wmhelp.XPath2;

namespace Test.Functional
{
    [TestClass]
    public class Compliancy
    {
        [TestMethod]
        public void XPath2SelectNodes_Count_Returns_Number_Of_Items_In_Expression()
        {
            var xIn = Resources.Provider.LoadXmlDocument("count_xml.xml");
            var navigator = xIn.CreateNavigator();

            var brandSet = navigator.XPath2SelectNodes("/report/brand");
            var brandCount = (int)navigator.XPath2Evaluate("count(/report/brand)");

            var highVolumeBrandSet = navigator.XPath2SelectNodes("/report/brand[units > 20000]");
            var highVolumeBrandCount = (int)navigator.XPath2Evaluate("count(/report/brand[units > 20000])");

            Debug.Assert(brandCount == 5);  // These are just here to assert the correct (expected) value
            Assert.AreEqual(brandCount, brandSet.Count);

            Debug.Assert(highVolumeBrandCount == 2);  // These are just here to assert the correct (expected) value
            Assert.AreEqual(highVolumeBrandCount, highVolumeBrandSet.Count);
        }
    }
}

We tested this with the following input file (count_xml.xml):

<?xml version="1.0" encoding="utf-8"?>
<!-- chocolate.xml -->
<report month="8" year="2006">
    <title>Chocolate bar sales</title>
    <brand>
        <name>Lindt</name>
        <units>27408</units>
    </brand>
    <brand>
        <name>Callebaut</name>
        <units>8203</units>
    </brand>
    <brand>
        <name>Valrhona</name>
        <units>22101</units>
    </brand>
    <brand>
        <name>Perugina</name>
        <units>14336</units>
    </brand>
    <brand>
        <name>Ghirardelli</name>
        <units>19268</units>
    </brand>
</report>

When we run the test, if returns 1 less (4 resp. 1) than the expected number of items in the node set of each expression (5 resp. 2), and one less than the count-function (which returns the correct results). I hope I'm being clear. If not, let me know. :)

StefH commented 1 year ago

@Todo18 I copied your code and created 2 unit tests:

And these run fine?

Can you double check what's the difference?

StefH commented 1 year ago

@Todo18 Can you please take a look at the unit tests and check the difference with your code?

Todo18 commented 1 year ago

Apologies, I'm short on time these days. I will inspect our integration and let you know my findings.

StefH commented 1 year ago

@Todo18 Did you have time to check?

Todo18 commented 1 year ago

I'm checking it today. :)

Todo18 commented 1 year ago

I've taken the exact same code that you provided above, together with the v1.1.3 Nuget from the built-in Nuget manager in VS, and it fails the test. I have no idea what's going on. I'll look into it a bit more later today.

Todo18 commented 1 year ago

So I've had some more time to check. Before diving further into (proprietary) code, could you please try and rephrase your unit tests to conform to ours, and check the (new) results, because there seems to be a discrepancy between brandSet.Count (our code) and brandSet.Should().HaveCount(5) (your code, which translates to: brandSet.Cast<object>().Count(), which iterates the collection, and thus bypasses the Count property of the XPathNoteIterator altogether).

StefH commented 1 year ago

Please note that my unit tests are the same as your unit test. Please look at

Todo18 commented 1 year ago

To my knowledge, they are not the same: Count (in our code) is a property of XPathNoteIterator (see here), while HaveCount calls the Count method of Enumerable (see here). The former returns the index of the last node of the set, while the latter iterates over the set, and never calls the property of XPathNoteIterator (which is the point of the unit test).

To illustrate my point, I've added a second assertion to our unit tests, which succeeds, while the original assertion still fails.

Assert.AreEqual(brandCount, brandSet.Cast<object>().Count()); // Succeeds
Assert.AreEqual(brandCount, brandSet.Count); // Fails
StefH commented 1 year ago

According to the documentation: https://learn.microsoft.com/en-us/dotnet/api/system.xml.xpath.xpathnodeiterator.count?view=net-7.0

Count => Gets the index of the last node in the selected set of nodes. So it's not the number of nodes, it's the index, which is 4 in this case because we have 5 items.

https://github.com/StefH/XPath2.Net/pull/60

Todo18 commented 1 year ago

True, but positions start counting at 1 according to the XPath specs, so this interpretation of Count makes sense when you have for example 3 nodes with indices 1 to 3. (See here, for example: https://learn.microsoft.com/en-us/dotnet/api/system.xml.xpath.xpathnodeiterator.currentposition?view=net-7.0, or: https://www.w3.org/TR/2010/REC-xpath20-20101214/#eval_context)

So the indexing needs some tinkering, I guess.

Fiddle to illustrate: https://dotnetfiddle.net/QuErKT

StefH commented 1 year ago

But CurrentPosition is not the same as Count.

Todo18 commented 1 year ago

I printed the first item of the set, not the last. It was simply to illustrate the numbering of indices behind the scenes. But you're right, I've adjusted the fiddle to illustrate my point better. ;)

https://dotnetfiddle.net/YdcE3V