StefH / XPath2.Net

Lightweight XPath2 for .NET
Microsoft Public License

XPath ToString? #40

Closed by jzabroski 1 year ago

jzabroski commented 3 years ago

Hi @StefH ,

I am searching for an XPath library that, given a node in an XDocument tree, returns the single XPath expression that leads precisely to that node.

Bonus would be a function that then culls out parts of the tree that are not that important in order to still get a single leaf from the parse forest.

I was looking at your examples and I don't see anything like that.

Is this possible to do? If yes, I will give it a shot and submit a PR to update the readme. Thank you!

StefH commented 1 year ago

@jzabroski Did you have time to investigate this?

Or is it not relevant anymore?

jzabroski commented 1 year ago

It is probably still a good idea but my focus shifted away from the use case behind this request, which is:

As an automation UI Tester, I want to use an XPath expression to capture a "web component" even if that component does not have an ID attribute or similar "automation ID", so that I can create a logical model of steps that happen on a page (click email field, type text, click signup).

The reason I wanted to build this is that Chrome has a very bad plug-in model that makes it possible for attackers to steal data, unless you are the author of all the plug-ins you use. :/

Obviously, XQTS conformance is a huge factor in creating an accurate, self-describing file format for said logical model. With Blazor Client-Side, it should be possible to host Chrome plug-ins through WebAssembly, and thus use the strength of your library to build such a plug-in.

jzabroski commented 1 year ago

One thing I learned from a brief code spike a year ago is that, while I can extract an XPath expression from an HTML document, browsers (and therefore WebDriver) may expect a particular XPath expression. For example, if the expression does not contain tbody, it won't work in WebDriver.
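To make the tbody point concrete: HTML parsers in browsers wrap a tr that sits directly inside a table in an implicit tbody, so a path derived from the raw markup can miss a step that the live DOM has. The following is a minimal Python sketch of patching such a path. The helper name `add_implicit_tbody` is hypothetical, and the sketch assumes each affected table has exactly one implicit tbody and that predicates contain no nested brackets.

```python
import re

# Matches a "table" step (with an optional predicate) that is directly
# followed by a "tr" step. Descendant steps such as //tr are left alone.
_TABLE_TR = re.compile(r"(/table(?:\[[^\]]*\])?)/(?=tr(?:\[|/|$))")


def add_implicit_tbody(xpath):
    """Insert a tbody step wherever a tr step directly follows a table step.

    Hypothetical helper: it rewrites the path string only and does not
    inspect the document, so it assumes a single implicit tbody per table.
    """
    return _TABLE_TR.sub(r"\1/tbody/", xpath)


source_path = "/html/body/center[2]/table/tr[2]/td[1]"
print(add_implicit_tbody(source_path))
# prints /html/body/center[2]/table/tbody/tr[2]/td[1]
```

Paths that already contain tbody are returned unchanged, so the function can be applied repeatedly without stacking extra steps.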

StefH commented 1 year ago

Maybe a totally different approach:

Did you investigate https://html-agility-pack.net/ to extract data from an HTML page? This may be easier than using XPath (2)?

jzabroski commented 1 year ago

I have used HtmlAgilityPack but it does not have a logical model. It is a procedural framework. I think your approach is better for an engineer looking to build something from the ground up. If you disagree, I am curious as to why (so I can learn).

Further, PuppeteerSharp ultimately speaks to WebDriver, and WebDriver speaks XPath, and HtmlAgilityPack does not. So the general approach you are recommending would only work if I had a static page that does not change from request to request; otherwise, a large portion of test time would be spent ferrying data between the browser and the SUT. Agree or disagree?

StefH commented 1 year ago

Another idea: Did you check out https://playwright.dev/ ?

jzabroski commented 1 year ago

Playwright is basically part of PuppeteerSharp. It doesn't solve this problem. It just retries whatever you tell it and uses async await and other basic engineering principles to avoid flaky tests.

StefH commented 1 year ago

😄

OK. Back to your user-story:

As an automation UI Tester, I want to use an XPath expression to capture a "web component" even if that component does not have an ID attribute or similar "automation ID", so that I can create a logical model of steps that happen on a page (click email field, type text, click signup).

So can you explain to me in simple steps what you are trying to do and what should be changed / added to this project?

jzabroski commented 1 year ago

Given an XML document and a node in that document, list the set of possible XPath expressions that would match for that node.

/html/body/center[2]/table/tr[2]/td[1]

would also be the WebDriver XPath:

/html/body/center[2]/table/tbody/tr[2]/td[1]

and, given either expression, you should get a list of alternate, relative expressions:

//center[2]/table[@id='t1']//tr[2]/td[1]
//center[2]/*//tr[2]/td[1]
//center[2]/descendant::*/td[1]
//center[2]/descendant::*/td[position()=1]

In other words, since you can take an XDocument and an XPath and produce a node, I was thinking the next logical step is to be able to take a node and an XDocument and produce an XPath.
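The simplest form of that node-to-XPath direction can be sketched as follows. This is an illustration in Python using the standard-library `xml.etree.ElementTree`, not XPath2.Net or any .NET API, and the function name `absolute_xpath` is hypothetical. It walks from the node up to the root and emits one positional step per ancestor, adding an index only where same-named siblings exist.

```python
import xml.etree.ElementTree as ET


def absolute_xpath(root, target):
    """Return an absolute, position-based XPath for `target` inside `root`."""
    # ElementTree elements do not store their parent, so build a
    # child -> parent lookup for the whole tree first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    if target is not root and target not in parent_of:
        raise ValueError("target is not part of the given tree")

    steps = []
    node = target
    while node is not None:
        parent = parent_of.get(node)
        step = node.tag
        if parent is not None:
            # Add a 1-based index only when same-named siblings exist.
            same_name = [c for c in parent if c.tag == node.tag]
            if len(same_name) > 1:
                index = next(i for i, c in enumerate(same_name, 1) if c is node)
                step = f"{step}[{index}]"
        steps.append(step)
        node = parent
    return "/" + "/".join(reversed(steps))


doc = ET.fromstring(
    "<html><body>"
    "<center/>"
    "<center><table><tr/><tr><td>a</td><td>b</td></tr></table></center>"
    "</body></html>"
)
td = doc.findall(".//td")[0]
print(absolute_xpath(doc, td))
# prints /html/body/center[2]/table/tr[2]/td[1]
```

This yields only the one canonical absolute path. The relative alternatives listed above (attribute predicates, descendant axes) would each need their own generation rule plus a check that the result still selects exactly one node.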

StefH commented 1 year ago

This is possible. However, an almost unlimited number of XPath expressions can represent a given XmlNode / XElement, so probably only the simplest one can be generated.

You can check out these questions and answers, which show how to generate an XPath string from an XmlNode or XElement: