Closed am11 closed 4 years ago
/cc @krwq @KrzysztofCwalina @piotrpMSFT
Sorry, I never heard of this project and have no idea about the goal, but in my search for XPath 2 in .NET I also came across this initiative: https://xpath2.codeplex.com/ and https://qm.codeplex.com/ made by the same author. The latter is mentioned in http://dev.w3.org/2006/xquery-test-suite/PublicPagesStagingArea/
+1
Please make this happen
We need API proposal. Anyone wants to do that?
Just wanted to add that this is one of the top 5 issues on UserVoice for .NET at the moment. I know many devs who really need this. Lack of support for v2 has bitten me many times in SharePoint and Umbraco. Similarly, I know of developers who have had to use unfamiliar stacks for application integrations (e.g. EDI/BizTalk type projects), simply due to the lack of any support here. While I am not really capable of helping, I and many, many others are very keen to see this happen.
https://visualstudio.uservoice.com/forums/121579-visual-studio-ide/category/31481--net https://visualstudio.uservoice.com/forums/121579-visual-studio-ide/suggestions/4450357-implement-xslt-3-0-for-net https://visualstudio.uservoice.com/forums/121579-visual-studio-ide/suggestions/3795831-native-support-for-xpath-2-0-or-xslt-2-0-in-net
@alirobe thanks for the context, that is useful! To clarify: the ask on UserVoice is for XSLT 3.0. If we can implement it on top of .NET Standard 2.0, then also Desktop might benefit from it (out-of-band package). If it is not possible or if it is costly, then maybe it is incentive for people to move to .NET Core ;-)
cc @sepidehMS @krwq
@danmosemsft maybe something to focus on after 2.0/2.1? You were collecting a list ...
cc @terrajobst
@karelz no problem! I don't really care where it gets implemented first, I just want a Microsoft API implementation somewhere... I'm sure that there will be enough demand to get that implementation surfaced everywhere. XPath & XSLT all need updating. XPath alone would be a great start, but they do kind of go together. :)
Should it be raised as an issue here? Happy to do that if needed. I'm not familiar with the way MS works on this stuff, but as you've said, an API proposal (followed by a test suite) would be an obvious starting point, and actually fairly project-agnostic.
Also, thanks so much for taking this seriously! This is something users have been crying for Microsoft to do for over 10 years. Anyone who does this will be a hero to tens of thousands of enterprise developers. I'm smiling just thinking of all the work-around code I will be able to delete! :)
@karelz it's already on my list. If I understand @krwq correctly, it seems it could be done while keeping our XML library .NET Standard 2.0 compliant. It is just resourcing, and it's great to keep gathering evidence to bubble it up the list once 2.0 is out the door.
@danmosemsft that's great news that we already have it on road map!
@alirobe no need to file issues in .NET Standard -- .NET Standard is basically the common interface/intersection between Desktop (.NET Framework), .NET Core and Xamarin. If this can be implemented on top of .NET Standard, then all platforms will benefit. If it can't, it will be part of .NET Core future version, waiting for other platforms (Xamarin, Desktop) to catch up or be implemented as out-of-band package for those platforms, before we can add it to .NET Standard.
Also, thanks so much for taking this seriously!
BTW: We always take customer feedback and votes seriously. It sometimes might not look like that due to communication hiccups, or due to technical limitations (e.g. some changes in Desktop are breaking - a big no-no), but rest assured we do take it seriously (I believe it's true for all of Microsoft, but I can at least guarantee it is true on the .NET team). Of course, we can't in all cases commit to dates when things will be delivered - we have to align work with other priorities and other products, like Desktop, and we prefer not to communicate dates when we are not 100% sure, to avoid broken promises. In some cases we can't even commit to whether particular APIs will ever be delivered, especially when they are outside of our team/division ownership - we have to work with partner teams inside Microsoft to come up with plans, and sometimes that takes time and requires alignment of business priorities (the reality of large corporations). Nevertheless, in all cases, customers and the success of the .NET platform are at the center of our minds.
@danmosemsft @karelz - If we plan to reuse existing APIs, I believe we will need some new switch/enum for tiny behavioral changes around parsing so we don't break any existing apps. Breaking changes would need to be opt-in. Anything else, I believe, currently produces errors and would simply start working once the work is done. (I'd rather make people be more explicit about which version of XPath they choose.)
At minimum this will be a few new properties (hopefully just one property and perhaps a new XPathExpression constructor). We would need to figure out the major breaking changes between XPath 1.0 and 3.0 and the advantages and disadvantages of each solution.
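One possible shape for such an opt-in switch - only a sketch, assuming the existing AppContext mechanism and a made-up switch name:

using System;

class Program
{
    static void Main()
    {
        // Hypothetical switch name - "Switch.System.Xml.EnableXPath3Parsing" is not a real flag.
        // An application explicitly opts in to the newer parsing behavior...
        AppContext.SetSwitch("Switch.System.Xml.EnableXPath3Parsing", true);

        // ...and library code checks the switch before changing behavior.
        bool enabled = AppContext.TryGetSwitch("Switch.System.Xml.EnableXPath3Parsing", out bool value) && value;
        Console.WriteLine($"XPath 3 parsing opt-in: {enabled}");
    }
}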
@krwq this is an interesting topic. We hit similar spec incompatibilities in the HTTP space - dotnet/corefx#13036.
If you think about this XPath case, can you imagine adding the spec-version choice as an argument to the constructor? Or is the relevant functionality exposed (also) as static methods? If it is via static methods (which is the case in dotnet/corefx#13036), then we either have to add a spec-version argument to all of them, or create another class, or something. If that's the case, I'd like to start some general (in principle) API design discussions for these kinds of spec-versioned APIs -- please let me know where you think it falls. Thanks!
@karelz at a quick look, it seems to me that we only have two overloads which take an XPath expression: XPathExpression.Compile. I can see a couple of options:
I don't think it matters too much which option we choose - most likely people will always want the newest XPath, and I'm not actually expecting the breaking changes to hit too many people, since even the spec claims the breaking changes were made because the syntax was confusing.
The new-class approach is probably the most discoverable, since IntelliSense will suggest those options, but it will likely be a messier implementation without much benefit.
IMO the static property, because it is the simplest and you set it once per app with not much downside.
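As a rough illustration of that static-property option (the XPathVersion enum and DefaultVersion property below are hypothetical, not existing APIs):

// Hypothetical sketch only - neither XPathVersion nor DefaultVersion exist today.
// The idea is a single, process-wide setting configured once at startup.
public enum XPathVersion { XPath10, XPath20, XPath30, XPath31 }

public static class XPathDefaults
{
    // Global default used whenever an expression is compiled from a string.
    public static XPathVersion DefaultVersion { get; set; } = XPathVersion.XPath10;
}

// Usage at application startup:
//   XPathDefaults.DefaultVersion = XPathVersion.XPath31;
//   var expr = XPathExpression.Compile("sum(//item/@price)"); // would honor the default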
@krwq
it seems to me that we only have two overloads which take XPath expression: XPathExpression.Compile
There are other methods that take XPath, for example XmlNode.SelectSingleNode or System.Xml.XPath.Extensions.XPathSelectElement.
add a static property which would globally change the behavior (I'm not expecting anyone use two different versions of XPath in one project)
What if I use two libraries, one of which wants to use one version of XPath and the other a different version? I can imagine that happening quite easily, so I don't think a static property would be a good option. (Or would libraries have to set the property before every XPath operation? That could work if the property were thread-static, but it would be annoying.)
a new class which would inherit from XPathExpression
How would that work for other XPath methods? Considering my two examples, System.Xml.XPath.Extensions.XPathSelectElement could probably work by creating e.g. System.Xml.XPath2.Extensions.XPathSelectElement, since it's an extension method. But I don't see how something similar would work for XmlNode.SelectSingleNode.
Agreed with @svick that we need options per library; that's why we need to design the static APIs carefully ... I'll dig deeper into this case and try to start the general API design pattern discussion. @svick do you have any recommendations? (You seem to be quite familiar with the API surface.)
@svick - you're right, those APIs all call XPathExpression.Compile in the end, but your point about two different dependencies using different versions basically kills the static property option.
Possibly we could add overloads which take an XPathExpression instead of a string. I believe that would become quite annoying to use, but maybe it wouldn't be too bad - what do you think?
you seem to be quite familiar with the API surface
Not really, I just googled for XPath on XmlDocument and XDocument and found the two methods. But I do have some ideas on how the API could look.
The current state is:
namespace System.Xml {
    public abstract class XmlNode {
        public XmlNode SelectSingleNode(string xpath);
    }
}
namespace System.Xml.XPath {
    public static class Extensions {
        public static XElement XPathSelectElement(this XNode node, string expression);
    }
    public abstract class XPathExpression {
        public static XPathExpression Compile(string xpath);
    }
}
There are other methods (like XPathNavigator.Compile) and overloads that use XPath; the three methods above should be sufficiently representative, considering they are an instance method, an extension method and a static method.
Option 1: Each version of XPath gets its own namespace:
namespace System.Xml.XPath2 {
    public static class Extensions {
        public static XElement XPathSelectElement(this XNode node, string expression);
        public static XmlNode XPathSelectNode(this XmlNode node, string expression);
    }
    public abstract class XPathExpression {
        public static XPathExpression Compile(string xpath);
    }
}
namespace System.Xml.XPath3 {
    …
}
…
Advantages: choosing an XPath version is just a matter of a using directive.
Disadvantages: instance methods (like XmlNode.SelectSingleNode) have to be changed to extension methods and renamed.
Option 2: Each XPath method gets new overloads taking XPathVersion:
namespace System.Xml {
    public abstract class XmlNode {
        public XmlNode SelectSingleNode(string xpath, XPathVersion version);
    }
}
namespace System.Xml.XPath {
    public static class Extensions {
        public static XElement XPathSelectElement(this XNode node, string expression, XPathVersion version);
    }
    public abstract class XPathExpression {
        public static XPathExpression Compile(string xpath, XPathVersion version);
    }
    public enum XPathVersion {
        XPath10,
        XPath20,
        …
    }
}
Advantages: adding a new version of XPath is simple (just a new enum member).
From the usage standpoint, I think I prefer option 1, even though it has its issues. I don't much like the option of passing XPathExpression around (suggested by @krwq): it results in very verbose code, and I don't see how it is better than option 2, since it still means adding new overloads to all XPath methods.
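To illustrate the verbosity concern, here is how the two shapes might compare at a call site (both overloads are hypothetical):

// Hypothetical comparison only - neither overload exists today.

// Passing a pre-compiled XPathExpression (the @krwq suggestion): every call site
// grows an extra Compile step, even for one-off queries.
var node1 = doc.SelectSingleNode(XPath2Expression.Compile("(//order)[1]"));

// Option 2 (version enum): the query stays a plain string.
var node2 = doc.SelectSingleNode("(//order)[1]", XPathVersion.XPath20);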
@svick - thanks for the input
Option 1 (adding a namespace per version) - you always need to create a new namespace. I do not like that, as any change to the XPath standard would make us add a new namespace and types.
I'm not a fan of the XPathExpression overload because the syntax will get quite annoying. The advantage is that you only add the overload once per version and there is no need to add any further overloads. The disadvantage is that the string overload would always use XPath 1.0, which would get confusing.
Option 2 - I think it is as good as we can get; my vote goes to that. It is easy to add to any existing place - a new version is just a new enum value, and for future updates we can reuse the existing overload.
Note that there are likely more than just those two things built on top of XPathExpression.Compile - I'm expecting we will need to add something to XSLT and other places we have likely missed, although considering it is just adding an overload which takes an enum, it doesn't matter too much if we miss one - anyone can easily contribute and fix any gaps.
I have been using XPath2.Net by StefH for a while now. It works very well, although it has some minor disadvantages; the main one (for me) being that it keeps the compiled XPath2 expression and the runtime environment in one object, which is not thread-safe.
It (obviously) uses a separate namespace, and I have never experienced that as a problem. I would think that those who know XPath2 (or 3) have no problem using that exclusively. It is almost completely compatible with XPath1. Therefore, I would favor option 1 (adding a namespace). Once you get used to it, you will never want to look back (which a version parameter forces you to do).
Option 3 could be what XPath2.Net does: add an XPath2Expression class, XNode.XPath2Select(), etcetera (see the XPath2.Net documentation).
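A rough sketch of that shape, following the description above; the Wmhelp.XPath2 namespace and exact member names are assumptions that should be checked against the XPath2.Net documentation:

// Sketch only - member names follow the description above and may differ
// slightly from the actual XPath2.Net API.
using System.Xml.Linq;
using Wmhelp.XPath2;   // XPath2.Net package namespace (assumption)

class Example
{
    static void Run()
    {
        var doc = XDocument.Load("books.xml");

        // Per-version extension methods make the language choice explicit at the call site:
        XElement cheap = doc.XPath2SelectElement("(//book[price lt 20])[1]");

        // A compiled expression can be reused across many evaluations (important for
        // the repeated computations mentioned later in this comment):
        XPath2Expression count = XPath2Expression.Compile("count(//book)");
    }
}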
What I would very much like to see is the possibility to define variables that can be used in the XPath expression. For example (XPath2.Net):
public object Evaluate(IContextProvider provider, IDictionary<XmlQualifiedName, object> vars)
Another feature that I like a lot is the ability to have user-defined functions. In XPath2, these are added to a function table, like
functionTable.Add(XmlReservedNs.NsXQueryFunc, "generate-id", 0, XPath2ResultType.String, (context, provider, args) => ...);
In my application, I repeat a set of XPath computations often (as in 100,000 times or more), and being able to compile the XPath expression is important for efficiency and performance.
@svick @nverwer I think we should get to some conclusions with these.
IMO here is what we should do:
That should give us a combination which is easy to manage (no new namespace) and easily discoverable (and no need to pass an additional argument each time).
Please let me know if you like/dislike this. Once we agree on this we should be able to officially propose new APIs and make a plan for doing the feature work.
PS. @nverwer AFAIK you can define variables with the current implementation in .NET too: https://weblogs.asp.net/cazzu/30888 - not super intuitive, but definitely possible.
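For reference, the pattern behind that link is roughly the following - a minimal sketch using the XsltContext/IXsltContextVariable extensibility that already exists in System.Xml (the VariableContext and SimpleVariable class names are made up):

using System;
using System.Collections.Generic;
using System.Xml.XPath;
using System.Xml.Xsl;

// Minimal sketch: resolve $variables in an XPath 1.0 expression through a custom XsltContext.
public class VariableContext : XsltContext
{
    private readonly IDictionary<string, object> _vars;

    public VariableContext(IDictionary<string, object> vars) { _vars = vars; }

    // Called by the XPath engine for every $name in the expression.
    public override IXsltContextVariable ResolveVariable(string prefix, string name)
        => new SimpleVariable(_vars[name]);

    // Custom functions are not needed for this sketch.
    public override IXsltContextFunction ResolveFunction(string prefix, string name, XPathResultType[] argTypes)
        => throw new NotSupportedException();

    public override bool Whitespace => true;
    public override bool PreserveWhitespace(XPathNavigator node) => true;
    public override int CompareDocument(string baseUri, string nextbaseUri) => 0;

    private sealed class SimpleVariable : IXsltContextVariable
    {
        private readonly object _value;
        public SimpleVariable(object value) { _value = value; }
        public object Evaluate(XsltContext context) => _value;
        public bool IsLocal => false;
        public bool IsParam => false;
        public XPathResultType VariableType => XPathResultType.Any;
    }
}

// Usage: compile, attach the context, evaluate.
//   var expr = XPathExpression.Compile("//book[@id = $id]");
//   expr.SetContext(new VariableContext(new Dictionary<string, object> { ["id"] = "bk101" }));
//   var nav = new XPathDocument("books.xml").CreateNavigator();
//   var node = nav.SelectSingleNode(expr);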
@krwq
Any place which takes XPath string as an input we should add more overloads i.e. XPath2Select; XPath2SelectSingleNode etc.
So, to add a new version of XPath, you would need to add a new overload to all these methods? I'm not sure that's better than having each set of overloads as extension methods in a separate namespace when it comes to managing it.
It would also pollute your completion lists with all these methods you're never going to use (since most people are likely going to stick with a single version of XPath).
@svick we would have to create a namespace for each class using XPath - if we put the extension methods in the XPath namespace itself, you would get a circular dependency. One option would be to reuse XPathNavigator or IXPathNavigable (I believe those should be independent of the XPath version - possibly except for what I wrote below) and add extension methods to them instead of to each class using XPath, and not touch any of the existing methods - the downside would be that in some cases you would need to call CreateNavigator first.
Another thing we need to think about is that XPathNavigator.Select(string) is virtual, and I'm not sure how that would work once we add more versions. I think I'll need to experiment with this a little bit and see what can and can't be done.
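A quick sketch of the IXPathNavigable idea above (the XPath2Select method name is hypothetical, and the version-aware compilation is only hinted at in a comment):

// Hypothetical sketch - extension methods live next to the XPath engine, so no
// class that merely consumes XPath needs a new per-version namespace.
using System.Xml.XPath;

public static class XPath2Extensions
{
    // Works for XmlDocument, XPathDocument, and anything else implementing IXPathNavigable;
    // XDocument would first need CreateNavigator (the downside mentioned above).
    public static XPathNodeIterator XPath2Select(this IXPathNavigable source, string expression)
    {
        XPathNavigator nav = source.CreateNavigator();
        // A hypothetical version-aware compile; today Compile only knows XPath 1.0.
        XPathExpression expr = nav.Compile(expression);
        return nav.Select(expr);
    }
}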
Can something like this pattern help? https://githubengineering.com/scientist/
@alirobe we already use a similar pattern to compare different XPathNavigator implementations - it is generally a convenient approach when you need to test something really quickly and have two or more similar implementations (in the XPathNavigator case it was XPathDocument vs XPath.XDocument vs XPath.XmlDocument - one of them was considered more mature and less likely to have bugs). In this case I believe the risk is much lower, since XPath 2 and 3 mostly extend the existing standard and very little actually changes.
Cool. Good to know it's just a "naming things" problem.
I think we should also consider the impact of adding new code on the size of applications. AOT toolchains (.NET Native, CoreRT, Xamarin) all use tree shakers to avoid including code which the app won't use. But in order for these to work, the dependency on the new code has to be discoverable at compile time. This typically means that if the only difference is the value of a parameter, the tree shaker will not figure it out; for the most part, tree shakers can't determine actual parameter values. So having the new functionality in either a new namespace or a new type would be preferable from this point of view. This only applies if we were to implement the new functionality as a separate code base internally. If it would simply extend the existing XPath internals to support the new features, it might be next to impossible to avoid the size increase in apps.
@vitek-karas would the tree shaker be able to figure out this kind of pattern?
enum Foo { a, b }

static void Bar(Foo foo)
{
    if (foo == Foo.a)
    {
        // something pulling deps
    }
    else
    {
        // something pulling more deps
    }
}

static void Main()
{
    Bar(Foo.a);
}
If not, could you show how you would write simple branching so that the tree shaker can remove the unused path?
Unfortunately our tree shakers can't figure out branching like that currently (not 100% sure about ILLinker, but .NET Native will not, for sure). We can obviously tweak the tools, but it gets complicated really fast. Usually the code is not as simple as the above, and if the value is passed through a field and so on, we run into trouble. What seems to work is things like:
In all these cases the tree shaker would not include the method/property/type if the app didn't use the feature. With that we could refactor the framework to then only pull the expensive pieces of code from those methods/properties/types, and let the rest go through a simple interface or something similar.
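The general pattern described here - making the choice a distinct, statically visible call target rather than a parameter value - could look roughly like this (all type and method names below are made up for illustration):

using System.Xml.XPath;

// Hypothetical sketch: two separate engine types instead of one method taking a
// version parameter. A tree shaker keeps only the engine the app actually calls.
internal static class XPath1Engine
{
    public static XPathNodeIterator Select(XPathNavigator nav, string xpath) => nav.Select(xpath);
}

internal static class XPath3Engine
{
    // Stand-in for a (large) hypothetical XPath 3 implementation.
    public static XPathNodeIterator Select(XPathNavigator nav, string xpath) => nav.Select(xpath);
}

public static class XPathQueries
{
    // The engine choice is visible at compile time as a distinct call target...
    public static XPathNodeIterator Select1(XPathNavigator nav, string xpath) => XPath1Engine.Select(nav, xpath);

    // ...so if an app never calls Select3, XPath3Engine can be trimmed entirely.
    public static XPathNodeIterator Select3(XPathNavigator nav, string xpath) => XPath3Engine.Select(nav, xpath);
}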
That said, if we plan to build XPath3 as just an extension of the existing XPath engine, this whole tree shaking idea is probably moot, since there would only be one large piece of code (the one XPath engine) and we would need it for all XPath queries regardless of which version they would use.
If anyone else would value and use this support, please thumb up the top post to help us prioritize vs. other ports.
I'm wondering whether, if this is implemented in the current namespaces, an assembly-level attribute could set the desired version when none is specified in the app. If none is specified, use the 1.0 standard. If your assembly specifies one, then all code executing directly from your assembly uses the version that's specified. When calling into another assembly which specifies a different version, methods and types constructed from that context use that version...
A similar construct could be used with a using statement, akin to a Transaction context. That would at least alleviate the need for 100 extra enumerated parameters all over your code to opt into a higher version.
As to the discoverability of the extra namespaces / assemblies, a Roslyn rule + fix could solve that. It could also help resolve minor API incompatibilities when moving to a higher version.
Looking at Saxon, they use setLanguageVersion("3.0") to determine the version. Assuming there are no copyright / IP issues, it would make sense to keep consistent with their approach, since many will have been using the Saxon engine in lieu of the .NET support, so this makes the switch simpler.
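For comparison, a Saxon-style version setter grafted onto the existing System.Xml types might look roughly like this (the SetLanguageVersion call is hypothetical; it exists neither in System.Xml nor, in this form, in Saxon's .NET API):

// Hypothetical sketch only - SetLanguageVersion does not exist on XPathExpression.
using System.Xml.XPath;

var nav = new XPathDocument("data.xml").CreateNavigator();
XPathExpression expr = nav.Compile("sum(//item/@price)");
// expr.SetLanguageVersion("3.1");   // hypothetical, mirroring Saxon's approach
var total = nav.Evaluate(expr);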
Please consider making async methods - for custom XPath/XSLT functions doing IO etc.
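To make the request concrete, a hypothetical shape for an async extension function (XsltArgumentList.AddExtensionObject exists today, but the engine invoking and awaiting async methods is exactly the missing piece being requested):

// Hypothetical sketch - today's XSLT engine calls extension objects synchronously.
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Xsl;

public class MyFunctions
{
    private static readonly HttpClient _http = new HttpClient();

    // An async extension function doing IO, e.g. invoked from XSLT as myfn:lookup($key).
    public async Task<string> LookupAsync(string key)
        => await _http.GetStringAsync("https://example.invalid/lookup/" + key);
}

// var args = new XsltArgumentList();
// args.AddExtensionObject("urn:my-functions", new MyFunctions());  // works today, but only for sync methods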
Great to see that this is on the agenda.
When thinking about an API for XPath 2.0 or XPath 3.1, do bear in mind that the type system is much richer than XPath 1.0. For example, an XPath 2.0 expression can return a sequence of strings, or a sequence of integers; an XPath 3.1 expression can also return a map or an array (or even a function!). (So an API that's conceived entirely around the idea of navigating a tree of nodes may be conceptually misaligned.) This applies to input values as much as return values: it's important to make it easy to supply parameters for XPath expressions (you want to discourage people from building XPath expressions by string concatenation because of the code injection risk).
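As a purely hypothetical illustration of both points (none of the types or overloads below exist; they only sketch the kind of shape the comment argues for):

// Hypothetical sketch only - XPath31Expression and this Evaluate overload are made up
// to illustrate the richer XDM type system and out-of-band parameters.
using System.Collections.Generic;
using System.Xml.Linq;

var doc = XDocument.Load("orders.xml");

// An XPath 3.1 expression can return a sequence of atomic values, not just nodes...
var expr = XPath31Expression.Compile("//order[@id = $id]/line/@price ! xs:decimal(.)");

// ...and parameters are supplied separately, which avoids injection via string concatenation.
IEnumerable<decimal> prices =
    expr.Evaluate<decimal>(doc, new Dictionary<string, object> { ["id"] = "A-1001" });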
Newer-version support for XSLT would also benefit BizTalk Server and Azure Logic Apps XML transforms, both of which build upon .NET's support of XSLT.
Is this being worked on?
@stephen-lim not currently being worked on. This seems like a gap that is relatively easy for 3rd party libraries to fill. We would be open to community contributions, if we could agree on an API proposal. @karelz
Here are some facts I collected from XML area experts in June:
Given all these fairly high costs and the fact that 3rd party solutions exist (which seems to be a more than reasonable workaround), I think it is more valuable for the BCL team to invest in areas which do not have any existing alternatives yet, at least for now. That said, if anyone from the community is motivated to contribute towards the effort, we could come up with an iterative plan for how to enable parts of the work in waves in an experimental package, or something like that.
I believe there are currently no 3rd party options available for .NET Core, and there won't be for the foreseeable future. The most popular one, Saxon from Saxonica.com for .NET, will not work because it relies on IKVM.NET (Java), and IKVM has already announced it will not be supporting .NET Core. The developer lost faith in .NET and no longer wants to pursue IKVM. Read his weblog here. The other one is from Altova.com, but it runs on COM, so that won't work either.
XSLT support is one of the most requested features (1857 votes) on UserVoice, so it's disappointing to hear no one is willing to invest in it.
Please reconsider.
@stephen-lim thanks for your info and link to UserVoice (I was not aware of that). It is definitely valuable information for us and we take it into account during planning.
I want to point out that while Jeroen stopped working on IKVM.NET, that does not preclude its port to .NET Core under a different name.
Overall, I think it would be good to find out if any of the 3rd party libraries have intentions to move either to .NET Standard or .NET Core (incl. Saxonica.com).
Regarding XSLT upvotes - let's not forget that the upvotes started piling up before .NET Core existed, and were therefore targeted at .NET Framework, where a workaround exists in the form of 3rd party libraries. A better indicator is IMO the number of upvotes on this issue - currently 47, which is pretty high up in the CoreFX repo (position 4). Although I bet that some of those upvotes are not about .NET Core, but about general availability of the APIs also on .NET Framework, we can ignore that for now.
We face a tough decision: The investment is pretty high and therefore it needs to be either deprioritized (as it is now), or it has to come at the cost of investments elsewhere, e.g. DirectoryServices, logging, fast consistent networking stack, CollectibleAssemblies, general performance improvements, just to mention a few.
Just to clarify: It is still on our backlog (as I hinted in previous reply), it is just not something we plan to prioritize right now. We are open to further feedback and information, and we are open to change our prioritization based on more data.
@stephen-lim are there other options out of the list posted above?
For example, I grabbed the first one, saxon9he-api, and ran apiport and it shows as 100% compatible. I didn't try using it.
In Visual Studio 2017 from August onwards, .NET Core apps will accept nuget packages even if they only claim to support desktop. In such a case it will warn and the onus is on the developer to run apiport and to test their scenarios, but in many cases we've found the libraries work fine and just aren't packaged explicitly for .NET Core 2.0 or .NET Standard 2.0 yet. If and when we find such libraries, we can reach out to help their owners repackage them.
@danmosemsft thanks for sharing the list. A few of them are not XSLT processors. If you read the comments posted in that link, you'll see others have evaluated and none of them will work for .NET Core.
We don't necessarily need XSLT 3.0. Is there a possibility to implement XSLT 2.0 as a start? That should reduce the amount of work needed and still bring a lot of improvements. XSLT 1.0, being the first release, is lacking a lot of features that are specifically addressed in 2.0.
@stephen-lim I see only comments from people who tried to run it on .NET Core 1.x. .NET Core 2.0 has a much larger API surface; I think it is fair to expect they may just work on .NET Core 2.0.
XSLT 2 has a cost of roughly one year of work (not sure what the delta for XSLT 3 is). And it depends on XPath 2, so implementing "just that" doesn't make it suddenly cheaper/easier :(.
@karelz One of the problems with Saxonica (the only one that is open source) is that it relies on IKVM, which is a huge binary because it's trying to bridge the Java and .NET worlds. A lot of folks have been hoping for a pure .NET native implementation for a very long time; some of the postings go as far back as 2013. This is the original UserVoice request, with 802 votes, for XSLT 2.0. Later, when XSLT 3.0 came out and MS still hadn't implemented it, a new request was started to push for XSLT 3.0 (the one with 1857 votes) going into .NET Core by that time.
@stephen-lim does Saxonica rely on all of IKVM, or maybe just on its sub-components? (lightweight RefEmit maybe)
@karelz I'm not sure. Here is the published Saxon API that may shed some light and can provide a basis for future implementation.
@karelz Unfortunately, the sort of developers who need XSLT 3.0 and XML stuff done are simply not the sort of people who are typically found on GitHub giving thumbs-up to a .NET Core issue. Developers use whatever documentation they're given with regards to XML. I appreciate that almost nobody is an XML enthusiast; it's not a hobby tech. Despite that, this issue is the no. 4 most voted/commented issue here.
This work would be extremely significant. It would enable all sorts of business application developers to achieve significantly more with their data and processes, and it would have flow-on effects all over Microsoft's own code-bases. For instance: doing more interesting things with WCF configs, web.configs, XAML, Office OOXML (docx, xlsx, pptx et al), various web services, and much more. Microsoft should be pushing forward XML & data interoperability standards with .NET.
When .NET was introduced, XML was most of what drove the application architecture and the vision that came along with it. XML is not just 'some library in .NET'. The idea that we would use a third party library for parsing the piece of connecting infrastructure that symbolizes the entire original intent of .NET doesn't pass the smell test, at least to me. There's a reason this is a top UserVoice issue. It's painful for everyone. This is an opportunity to address that pain, and justify the shift to Core.
This whole discussion around API versioning, to me, just reflects fear. The entire point of .NET core is surely to move past the fear, and fix core issues. I’m not sure one could really get much more core to the vision of .NET than XML. Perhaps I’m wrong, but that’s how I remember .net starting out.
Please, get the edge/whatwg team involved, get the spec guys involved, sort the issues out. The failure to deal with this causes fragmentation, and worst of all, it causes Java projects. This is why hundreds and thousands of enterprise integration developers who are living on a burning Oracle/J2EE platform simply can't use Microsoft. These are surely a prime audience for core. I would love to see Microsoft be a leader in this space, rather than a drag on the industry. Let's get up to date with standards, and let's start pushing them forward.
When I glance through the Github and UserVoice, I feel the roadmap decisions are almost arbitrarily based on what's easy and what's cool, and not necessarily what's needed for business. In this case, the people have voted loudly for XSL 2/3 only to be shot down. Why even have a vote in the first place?
It reminds me of users asking Microsoft to build a Web-standards-compliant IE. It took years to arrive and is now too little, too late. Our team has seriously considered moving development to Java. .NET will only continue to bleed developers as long as we still have big gaps like this that cannot be ignored.
@stephen-lim, I have seen a start-up adopt Java + IaaS over Azure as a platform for this reason alone. They created an entire custom-crafted stack that basically replicates BizTalk Services. BizTalk would have done everything they wanted, except that the MS stack (admittedly non-Core) couldn't handle the XML transform (Schematron) requirements, which were built into third party systems backed by legislation. This has been the deciding factor in canning or scaling back so many projects that I know could have moved the world forward (and netted licensing profit for MS). Large data interchange contracts are the most heart-breaking point for the stack to let you down. It indicates a proprietary mindset which is surely a thing of the '90s at this point.
A project I was using (Wyamio/Wyam#340) is preparing for a shift to netstandard. This is why they discarded Saxon and no longer support XSLT 2.0. Unfortunately I need this, and for now I can't update to newer versions of the project until I have refactored the Saxon part out of their code. Fortunately their code is very modular.
But I will still miss the step into netstandard, which I find sad 😢
Motivation
System.Xml.XPath currently conforms to the XPath 1.0 [W3C-xpath-1] and XSLT 1.0 [W3C-xslt-1] standards, but not XPath 2.0 [W3C-xpath-2], XPath 3.0 [W3C-xpath-3], XPath 3.1 [W3C-xpath-3.1], XSLT 2.0 [W3C-xslt-2] or XSLT 3.0 [W3C-xslt-3].
The missing standard implementations in the BCL are required by many consumer scenarios, for which .NET applications currently rely on third party libraries. One of the chief scenarios is the Content Query Web Part (CQWP) in SharePoint, where users' XSLT code could be drastically reduced if v2 were supported by System.Xml.XPath. For the most part, backward-compatibility fallbacks are available; that is, code written concisely in XSLT 2 can be expressed verbosely in XSLT 1, and so forth.
Pitfalls
Unfortunately (besides the existing third-party libraries' APIs), I do not have an off-hand, concrete method list to propose, as it requires further brainstorming on whether to auto-select the processor based on the input or to explicitly separate the namespaces (System.Xml.XPath2 and System.Xml.XPath3).
The point to ponder: since the sub-languages XPath 2 and XPath 3 intrinsically facilitate backwards-compatibility modes (see XPath 2: J.1.3 Backwards Compatibility Behavior and XPath 3: 3.10 Backwards Compatible Processing), should the API be any different from the existing one at all, and simply let consumers select the standard mode?