DeltaXML / vscode-xslt-tokenizer

VSCode extension for highlighting XSLT and XPath (up to 3.0/3.1)
MIT License

Support XML Catalogs #28

Open bmix opened 4 years ago

bmix commented 4 years ago

Hello,

thanks for this great extension!

I have a line

<xsl:include href="xmlcatalog-lib.xsl"/>

in my stylesheet and an entry in my XML Catalog

<uri id="xmlcatalog-lib" name="xmlcatalog-lib.xsl" uri="file:///C:/Users/bmix/Projekte/xslt/xsl-xmlcatalog-lib/src/xmlcatalog-lib.xsl"/>

which gets resolved via oXygen (SaxonPE/EE) but not in deltaxml.xslt-xpath (SaxonHE, though it seems Catalog support is available also in HE).

pgfearo commented 4 years ago

For running XSLT, this extension supports the catalogFilenames property in tasks.json. However to help resolve symbols like included functions and variables from within the editor, there is currently no catalog support.

Providing a full implementation of the OASIS XML Catalogs specification is unfortunately beyond the scope of the editor at this point.

Currently there is support for resolving package names instead. Package names are resolved using settings like:

"XSLT.resources.xsltPackages": [
       { "name": "example.com.package1", "version": "2.0", "path": "included1.xsl"},
       { "name": "example.com.package2", "version": "2.0", "path": "features/included2.xsl"},
       { "name": "example.com.package3", "version": "2.0", "path": "features/not-exists.xsl"}
]

Would something similar be adequate for xsl:include and xsl:import? Perhaps something like:

"XSLT.resources.includesAndImports": [
       { "name": "xmlcatalog-lib.xsl", "path": "C:/Users/bmix/Projekte/xslt/xsl-xmlcatalog-lib/src/xmlcatalog-lib.xsl"}
]
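Hypothetically, a setting like this could be applied when resolving href values, roughly as in the sketch below. The names (`IncludeMapping`, `resolveHref`) and the fallback behaviour are assumptions for illustration, not the extension's actual API.

```typescript
// Sketch: resolve an xsl:include/xsl:import href via a user-configured
// lookup, falling back to resolution relative to the importing stylesheet.
// All names here are hypothetical, not part of the extension.
import * as path from "path";

interface IncludeMapping {
  name: string; // the href as written in the stylesheet
  path: string; // the absolute path it should resolve to
}

function resolveHref(
  href: string,
  documentDir: string,
  mappings: IncludeMapping[]
): string {
  const hit = mappings.find((m) => m.name === href);
  if (hit) {
    return hit.path; // settings entry wins
  }
  return path.resolve(documentDir, href); // default: relative to the stylesheet
}
```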
bmix commented 4 years ago

Would something similar be adequate for xsl:include and xsl:import? Perhaps something like:

"XSLT.resources.includesAndImports": [
       { "name": "xmlcatalog-lib.xsl", "path": "C:/Users/bmix/Projekte/xslt/xsl-xmlcatalog-lib/src/xmlcatalog-lib.xsl"}
]

Not really, since this adds another place where one has to do housekeeping (since the other tools I use all support the catalog, I'd need to keep track of everything I produce in two places).

But I understand the concern. Time is limited and one can only do so much! :-) So, yes, for the time being, this would be better than just throwing an error. Absolutely!

pgfearo commented 4 years ago

Yes, you've found the effective entry point in `DocumentLinkProvider.provideDocumentLinks`.

Perhaps the most important issue for VSCode programming language extensions is to keep the editing experience for the user responsive at all times.

As you've found, my routines iterate over tokens, keeping tree information only as long as it's needed in the context. They are designed to get only 'just enough' data to perform their task. For example, XML well-formedness checking is not performed on imported/included files.

The extension currently performs no caching as this can be quite hard to do reliably given the number of extensions providers in use and given that the order of requests from VSCode is unpredictable (the onDocumentChange event does not always fire before the Semantic Token request for example).

To effectively work with XML Catalogs, there would probably need to be some caching to avoid parsing possibly multiple nested catalog files.

So far, I've avoided using 3rd party libraries (apart from Saxon XSLT itself!). This is because it can be quite hard to learn how to use them most effectively, but also they have often been written with a very different set of priorities than required for a VSCode extension.

One alternative approach would be, for a specific project, to convert XML Catalogs to JSON files - probably using XSLT.
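pgfearo suggests XSLT for the conversion; as a rough TypeScript illustration of the same idea instead, a converter could pull the `<uri>` entries out of a catalog and emit a flat JSON lookup. The regex approach and the function name below are assumptions for this sketch only; a real implementation would need proper XML parsing plus `<nextCatalog>` recursion.

```typescript
// Sketch: extract <uri name="..." uri="..."/> entries from an OASIS
// XML Catalog and collect them as a flat JSON-style lookup object.
// A regex is enough for illustration; it only handles self-closing
// <uri> elements with double-quoted attributes.
function catalogToJson(catalogXml: string): Record<string, string> {
  const lookup: Record<string, string> = {};
  const uriEntry = /<uri\b[^>]*\bname="([^"]+)"[^>]*\buri="([^"]+)"[^>]*\/>/g;
  let match: RegExpExecArray | null;
  while ((match = uriEntry.exec(catalogXml)) !== null) {
    lookup[match[1]] = match[2]; // name -> uri
  }
  return lookup;
}
```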

I took a similar approach for the XML Schema for XSLT 3.0 - converting it to a TS class. This is what provides the auto-completion information and will (eventually) allow full validation of the XSLT elements and attributes as they are being typed.

bmix commented 4 years ago

Perhaps the most important issue for VSCode programming language extensions is to keep the editing experience for the user responsive at all times.

Oh yes! :-)

To effectively work with XML Catalogs, there would probably need to be some caching to avoid parsing possibly multiple nested catalog files.

I am currently brainstorming a design proposal. It seems one would need two tables: one containing the complete mappings from all XML Catalogs, and one that is valid only for the current document. The first would need to be updated whenever any of the catalog files changes. The second would need to be notified of that update, and the editor in turn notified of (or polling for) any changes. That sounds complex, but it may save time: otherwise the editor would need to re-parse the whole set of catalogs on every document update, which sounds like it would introduce horrible latency.
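One way to picture the caching side of that design is a global lookup that is merely invalidated when a catalog file changes, with re-parsing deferred until the next resolution request. The class name, the `parse` callback, and the `fs.watch` wiring below are illustrative assumptions, not a proposed implementation.

```typescript
// Sketch of the invalidate-then-lazily-reparse idea: a global catalog
// lookup marked stale when any catalog file changes, re-built only on
// the next resolve() call. Names and wiring are hypothetical.
import * as fs from "fs";

class CatalogCache {
  private lookup = new Map<string, string>(); // href -> resolved URI
  private stale = true;

  constructor(
    private catalogPaths: string[],
    private parse: (paths: string[]) => Map<string, string>
  ) {
    for (const p of catalogPaths) {
      fs.watch(p, () => { this.stale = true; }); // invalidate, don't re-parse yet
    }
  }

  resolve(href: string): string | undefined {
    if (this.stale) { // lazy re-parse on first use after a change
      this.lookup = this.parse(this.catalogPaths);
      this.stale = false;
    }
    return this.lookup.get(href);
  }
}
```

Deferring the re-parse this way keeps the file-change handler cheap, so editor responsiveness is unaffected even if several catalogs change at once.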

So far, I've avoided using 3rd party libraries (apart from Saxon XSLT itself!). This is because it can be quite hard to learn how to use them most effectively, but also they have often been written with a very different set of priorities than required for a VSCode extension.

Yes, that was the feeling I got when investigating your project. I completely understand.

You may know of the Lemminx Language Server Protocol server for XML, which also has a VSCode client. They do the full logical heavy lifting (in Java). The problem is that they do not plan to support XSLT (or XQuery) for the foreseeable future, maybe never, so they pointed me to your project.

Yes, I see. I am not very well versed in JavaScript/TypeScript/Node/VSCode, so I would need to investigate more, but it seems to me that node-xml2js and FontoXPath are pretty powerful (but also pretty heavy). Maybe node-xml2js would cover enough functionality on its own, or maybe something much simpler could be used. As I currently see it, I would only need to parse the catalog files and resolve the URIs therein. I understand if you don't want to go down that road and prefer to keep things lean.

One alternative approach would be, for a specific project, to convert XML Catalogs to JSON files - probably using XSLT.

Yes, this sounds like a possibility. One could then add a task/npm script to manually update this information. Since we are in a manual, producer-driven environment, this is totally feasible: the user will always know when (s)he has updated the catalog and can apply any needed steps.

I took a similar approach for the XML Schema for XSLT 3.0 - converting it to a TS class. This is what provides the auto-completion information and will (eventually) allow full validation of the XSLT elements and attributes as they are being typed.

Yes, I have seen it. I was already wondering, whether you created those long descriptions by hand... ;-).

pgfearo commented 4 years ago

I'm just beginning to remind myself of how the existing code works for resolving file references.

There are two main areas where file references are resolved:

The first thing I'll look at is a small amount of refactoring to decouple the resolving of file references more effectively. Currently, I believe, the two methods use different functions to resolve href attribute values, so this needs to be fixed.

Following this, I'm thinking I could add functionality so that hrefs can also be resolved using a JavaScript map. We're then just left with the (most difficult) question of how this map is generated.

To help me understand performance implications, in your dev environment, roughly how many XSLT files could be included in a set of XML Catalogs? At my workplace we have less than 50 XSLT files that may be imported/included.

bmix commented 4 years ago

Sorry for being late, I missed the update.

To help me understand performance implications, in your dev environment, roughly how many XSLT files could be included in a set of XML Catalogs? At my workplace we have less than 50 XSLT files that may be imported/included.

Currently I have 129 mappings in my main catalog, of which 16 are stylesheets, but this is growing as I create more XSLT function and template libraries. I also tend to include external character entities. The majority, however, are Namespace -> Schema mappings and XQuery module resolutions, which play no role here.

Following this, I'm thinking I could add functionality so that hrefs can also use be resolved using a JavaScript map. We're then just left with the (most difficult) question of how this map is generated.

Could you tell me more about this question? I assume you are thinking about the mapping between key name and value, and what syntax to use? I don't know of any direct way to introduce the key=IRI and value=IRI concept to JavaScript. Though it may be legal to use any character in a key name in a JS object literal, as long as quotes surround it. This would allow IRIs to become key names.

EDIT: I just tested:

const testO = {
  'http://test.invalid/ns/cool1.0': 'result'
}

and it was valid JS.
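That test generalizes: any string, including a full IRI, can key a quoted object-literal property or, perhaps more naturally for a mutable lookup, a `Map`. The sketch below just restates bmix's experiment both ways; no extension API is implied.

```typescript
// Any string, including a full IRI, can serve as a key in a quoted
// object-literal property or in a Map (which avoids prototype pitfalls
// and allows arbitrary keys without quoting rules).
const asObject: Record<string, string> = {
  "http://test.invalid/ns/cool1.0": "result",
};

const asMap = new Map<string, string>([
  ["http://test.invalid/ns/cool1.0", "result"],
]);
```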

pgfearo commented 3 years ago

I'm afraid it's been a while since I last looked at this issue - I'm quite busy with other features, but still keen to find a solution - even if it's a basic workaround to start with, using a lookup object similar to the code you show.