DesignLiquido / xslt-processor

A JavaScript XSLT processor without native library dependencies
GNU Lesser General Public License v3.0
104 stars 32 forks source link

bug - invalid text in the front of the transform output #88

Closed jasonliao-cb closed 8 months ago

jasonliao-cb commented 8 months ago

Hi, the package was working fine but after we upgraded it from 2.1.4 to 2.2.0, then the output had some wired 'NoYes' text at the beginning of the output. steps to reproduce:

const fs = require('fs');
const xsltProcessor = require('xslt-processor');

// Load the XML source and XSLT stylesheet
const xmlSource = fs.readFileSync('source.xml', 'utf-8');
const xsltStylesheet = fs.readFileSync('stylesheet.xsl', 'utf-8');

// Create an XSLT processor
const xslt = new xsltProcessor.Xslt();

// Load the stylesheet
const xmlParser = new xsltProcessor.XmlParser();

// Perform the transformation
const outXmlString = xslt.xsltProcess(
    xmlParser.xmlParse(xmlSource),
    xmlParser.xmlParse(xsltStylesheet)
);

// Save the transformed result to a destination file
fs.writeFileSync('destination.xml', outXmlString.toString());
console.log('XSLT transformation successful! Output saved to destination.xml');

before 2.2.0, the output is correct from 2.2.0, there is some additional text YesNo at the front of the output.

can you investigate what has been injected in version 2.2.0?

jasonliao-cb commented 8 months ago

@leonelsanchesdasilva

jasonliao-cb commented 8 months ago

test code:

const xmlSource = `<?xml version="1.0" encoding="UTF-8"?>
<products>
    <product>
        <product_id>ABC</product_id>
    </product>
    <product>
        <product_id>ABB</product_id>
    </product>
</products>`;

        const xsltSource = `<?xml version="1.0" encoding="UTF-8"?><xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/products">
        <products>
            <xsl:for-each select="product">
                    <product>
                        <xsl:choose>
                            <xsl:when test="product_id = 'ABB'">
                                <xsl:text>Yes</xsl:text>
                            </xsl:when>
                            <xsl:otherwise>
                                <xsl:text>No</xsl:text>
                            </xsl:otherwise>
                        </xsl:choose>
                    </product>
            </xsl:for-each>
        </products>
    </xsl:template>
</xsl:stylesheet>`;

        const xsltClass = new Xslt();
        const xmlParser = new XmlParser();
        const xml = xmlParser.xmlParse(xmlSource);
        const xslt = xmlParser.xmlParse(xsltSource);
        const output = xsltClass.xsltProcess(xml, xslt);

seems like the xsl:choose is having issue now.

leonelsanchesdasilva commented 8 months ago

Hi @jasonliao-cb 👋

In version 2.2.0, attributes now are treated as regular nodes. Until version 2.1.4, attributes and child nodes were separated.

I won't have time to take a look at this and fix until the end of March. Not sure if you can wait until then. Otherwise, if you're able to fix it, I can review and publish a new version.

jasonliao-cb commented 8 months ago

@leonelsanchesdasilva I revisited the earlier PRs, it's not the attribute change but the output. https://github.com/DesignLiquido/xslt-processor/pull/85/commits if remove the outputDocument from the line this.xsltProcessContext(expressionContext, stylesheet, this.outputDocument); it works, but simply remove it would cause the test failures. would you mind giving some more details or revert this change?

leonelsanchesdasilva commented 8 months ago

@jasonliao-cb It's not that simple. Some transformations produce a fragment that it is passed as the argument (for instance, https://github.com/jasonliao-cb/xslt-processor/blob/345256be093fb547ffb5b016957922722057e24f/src/xslt/xslt.ts#L237).

I would debug xsltWhen to find out why it is writing both values instead of one.

xqin commented 8 months ago

same situation( but i use <xsl:copy-of).

xslt-processor@v2.3.0

POC:

const { Xslt, XmlParser } = require('xslt-processor')

const xmlString = `
<?xml version="1.0" encoding="UTF-8"?>
<test>
  <A>
    <B>
      <D><![CDATA[Hello]]></D>
    </B>
    <C>
      <D><![CDATA[World]]></D>
    </C>
  </A>
</test>
`
const xsltString = `
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html" version="4.0" encoding="utf-8" omit-xml-declaration="yes" />
    <xsl:template match="/test">
        <xsl:variable name="varA" select="A/B/D" />
        <xsl:variable name="varB" select="A/C/D" />
        <h1>
            <xsl:copy-of select="$varA" />-<xsl:copy-of select="$varB" />
        </h1>
    </xsl:template>
</xsl:stylesheet>
`

const xslt = new Xslt()

const xmlParser = new XmlParser()

const outXmlString = xslt.xsltProcess(
  xmlParser.xmlParse(xmlString),
  xmlParser.xmlParse(xsltString)
)

console.log(outXmlString)

// <D><![CDATA[Hello]]></D><D><![CDATA[World]]></D><h1>-</h1>
jasonliao-cb commented 8 months ago

@jasonliao-cb It's not that simple. Some transformations produce a fragment that it is passed as the argument (for instance, https://github.com/jasonliao-cb/xslt-processor/blob/345256be093fb547ffb5b016957922722057e24f/src/xslt/xslt.ts#L237).

I would debug xsltWhen to find out why it is writing both values instead of one.

I'm trying to debug the issue, and text value node was append to the transformed child at this line https://github.com/DesignLiquido/xslt-processor/blob/f0a90e655dc04deb0e10c5c8c24806ca22ebb3d3/src/xslt/xslt.ts#L454

I'm not sure what is the usage of the node.escape, but it seems at the line https://github.com/DesignLiquido/xslt-processor/blob/f0a90e655dc04deb0e10c5c8c24806ca22ebb3d3/src/dom/xml-functions.ts#L227-L228 it has three childNodes and the two text nodes were also sent to the output.

leonelsanchesdasilva commented 8 months ago

@jasonliao-cb @xqin Thanks for your patience. I finally had some time to sit and debug this issue.

By @jasonliao-cb's example, the expected output should be:

<products><product>No</product><product>Yes</product></products>

But now it is:

NoYes<products><product></product><product></product></products>

This happened due to a modification on version 2.3.0, that now treats attributes as common nodes, as I described previously. For that, I had to bring back one logic I was kind of abandoning, but it started to make much more sense when the nodes were unified: it is the output parameter.

The output parameter exists in all the xslt internal methods, and it tells the parser what is the output node. For instance, in xsltChoose method:

    protected xsltChoose(context: ExprContext, template: XNode, output: any) { // In fact `output` is `XNode` type
        for (const childNode of template.childNodes) {
            if (childNode.nodeType !== DOM_ELEMENT_NODE) {
                continue;
            }

            if (this.isXsltElement(childNode, 'when')) {
                const test = xmlGetAttribute(childNode, 'test');
                if (this.xPath.xPathEval(test, context).booleanValue()) {
                    this.xsltChildNodes(context, childNode, output);
                    break;
                }
            } else if (this.isXsltElement(childNode, 'otherwise')) {
                this.xsltChildNodes(context, childNode, output);
                break;
            }
        }
    }

If we look at xsltText implementation:

                case 'text':
                    text = xmlValue(template);
                    node = domCreateTransformedTextNode(this.outputDocument, text);
                    const disableOutputEscaping = template.childNodes.filter(
                        (a) => a.nodeType === DOM_ATTRIBUTE_NODE && a.nodeName === 'disable-output-escaping'
                    );
                    if (disableOutputEscaping.length > 0 && disableOutputEscaping[0].nodeValue === 'yes') {
                        node.escape = false;
                    }
                    const destinationTextNode = output || context.outputNodeList[context.outputPosition];
                    destinationTextNode.appendTransformedChild(node);
                    break;

We need to determine a destination text node. Previously, when the output was not that important, this logic would suffice:

const destinationTextNode = output || context.outputNodeList[context.outputPosition];

However, by default, xsltChoose passes output as the parent output node. That's the point I'll fix in my next PR.

Here's the refactored xsltChoose method:

    /**
     * Implements `xsl:choose`, its child nodes `xsl:when`, and
     * `xsl:otherwise`.
     * @param context The Expression Context.
     * @param template The template.
     * @param output The output. Only used if there's no corresponding output node already defined.
     */
    protected xsltChoose(context: ExprContext, template: XNode, output: XNode) {
        for (const childNode of template.childNodes) {
            if (childNode.nodeType !== DOM_ELEMENT_NODE) {
                continue;
            }

            if (this.isXsltElement(childNode, 'when')) {
                const test = xmlGetAttribute(childNode, 'test');
                if (this.xPath.xPathEval(test, context).booleanValue()) {
                    const outputNode = context.outputNodeList[context.outputPosition] || output;
                    this.xsltChildNodes(context, childNode, outputNode);
                    break;
                }
            } else if (this.isXsltElement(childNode, 'otherwise')) {
                const outputNode = context.outputNodeList[context.outputPosition] || output;
                this.xsltChildNodes(context, childNode, outputNode);
                break;
            }
        }
    }
leonelsanchesdasilva commented 8 months ago

@xqin Now, for your example, the expected output is:

<h1><D>Hello</D>-<D>World</D></h1>

But we are having:

<D><![CDATA[Hello]]></D><D><![CDATA[World]]></D><h1>-</h1>

Not exactly the same issue, but I'll solve in this issue too, since they are somewhat related.

For copy-of, once again, output takes precedence over the actual output node:

                case 'copy-of':
                    select = xmlGetAttribute(template, 'select');
                    value = this.xPath.xPathEval(select, context);
                    const destinationNode = output || context.outputNodeList[context.outputPosition];
                    if (value.type === 'node-set') {
                        nodes = value.nodeSetValue();
                        for (let i = 0; i < nodes.length; ++i) {
                            this.xsltCopyOf(destinationNode, nodes[i]);
                        }
                    } else {
                        let node = domCreateTextNode(this.outputDocument, value.stringValue());
                        domAppendChild(destinationNode, node);
                    }
                    break;

In other words, this:

const destinationNode = output || context.outputNodeList[context.outputPosition];

Now should be this:

const destinationNode = context.outputNodeList[context.outputPosition] || output;

There's just one problem left: how to resolve the CDATA. There's nothing special with it, except that we always escape XML syntax in it. The parser then creates a CDATA node, to be resolved by our XML Output class.

In fact the parameter already existed, but I didn't give attention enough in our documentation (I apologize, but sometimes maintaining this library can be quite exhausting). I'm adding the corresponding documentation elements, and making options.cData as true by default.