DesignLiquido / xslt-processor

A JavaScript XSLT processor without native library dependencies
GNU Lesser General Public License v3.0
94 stars 30 forks source link

Multiple match bug #62

Closed leonelsanchesdasilva closed 12 months ago

leonelsanchesdasilva commented 12 months ago

Some thoughts

This:

<xsl:template match="*|/*">

Makes the processor visit all the tags recursively. This was setting original-name in all tags. This PR clears all the previous transformed attributes when no attributes are set in the next template.

github-actions[bot] commented 12 months ago

Coverage report

St.:grey_question:
Category Percentage Covered / Total
🟡 Statements
70.85% (+0.93% 🔼)
1383/1952
🟡 Branches
63.55% (+0.64% 🔼)
619/974
🟡 Functions
70.18% (+0.7% 🔼)
200/285
🟡 Lines
71.02% (+0.94% 🔼)
1294/1822

Test suite run success

48 tests passing in 13 suites.

Report generated by 🧪jest coverage report action from 07be0e011bae136a3c2053f3e5568f342c454972

marcobalestra commented 12 months ago
  • Resolves Multiple match #49 ... This PR clears all the previous transformed attributes when no attributes are set in the next template.

Not sure that this was the issue... The issue was the multiple match, in case you keep matching and processing multiple templates and then cleanup some side effects, the bug is still there.

By reading the solution description (“clears all the previous transformed attributes when no attributes are set in the next template”) I was able to slightly modify my original example and re-create the bug.

Same XML for all examples:

<root>
    <typeA />  
    <typeB />  
</root>

Example 1 - XSL

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes" />

    <xsl:template match="*|/*">
        <outputUnknown original-name="{name(.)}">
            <subnode>Custom text</subnode>
            <xsl:apply-templates select="*" />
        </outputUnknown>
    </xsl:template>

    <xsl:template match="typeA">
        <outputA>
            <yep />
        </outputA>
    </xsl:template>

    <xsl:template match="/*/typeB">
        <outputB foo="bar">I have text!</outputB>
    </xsl:template>

</xsl:stylesheet>

Expected output:

<outputUnknown original-name="root">
    <subnode>Custom text</subnode>
    <outputA>
        <yep />
    </outputA>
    <outputB foo="bar">I have text!</outputB>
</outputUnknown>

Actual output:

Custom text<subnode>Custom text</subnode>

Example 2 - XSL

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes" />

    <xsl:template match="*|/*">
        <outputUnknown original-name="{name(.)}">
            <xsl:apply-templates select="*" />
        </outputUnknown>
    </xsl:template>

    <xsl:template match="typeA">
        <outputA>
            <yep />
        </outputA>
    </xsl:template>

    <xsl:template match="/*/typeB">
        <outputB foo="bar">I have text!</outputB>
    </xsl:template>

</xsl:stylesheet>

Expected output:

<outputUnknown original-name="root">
    <outputA>
        <yep />
    </outputA>
    <outputB foo="bar">I have text!</outputB>
</outputUnknown>

Actual output:

<outputUnknown original-name="root">
    <yep/>
    <outputB original-name="typeB" foo="bar">I have text!</outputB>
</outputUnknown>

Example 3 - XSL

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes" />

    <xsl:template match="*|/*">
        <outputUnknown original-name="{name(.)}">
            <xsl:apply-templates select="*" />
        </outputUnknown>
    </xsl:template>

    <xsl:template match="typeA">
        <outputA />
    </xsl:template>

    <xsl:template match="/*/typeB">
        <outputB foo="bar">I have text!</outputB>
    </xsl:template>

</xsl:stylesheet>

Expected output:

<outputUnknown original-name="root">
    <outputA />
    <outputB foo="bar">I have text!</outputB>
</outputUnknown>

Actual output:

<outputUnknown original-name="root">
    <outputA/>
    <outputB original-name="typeB" foo="bar">I have text!</outputB>
</outputUnknown>

Some thoughts

This:

<xsl:template match="*|/*">

Makes the processor visit all the tags recursively. This was setting original-name in all tags.

But this is not the supposed XSLT behaviour, AFAIK….

<xsl:template match="*|/*">
    <foo />
</xsl:template>

is absolutely equivalent to:

<xsl:template match="/*">
    <foo />
</xsl:template>
<xsl:template match="*">
    <foo />
</xsl:template>

It should match only once, only on the best matching pattern. In the 3 above examples replacing <xsl:template match="*|/*"> with <xsl:template match="*"> is expected to produce the same exact expected result, while this doesn't happen.

My considerations

Actually it seems there is the same bug behind this "recursively” and the 3 above examples, and this is the reason I initially named it "multiple matches".

When a node faces its transformation (triggered by an “apply-templates” or by transformation kickstart) it is expected to match only once, only on the best matching template.

<xsl:template match="*|/*"> is a template matching on 2 different X-Path expressions:

In our example XML the transformation starts on document node, /.

There's not template matching "/", therefore / is expected to match the implicit XSLT default:

<xsl:apply-templates />

This triggers the transformation of /root, the documentElement node (as well of #text and #comment nodes siblings of /root, if any).

Considering the available matches we have:

  1. "*|/*" best match on "/*" ,lower on "*" - only this will process the node
  2. implicit default - lowest match
  3. "typeA" no match
  4. "/*/typeB" no match

Matches only on best match, where the instruction <xsl:apply-templates select="*" /> triggers two independent transforming processes for /root/typeA and /root/typeB. "*" triggers the transformation of node elements only, doesn't transform attribute nodes (@*) nor text nodes (text())

For /root/typeA:

  1. "typeA" best match - only this will process the node
  2. "*|/*" lower match (for "*" only)
  3. implicit default - lowest match
  4. "/*/typeB" no match

For /root/typeB:

  1. "/*/typeB" best match - only this will process the node
  2. "*|/*" lower match (for "*" only)
  3. implicit default - lowest match
  4. "typeA" no match

Closure

You wrote:

clears all the previous transformed attributes when no attributes are set in the next template

That's the point: there shouldn't be any "next template", never ever.

Only one template, only the best matching template, will transform a node. It's not just matter of syntax, it's semantic. The matching mechanism is behind XSLT declarative nature and part of its behaviour and efficiency.

leonelsanchesdasilva commented 12 months ago

Hi @marcobalestra. Appreciate you putting a lot of useful information here.

The actual design is simply recursive and it doesn't have a "best match" algorithm yet. I'll have to think about how to solve this. If you have the time and the disposition, PRs are welcome.

I should work on the other issues now and return to this problem afterwards, since I don't have a good way to solve them. If you have, please let me know.

marcobalestra commented 11 months ago

Hi @marcobalestra. Appreciate you putting a lot of useful information here.

The actual design is simply recursive and it doesn't have a "best match" algorithm yet. I'll have to think about how to solve this. If you have the time and the disposition, PRs are welcome.

I should work on the other issues now and return to this problem afterwards, since I don't have a good way to solve them. If you have, please let me know.

Dear @leonelsanchesdasilva , I know and understand that this design started before that the XSLTProcessor was available. On the other hand, considering that it's now in place, I think it could be time for a major version change that exploits it. Not sure it make much sense to rewrite in pure JS a native and optimised feature.

leonelsanchesdasilva commented 11 months ago

Hi @marcobalestra,

Yes, version 1.2.0 onward should have the fix already. I'm just studying the problem and expected behavior better before implementing. I don't think it's necessary a complete rewrite. The XPath sources are robust, the XML mechanism is simple and easy to understand. There's a lot of value in what exists.

This project will remain in TypeScript for a variety of reasons:

marcobalestra commented 11 months ago

Hi @leonelsanchesdasilva , sorry for being unclear.

When I wrote "pure js" this was including typescript, which is actually compiled and executed as javascript. It wasn't typescript vs js, it was rather a ts/js approach vs a builtin optimised object, the XSLTProcessor. But I see now that despite of the fact that it's part of browsers it's not part of Node.js - sorry.

leonelsanchesdasilva commented 11 months ago

Hi @marcobalestra. Your insights are being used in this branch: https://github.com/DesignLiquido/xslt-processor/tree/revamping-expr-context

That should be the base for version 2.0, which I believe it'll solve all the three examples.