ballerina-platform / ballerina-lang

The Ballerina Programming Language
https://ballerina.io/
Apache License 2.0

Ballerina automatically decoding XML content #39507

Closed vidurananayakkara closed 1 year ago

vidurananayakkara commented 1 year ago

Description:

Steps to reproduce: Send the XML payload below to a Ballerina service.

<sequence
    xmlns="http://ws.apache.org/ns/synapse" name="documents_update" trace="disable">
    <!-- WSO2 GENERAL PROPERTIES -->
    <!-- When using the Loopback mediator, it is mandatory to set the following property -->
    <property name="api.ut.backendRequestTime" expression="get-property('SYSTEM_TIME')"/>
    <!-- Needed to use a proxy server -->
    <property name="POST_TO_URI" scope="axis2" type="STRING" value="true"/>
    <!-- SPECIFIC PROPERTIES -->
    <property name="documentId" expression="get-property('uri.var.documentId')"/>
    <!-- Load input fields into properties -->
    <script language="js">var log = mc.getServiceLog();
var payload = mc.getPayloadJSON();
log.info(JSON.stringify(payload));
for (i = 0; i &lt; payload.otherFields.length; ++i) {
mc.setProperty(payload.otherFields[i].id,payload.otherFields[i].value);
log.info(String(payload.otherFields[i].id));
}

</script>
    <!-- LOGON -->
    <sequence key="gov:apimgt/miles/general/v1.1.x/sequences/MWS_Logon.xml"/>
    <sequence key="gov:apimgt/miles/general/v1.1.x/sequences/MWSCheckErrors.xml"/>
    <filter source="$ctx:responseStatus" regex="error">
        <then>
            <loopback/>
        </then>
    </filter>
    <filter source="$ctx:country" regex="pt">
        <then>
            <sequence key="gov:apimgt/miles/documentManagement/v1.2.x/sequences/documents/document_update_document_pt.xml"/>
        </then>
    </filter>
    <!--
<log description="Log"><property expression="$body/*" name="Document update:" /></log>
-->
    <sequence key="gov:apimgt/miles/documentManagement/v1.2.x/sequences/documents/documents_read_byParameters.xml"/>
    <!--
<sequence key="gov:apimgt/miles/documentManagement/v1.2.x/sequences/documents/document_read_documentContent.xml"/><sequence key="gov:apimgt/miles/documentManagement/v1.2.x/sequences/documents/document_read_document.xml"/> -->
    <loopback/>
</sequence>

Upon receiving the payload from Ballerina, it is changed as below:

<sequence
    xmlns="http://ws.apache.org/ns/synapse" name="documents_update" trace="disable">
    <!-- WSO2 GENERAL PROPERTIES -->
    <!-- When using the Loopback mediator, it is mandatory to set the following property -->
    <property name="api.ut.backendRequestTime" expression="get-property('SYSTEM_TIME')"/>
    <!-- Needed to use a proxy server -->
    <property name="POST_TO_URI" scope="axis2" type="STRING" value="true"/>
    <!-- SPECIFIC PROPERTIES -->
    <property name="documentId" expression="get-property('uri.var.documentId')"/>
    <!-- Load input fields into properties -->
    <script language="js">var log = mc.getServiceLog();
var payload = mc.getPayloadJSON();
log.info(JSON.stringify(payload));
for (i = 0; i < payload.otherFields.length; ++i) {
mc.setProperty(payload.otherFields[i].id,payload.otherFields[i].value);
log.info(String(payload.otherFields[i].id));
}
    </script>
    <!-- LOGON -->
    <sequence key="gov:apimgt/miles/general/v1.1.x/sequences/MWS_Logon.xml"/>
    <sequence key="gov:apimgt/miles/general/v1.1.x/sequences/MWSCheckErrors.xml"/>
    <filter source="$ctx:responseStatus" regex="error">
        <then>
            <loopback/>
        </then>
    </filter>
    <filter source="$ctx:country" regex="pt">
        <then>
            <sequence key="gov:apimgt/miles/documentManagement/v1.2.x/sequences/documents/document_update_document_pt.xml"/>
        </then>
    </filter>
    <!--
<log description="Log"><property expression="$body/*" name="Document update:" /></log>
-->
    <sequence key="gov:apimgt/miles/documentManagement/v1.2.x/sequences/documents/documents_read_byParameters.xml"/>
    <!--
<sequence key="gov:apimgt/miles/documentManagement/v1.2.x/sequences/documents/document_read_documentContent.xml"/><sequence key="gov:apimgt/miles/documentManagement/v1.2.x/sequences/documents/document_read_document.xml"/> -->
    <loopback/>
</sequence>

As you can see, '&lt;' was replaced with the '<' character.

How do we avoid '&lt;' being changed to '<'?

MaryamZi commented 1 year ago

@chamil321, can you please share details about how this XML value is created/parsed? Do you use a method in io.ballerina.runtime.api.utils.XmlUtils?

xml:fromString which uses io.ballerina.runtime.internal.TypeConverter#stringToXml (which uses io.ballerina.runtime.api.utils.XmlUtils#parse(java.lang.String) internally) seems to handle this as expected.

Looping in @warunalakshitha also.

MaryamZi commented 1 year ago

I don't seem to be able to reproduce this btw. Tried with both a Ballerina HTTP client and cURL. Can you share more details including the Ballerina version also?

chamil321 commented 1 year ago

@chamil321, can you please share details about how this XML value is created/parsed? Do you use a method in io.ballerina.runtime.api.utils.XmlUtils?

Yes, we use the parse() method of XmlUtils to create the XML from the input stream.

I tested this with Ballerina 2201.2.3 (Swan Lake Update 2), using the HTTP client to send the content to the service, and could not reproduce it. @vidurananayakkara, can you share your Ballerina version and the client from which the XML content was sent?

vidurananayakkara commented 1 year ago

@chamil321 We are invoking the "ResourceAdminService" admin service of API Manager 2.6.0 via Ballerina. I will get back with the Ballerina version after getting that info from the client. However, could you try the above after saving the stated sequence "/_system/governance/soapin" of APIM 2.6.0?

vidurananayakkara commented 1 year ago

@chamil321 Please find the Ballerina information:

jBallerina 1.2.13
Language specification 2020R1
Update Tool 0.8.10
Update Tool 1.3.3

warunalakshitha commented 1 year ago

We will check with the 1.x version.

vidurananayakkara commented 1 year ago

Please find below the source code to reproduce the issue:

import ballerina/io;

public function main() returns error? {

    string xmlFilePathIn = "soapin.xml";
    string xmlFilePathOut = "soapout.xml";

    xml messageXml = check io:fileReadXml(xmlFilePathIn);
    check io:fileWriteXml(xmlFilePathOut, messageXml);
}

warunalakshitha commented 1 year ago

I have tested the above example. I couldn't reproduce the issue, since I got the &lt; character reference back without any change. By the way, was this tested on 1.2.13? 1.2.x does not have the io:fileReadXml method.

hasithaa commented 1 year ago

After analyzing the actual use case, we found the actual issue is with &gt; being replaced with >.

The jBallerina implementation uses Woodstox as the XML parser, and this behavior was discussed in this issue. Based on that, the given behavior is spec compliant. I verified this against the XML spec as well, which says:

"The right angle bracket (>) may be represented using the string " &gt; ", and must, for compatibility, be escaped using either " &gt; " or a character reference when it appears in the string " ]]> " in content, when that string is not marking the end of a CDATA section."

So the generated XML is compliant with the XML spec. See the following program.

import ballerina/io;

public function main() {
    xml a = xml `<b><a title="&lt; &gt;">&lt; &gt;</a><c><![CDATA[ &lt; &gt; ]]></c></b>`;
    xml b = xml `<b><a title="&lt; >">&lt; ></a><c> &lt; > </c></b>`;
    io:println(a == b); // true
}
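
The same point can be checked on parsed strings rather than XML literals. The following is a minimal sketch, assuming a Swan Lake distribution where `xml:fromString` and the `data()` lang library function are available; the sample element `<a>` and its content are made up for illustration.

```ballerina
import ballerina/io;

public function main() returns error? {
    // Two serializations that differ only in how '>' is written.
    xml escaped = check xml:fromString("<a>1 &lt; 2 &gt; 0</a>");
    xml unescaped = check xml:fromString("<a>1 &lt; 2 > 0</a>");

    // Compare the parsed values; a spec-compliant parser treats
    // "&gt;" and a bare '>' in content as the same character.
    io:println(escaped == unescaped);

    // Print the character data the program sees after parsing.
    io:println(escaped.data());
}
```

If the comparison holds, a serializer is free to write either form for '>', while '<' still has to be written as `&lt;` in content.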

Hence closing this issue.

github-actions[bot] commented 1 year ago

This issue is NOT closed with a proper Reason/ label. Make sure to add proper reason label before closing. Please add or leave a comment with the proper reason label now.

      - Reason/EngineeringMistake - The issue occurred due to a mistake made in the past.
      - Reason/Regression - The issue has introduced a regression.
      - Reason/MultipleComponentInteraction - Issue occurred due to interactions in multiple components.
      - Reason/Complex - Issue occurred due to complex scenario.
      - Reason/Invalid - Issue is invalid.
      - Reason/Other - None of the above cases.

warunalakshitha commented 1 year ago

There is an issue with double unescaping when an XML file is read with io:fileReadXml. It can be reproduced by reading an XML file such as the one below.

<data>&lt;xsl:apply-templates someattrib=" &amp;gt;" /></data>

Output

<data>&lt;xsl:apply-templates someattrib=" >" /></data>

But it should be

<data>&lt;xsl:apply-templates someattrib=" &amp;gt;" /></data>
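
To make the term concrete, here is a small sketch of what double unescaping means. It does not reproduce the `io:fileReadXml` code path; it only simulates a second unescaping pass by parsing twice with `xml:fromString`, and it assumes a Swan Lake distribution. The `<data>` wrapper mirrors the sample above; the variable names are illustrative.

```ballerina
import ballerina/io;

public function main() returns error? {
    // "&amp;gt;" stands for the four characters "&gt;" once it is unescaped.
    xml doc = check xml:fromString("<data>&amp;gt;</data>");
    string once = doc.data(); // character data after one round of unescaping
    io:println(once);

    // Feeding that text through a parser again unescapes it a second time,
    // turning "&gt;" into '>' and losing the author's original escaping.
    xml again = check xml:fromString("<data>" + once + "</data>");
    io:println(again.data());
}
```

A correct reader stops after the first step, which is why the expected output keeps `&amp;gt;` when the value is written back.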