flowable / flowable-engine

A compact and highly efficient workflow and Business Process Management (BPM) platform for developers, system admins and business users.
https://www.flowable.org
Apache License 2.0
7.95k stars 2.62k forks

Text with xml entity references in <extensionElements> is truncated #3267

Open zhangleixp opened 2 years ago

zhangleixp commented 2 years ago

1. Description

Given BPMN XML like this:

<?xml version="1.0" encoding="UTF-8"?>
<definitions>
  <process id="..." name="...">
    <startEvent id="startevent1" name="Start"></startEvent>
    <userTask id="userTask1" name="userTask1">
        <extensionElements>
            <mytag>a&gt;b</mytag>  <!-- here is the point -->
        </extensionElements>
    </userTask>
  </process>
</definitions>

After conversion to JSON, the text a&gt;b in the element becomes b, while the expected result is a>b.

2. Reason

2.1 Behavior of XMLStreamReader

XMLStreamReader treats entity references specially: the text a&gt;b is reported as three separate character events. A simple program demonstrates this:

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class Application {
    public static void main(String[] args){
        String xmlString = "<?xml version=\"1.0\" encoding=\"UTF-8\"?> <extensionElements><mytag>a&gt;b</mytag></extensionElements>";
        XMLInputFactory xif = XMLInputFactory.newInstance();
        xif.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
        try {
            XMLStreamReader xtr = xif.createXMLStreamReader(new StringReader(xmlString));
            while (xtr.hasNext()) {
                xtr.next();
                if (xtr.isCharacters()) {
                    System.out.printf("eventType:%d, text:%s\n", xtr.getEventType(), xtr.getText());
                }
            }
        }catch (Exception ex){
            System.out.println(ex);
        }
    }
}

The output is:

eventType:4, text:a
eventType:4, text:>
eventType:4, text:b

As the output shows, there is only one text node, a&gt;b, but XMLStreamReader reports it as three separate events.

2.2 BpmnXMLUtil.parseChildElements() only keeps the last reported text

Relevant code: https://github.com/flowable/flowable-engine/blob/d763b143d9b80b32ec8483fe6584429283e87e75/modules/flowable-bpmn-converter/src/main/java/org/flowable/bpmn/converter/util/BpmnXMLUtil.java#L190-L204

At line 195, extensionElement.setElementText(xtr.getText().trim()) overwrites the element text on every character event, so only the last reported chunk, b, is kept.

3. Suggestion

Change line 195 to: extensionElement.setElementText(extensionElement.getElementText() + xtr.getText().trim())
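
To illustrate the idea outside of Flowable, here is a minimal standalone sketch that appends every character event of an element to a buffer and trims only once at the end. The class AccumulateTextSketch and the method readElementText are hypothetical names made up for this example; they are not part of BpmnXMLUtil, and the real fix would have to be adapted to the existing parseChildElements() loop.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

// Hypothetical demo class, not Flowable code.
class AccumulateTextSketch {

    // Returns the trimmed text of the first element named elementName,
    // concatenating all character events reported inside it.
    // Returns null if the element is not found.
    public static String readElementText(String xml, String elementName) {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        StringBuilder text = null; // non-null only while inside the target element
        try {
            XMLStreamReader xtr = xif.createXMLStreamReader(new StringReader(xml));
            while (xtr.hasNext()) {
                int event = xtr.next();
                if (event == XMLStreamConstants.START_ELEMENT && text == null
                        && elementName.equals(xtr.getLocalName())) {
                    text = new StringBuilder();
                } else if (text != null && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA)) {
                    // Append instead of overwrite, so no chunk is lost.
                    text.append(xtr.getText());
                } else if (text != null && event == XMLStreamConstants.END_ELEMENT
                        && elementName.equals(xtr.getLocalName())) {
                    // Trim once, after all chunks have been collected.
                    return text.toString().trim();
                }
            }
        } catch (XMLStreamException ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<extensionElements><mytag>a&gt;b</mytag></extensionElements>";
        System.out.println(readElementText(xml, "mytag")); // prints a>b
    }
}
```

Trimming each chunk separately, as in the one-line change above, would also strip whitespace that sits next to an entity reference in the middle of the text, which is why this sketch trims only the final result.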

Please forgive me for my poor English.

jo3do3 commented 1 year ago

Hello, I also have a problem with this part of BpmnXMLUtil.parseChildElements() (section 2.2 above), and one solution could fix both. I noticed the problem in 6.8.0.20. In my XML model I have a value like: <![CDATA[some text which xtr is not providing at once]]> In the while loop:

First xtr.next(): xtr.getText() returns 'some text which xtr is not provi', which is set on the extension element.

Second xtr.next(): xtr.getText() returns 'ding at once', which overwrites the text on the extension element.

So instead of:

if (StringUtils.isNotEmpty(xtr.getText().trim())) {
    extensionElement.setElementText(xtr.getText().trim()); 
} 

I would prefer something like:

if (StringUtils.isNotEmpty(xtr.getText().trim())) {
    if (extensionElement.getElementText() == null) {
        extensionElement.setElementText(xtr.getText());
    } else {
        extensionElement.setElementText(extensionElement.getElementText() + xtr.getText());
    }
}

This would require either not trimming at all, or trimming only when the extension element text is returned.

I noticed that my problem can also be fixed by enabling coalescing on the XMLInputFactory in BpmnXMLConverter.convertToBpmnModel:

if(xif.isPropertySupported(XMLInputFactory.IS_COALESCING)){
    xif.setProperty(XMLInputFactory.IS_COALESCING, true);
}

but I do not understand the potential implications of this.
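
For comparison, here is a small standalone sketch of what the coalescing property does to the example from the original report. CoalescingSketch and characterEvents are hypothetical names used only for this demo; nothing here is Flowable code. With IS_COALESCING enabled, a StAX parser is required to merge adjacent character data, so the text should arrive as a single event. I have only reasoned about the StAX implementation bundled with the JDK, so behavior on other implementations is an assumption.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not Flowable code.
class CoalescingSketch {

    // Returns the text of every character event in the document, in order.
    public static List<String> characterEvents(String xml, boolean coalescing) {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        if (xif.isPropertySupported(XMLInputFactory.IS_COALESCING)) {
            xif.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        }
        List<String> events = new ArrayList<>();
        try {
            XMLStreamReader xtr = xif.createXMLStreamReader(new StringReader(xml));
            while (xtr.hasNext()) {
                xtr.next();
                if (xtr.isCharacters()) {
                    events.add(xtr.getText());
                }
            }
        } catch (XMLStreamException ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<extensionElements><mytag>a&gt;b</mytag></extensionElements>";
        System.out.println("without coalescing: " + characterEvents(xml, false));
        System.out.println("with coalescing:    " + characterEvents(xml, true));
    }
}
```

Two consequences of coalescing that I am aware of: the parser has to buffer the whole text of a node before reporting it, and CDATA sections are merged into ordinary character events, so code that distinguishes CDATA from plain text would no longer see the difference.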

zhangleixp commented 1 year ago

A year has passed. Is there anyone who can help?