flowable / flowable-engine

A compact and highly efficient workflow and Business Process Management (BPM) platform for developers, system admins and business users.
https://www.flowable.org
Apache License 2.0
7.99k stars 2.62k forks source link

Save element name in extension element if it has any new lines #3946

Closed BertMatthys closed 3 months ago

BertMatthys commented 3 months ago

If an element name has any newlines, save it in an extension element instead of in an attribute because new lines are changed into spaces when xml attribute is parsed

tstephen commented 3 months ago

Hi, I'm not sure this is a good idea. New lines in XML should be encoded as (line feed) or possibly &#xD (line feed, carriage return) if you prefer.

filiphr commented 3 months ago

Thanks for the feedback @tstephen. We were looking into that option actually as well. We tried using 
 when generating the attribute. However, the problem is that the default XMLStreamWriter the com.sun.xml.internal.stream.writers.XMLStreamWriterImpl when writing the value of the attribute it is going to escape the characters we pass it to. For some reason they are not escaping the \n. However, if there is & then it gets escaped. Which means that we can't encode it on our side before passing it.

When writing it the generated XML looks like:

<sequenceFlow id="bpmnSequenceFlow_6" name="abc
def
ghi" sourceRef="bpmnTask_3" targetRef="bpmnEndEvent_5">

However, when this is read the new lines are getting removed (by the XMLStreamReader.

If you have some better suggestion for this, we are more than happy to look into it

tstephen commented 3 months ago

How extraordinary! And annoying!

Nonetheless I think this is a very visible obstacle to interchange, so I hope some solution can be found.

I think the problem stems from an underspecification of XMLStreamWriter. The Java tutorial states:

Note that XMLStreamWriter implementations are not required to perform well-formedness or validity checks on input.

Ideas I can think of:

None of these seem particularly attractive to be honest.

I am not sure what the new behaviour is but it apears that the standard name may not be written if it contains new line? (line 173 of BaseBpmnXMLConverter.java)? Perhaps the least disruptive would be to remove this if (!BpmnXMLUtil.containsNewLine(name)) so that the standard attribute is written in addition to the new one?

filiphr commented 3 months ago

Woodstox appears it may support more complete escaping

I'm not sure that it is worth adding another dependency for the purpose of this.

I don't know why the cursor Stax implementation was chosen but I note that this page (at the end) seems to be pushing towards the Iterator implementation because it is 'more extensible' and 'future-proof'.

I'm not sure when the initial XML parsing was added, but it's been in there for a really long time. The page you linked also says

If performance is your highest priority—for example, when creating low-level libraries or infrastructure—the cursor API is more efficient.

which makes the Cursor API more attractive.

I am not sure what the new behaviour is but it apears that the standard name may not be written if it contains new line? (line 173 of BaseBpmnXMLConverter.java)?

Indeed the standard name will not output the default attribute name if it contains a new line. However, it will output a custom extension element, which we then use to load it.

Why do you think that it is a problem if the default attribute name is not written down?

tstephen commented 3 months ago

Why do you think that it is a problem if the default attribute name is not written down?

Because it means that interchange between tools will lose the name. This is a slippery slope.

filiphr commented 3 months ago

Because it means that interchange between tools will lose the name. This is a slippery slope.

We are talking about an extreme edge case, right? This is when you do an explicit new line in the name. Let's take following scenarios into consideration:

Basically, the only thing that will not work is if you are using explicitly new lines in names using Flowable Java API and / or Flowable Design.

Do you have some other scenario in mind?

tstephen commented 3 months ago

I was thinking about your first scenario....

So if I give a name containing &#xA; and query the Java API I'll get exactly the same name back?

filiphr commented 3 months ago

So for the first scenario nothing changes. Depending on how an XML with an attribute name with &#xA; is getting parsed by the XML Parser we are using. If it is properly parsed as a new line then the Java API will return a new line.

The only thing that this affects is if you are using the Flowable BPMN XML Converter to convert from the Flowable BPMN Model Java API to XML.

In any case, we are going to add a flag in the converter to use or not use the new extension element when exporting. This way nothing should change. Keep in mind that 3rd party XML -> Flowable is not affected by this change anyways.

tstephen commented 3 months ago

Thanks for bearing with me Filip, very glad to hear this.

filiphr commented 2 months ago

@tstephen not sure if you saw #3951. We applied the feedback you provided.

tstephen commented 2 months ago

Yes I did @filiphr, though I've not had a chance to try it out, it looked good. Many thanks!