gchq / stroom

Stroom is a highly scalable data storage, processing and analysis platform.
https://gchq.github.io/stroom-docs/
Apache License 2.0

Using (?x) inline regex modifier in DataSplitter doesn't support comments properly #512

Open stroomdev10 opened 6 years ago

stroomdev10 commented 6 years ago

See the example Data Splitter below. The intended action is to match 10 characters, then match 5 characters, then ignore the rest of the line. The first inline comment causes the regex to be truncated.

<dataSplitter xmlns="data-splitter:3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="data-splitter:3 file://data-splitter-v3.0.xsd" version="3.0">
  <split delimiter="\n">
    <group>
      <regex pattern="(?x)
^(.{10})
# Stops the regex here - remove this line to fix it
(.{5})
 .+$
# Comment here is fine
 ">
        <data value="$1"></data>
        <data value="$2"></data>
      </regex>
    </group>
  </split>
</dataSplitter>

stroomdev66 commented 6 years ago

This issue is due to XML attribute values having new lines normalised to a single space, as per the XML spec. With the line breaks gone, the first # comment in (?x) mode runs to the end of the pattern and swallows everything after it. In order to use an attribute for pattern it would be necessary to add &#10; (a newline character reference) at the start of each line. Perhaps we should consider using an element for pattern instead of an attribute, or use token substitution.
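
As a rough sketch of the &#10; workaround described above, the regex element from the original report could be written as follows. This has not been verified against Stroom; it assumes that Data Splitter passes the attribute value to the regex engine unchanged, and it relies on XML parsers keeping a character reference such as &#10; as a real newline rather than normalising it to a space. The comment text is illustrative only.

```xml
<!-- Sketch only: each line starts with &#10; so a real newline survives
     attribute-value normalisation and ends any preceding (?x) comment. -->
<regex pattern="(?x)
&#10;^(.{10})
&#10;# This comment now ends at the next newline reference
&#10;(.{5})
&#10; .+$
&#10;# Comment here is fine
 ">
  <data value="$1"></data>
  <data value="$2"></data>
</regex>
```

Strictly, only the line following a comment needs the reference, but adding it to every line keeps the pattern safe if comments are inserted later.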

at055612 commented 8 months ago

@stroomdev10 can you close this and, optionally, raise an enhancement to support pattern in element text?