Open derek63 opened 9 years ago
Sorta maybe related question: is EMA coupled to XML messages specifically anywhere? I assumed we were using XPath expressions here and were thus somewhat coupled to XML. It sounds like we're not, though, and you're keeping things open for other data formats, which is great. Is there anywhere we currently expect XML?
The answer to that question is mostly no. I think the point of using RegEx (vs. XML parsing) here was to make that true; however, things like comments or formatting characters can keep the RegEx from matching as expected (hence the PRs referenced above and this enhancement request). I wrote/appropriated/included some ugly JavaScript a few weeks ago that tries to pretty print XML (see https://github.com/esbtools/esb-message-admin/pull/108), but expected to replace that very soon with something akin to this that detects the content type and formats accordingly (and doesn't try to format content it can't easily parse). See https://github.com/esbtools/esb-message-admin/issues/109 for more info.
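For illustration, here is a rough sketch of the "detect the content type and format accordingly" idea. It is written in Python rather than the JavaScript EMA actually uses, and `pretty_print` is a hypothetical name, not anything in the codebase. It only assumes that a payload starting with `{`/`[` is probably JSON and one starting with `<` is probably XML; anything that fails to parse is returned untouched.

```python
import json
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def pretty_print(payload):
    """Format JSON or XML payloads; return anything else unchanged.

    Hypothetical helper: the format is guessed from the first character
    and confirmed by actually parsing the payload.
    """
    text = payload.strip()
    if text.startswith(("{", "[")):
        try:
            return json.dumps(json.loads(text), indent=2)
        except ValueError:
            pass  # looked like JSON but is not; leave it alone
    elif text.startswith("<"):
        try:
            return minidom.parseString(text).toprettyxml(indent="  ")
        except ExpatError:
            pass  # looked like XML but is not well-formed; leave it alone
    return payload
```

The key design choice is the final `return payload`: content that can't be parsed is displayed as-is instead of being mangled by a formatter that doesn't understand it.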
Awesome, thanks for info!
https://github.com/esbtools/esb-message-admin/issues/121 highlighted an issue where formatting characters in an XML document cause sensitive information to not be extracted.
A temporary fix was added in https://github.com/esbtools/esb-message-admin/pull/122; however, we probably shouldn't be using RegEx to parse/update XML documents.
We should instead try to identify whether the payload is in a format that can be easily parsed (XML, JSON, etc.), and if so, use an established parser to find the sections that need to be scrubbed and update them. RegEx should be a last resort: only if the payload is not in a structured/parseable format should we fall back to it and brute-force scrub the data.
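A minimal sketch of what that could look like, in Python for brevity (EMA itself is not Python, so treat this as pseudocode that happens to run). The names `SENSITIVE_FIELDS`, `MASK`, and `scrub` are made up for the example, and the real list of sensitive fields would come from EMA's own configuration. The sketch tries JSON first, then XML, and only falls back to RegEx when neither parser accepts the payload.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical field names and mask, for illustration only.
SENSITIVE_FIELDS = {"ssn", "password"}
MASK = "****"


def _scrub_json(value):
    """Recursively mask values stored under a sensitive key."""
    if isinstance(value, dict):
        return {
            key: MASK if key.lower() in SENSITIVE_FIELDS else _scrub_json(val)
            for key, val in value.items()
        }
    if isinstance(value, list):
        return [_scrub_json(item) for item in value]
    return value


def _scrub_xml(root):
    """Mask the text of every element whose local name is sensitive."""
    for elem in root.iter():
        # Drop any '{namespace}' prefix ElementTree puts on the tag.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name.lower() in SENSITIVE_FIELDS:
            # Replaces the whole text node, so surrounding whitespace or
            # newlines inside the element no longer matter.
            elem.text = MASK


def scrub(payload):
    """Scrub with a real parser when possible, RegEx only as a last resort."""
    text = payload.strip()
    if text.startswith(("{", "[")):
        try:
            return json.dumps(_scrub_json(json.loads(text)))
        except ValueError:
            pass  # not valid JSON; fall through
    if text.startswith("<"):
        try:
            root = ET.fromstring(text)
        except ET.ParseError:
            pass  # not well-formed XML; fall through
        else:
            _scrub_xml(root)
            return ET.tostring(root, encoding="unicode")
    # Last resort: brute-force "name=value" / "name: value" pairs.
    names = "|".join(re.escape(name) for name in sorted(SENSITIVE_FIELDS))
    pattern = re.compile(r"(?i)\b(%s)(\s*[=:]\s*)\S+" % names)
    return pattern.sub(lambda m: m.group(1) + m.group(2) + MASK, payload)
```

One trade-off to be aware of: the structured branches re-serialize the document, so the original formatting (and, with this particular parser setup, XML comments) is not preserved in the scrubbed output. The sketch also only masks an element's direct text, not nested children. If the stored payload has to stay byte-for-byte identical outside the scrubbed values, a different approach would be needed.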