Match aliases are being 'found' during sanitisation stage

cavejay / Strippy

Use this PowerShell script to sanitise your logs of configured patterns before handing them off to someone else (like your support team)

MIT License


Match aliases are being 'found' during sanitisation stage #31

Open cavejay opened 6 years ago

cavejay commented 6 years ago

Currently, if you have a rule that replaces a string with 'abcdefg' and another that replaces 'cde' with 'memes', you could end up with abmemes123fg4, which is obviously an undesirable outcome.

To prevent this, the 'cde' -> 'memes' sanitisation should occur before the 'xx' -> 'abcdefg' sanitisation, so that 'cde' is never searched for in text that already contains the alias 'abcdefg'.
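The ordering effect can be reproduced with a minimal sketch. It is written in Python rather than the script's own PowerShell, and the `sanitise` helper, the sample log line and the rule lists are all hypothetical; it only assumes that rules are applied one after another over the whole text.

```python
import re

def sanitise(text, rules):
    """Apply (pattern, alias) rules one after another, in list order."""
    for pattern, alias in rules:
        text = re.sub(pattern, alias, text)
    return text

# Hypothetical log line and rules, mirroring the example above.
line = "user=xx token=cde"
alias_first = [(r"xx", "abcdefg"), (r"cde", "memes")]
short_first = [(r"cde", "memes"), (r"xx", "abcdefg")]

print(sanitise(line, alias_first))  # user=abmemesfg token=memes
print(sanitise(line, short_first))  # user=abcdefg token=memes
```

With `alias_first`, the second rule finds 'cde' inside the alias that the first rule just inserted. With `short_first`, the alias is only written after 'cde' has already been handled, so it stays intact.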

To resolve this bug please either:

- add a warning to the config file about this behaviour, with steps to prevent it (order the rules in the config file to avoid this behaviour)
- automatically resolve the ordering of sanitisation at run time using magic (unwritten code)
- add a config entry for sanitisation ordering

cavejay commented 6 years ago

On second thoughts, I don't think this is a straightforward fix. Matches are dynamically found and replaced with keys based on length. To do anything other than this, the sanitisation stage would need to be more intelligent and 'lock' keys once they'd been switched in for a match.
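One possible way to 'lock' keys, sketched here in Python as an illustration and not as how the script currently works, is to replace every match in a single pass so that text that has been switched in is never scanned again. The `sanitise_locked` helper is hypothetical, and it assumes simple rule patterns with no named groups or numbered backreferences of their own.

```python
import re

def sanitise_locked(text, rules):
    """Replace all matches in one pass so inserted aliases are never rescanned."""
    # Wrap each rule in its own named group (r0, r1, ...) and join them into
    # one alternation. Assumes the rule patterns contain no named groups or
    # numbered backreferences of their own.
    combined = re.compile(
        "|".join(f"(?P<r{i}>{pattern})" for i, (pattern, _) in enumerate(rules))
    )
    aliases = {f"r{i}": alias for i, (_, alias) in enumerate(rules)}

    def swap(match):
        # lastgroup names the outer rule group that produced this match.
        return aliases[match.lastgroup]

    return combined.sub(swap, text)

# Hypothetical rules: rule order no longer changes the result.
rules = [(r"xx", "abcdefg"), (r"cde", "memes")]
print(sanitise_locked("user=xx token=cde", rules))  # user=abcdefg token=memes
```

Because the regex engine continues after each match in the original text, an alias written into the output can never be matched by a later rule. The trade-off is that when two rules match at the same position, the earlier rule in the list wins.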

cavejay commented 6 years ago

In the end this bug doesn't stop the files from being cleaned, it just creates an ugly output.

cavejay commented 6 years ago

This bug is also the cause of the script recursively replacing some rules. When this occurs, sanitising may never complete and the resulting files are much larger than the originals.

Example of rule:

"<some regex>"=CleverWellThoughtOutName
"<some regex that matches 'WellThought'>"=AnotherCleverWellThoughtOutName

This is quite hard to predict, as it requires us to check whether the second rule might ever match a replacement string. We could do this check after collecting all the keys, and then warn the user and/or exit early to prevent loss of time.
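A rough sketch of such a check, again in Python for illustration only: test every rule's pattern against every configured alias and report any hits before sanitising starts. The `find_alias_collisions` helper and the `secret-\d+` pattern are hypothetical stand-ins, and the check assumes aliases appear in the output exactly as written in the config, which may not hold if keys are altered at run time.

```python
import re

def find_alias_collisions(rules):
    """Return (pattern, alias) pairs where a rule's pattern matches an alias."""
    collisions = []
    for pattern, _ in rules:
        for _, alias in rules:
            if re.search(pattern, alias):
                collisions.append((pattern, alias))
    return collisions

# Hypothetical rules shaped like the example above.
rules = [
    (r"secret-\d+", "CleverWellThoughtOutName"),
    (r"WellThought", "AnotherCleverWellThoughtOutName"),
]
for pattern, alias in find_alias_collisions(rules):
    print(f"warning: pattern {pattern!r} matches alias {alias!r}")
```

Here the second pattern collides with both aliases, including its own, which is the case that would make replacement recurse.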

cavejay commented 4 years ago

Closing #43 should prevent this happening for number-based overlaps.