bluscreenofjeff / bluscreenofjeff.github.io

My information security blog
https://bluescreenofjeff.com
BSD 3-Clause "New" or "Revised" License

2016-10-14-black-magic-parsing-with-regular-expressions-parsing-for-pentesters #16

Open bluscreenofjeff opened 7 years ago

bluscreenofjeff commented 7 years ago

Comments on Black Magic Parsing with Regular Expressions - Parsing for Pentesters

cacipeggy commented 6 years ago

Hi Jeff. I apologize for taking your time for a question that is virtually irrelevant to your post. Please ignore me if this is irritating to you in any way. I'm grappling with an issue that I have no direct control over, but I am certain it has to do with the way the rules are written in our DLP system. The administrators won't even give me a read-only view of the rules they have configured. I'm attempting to understand enough to explain to those in control what they could do to help. You might save me from looking like an idiot and being ignored even more. We are getting an insane number of false positives for social security numbers because the net has been cast too widely, so to speak. We get matches on ZIP+4, for example (nnnnn-nnnn). Can't they specify something like: When there is a string of 9 digits that does not match the standard format (nnn-nn-nnnn), match only if preceded by "ssn" or "number" or "#". Can the notion of "preceded by" be limited? For example, to within the same line? I appreciate you for even reading this. Best wishes, Peggy

bluscreenofjeff commented 6 years ago

Hi Peggy! Sorry to hear about your regex troubles. You definitely aren't alone in having false positives in DLP. Just to make sure I'm understanding correctly, you would like to know if you could only match on the following:

and ignore anything that only matches 123-45-6789. Is that right?

If so, the following example will only match SSNs preceded by "ssn", "number", or "#" (with an optional space after it) and with optional hyphens:

user@host:~# cat input.txt 
ssn123456789
123456789
123-45-6789
ssn123-45-6789
ssn 123-45-6789
number123456789
#123456789
123456789
user@host:~# grep -o -E '(ssn|number|#) ?[0-9]{3}-?[0-9]{2}-?[0-9]{4}' input.txt
ssn123456789
ssn123-45-6789
ssn 123-45-6789
number123456789
#123456789
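
On the "same line" part of the question: grep reads its input one line at a time, so a match can never span lines. Here is a rough sketch of a looser variant that ignores case and allows up to ten non-digit characters between the keyword and the digits. The file name sample.txt, its contents, and the limit of ten are all made up for illustration, and a DLP product may use a different regex engine with different syntax.

```shell
# Hypothetical sample data, invented for this example.
cat > sample.txt <<'EOF'
ZIP+4 is 12345-6789
ssn: 123-45-6789
SSN 123456789
ref number 987654321
the ssn is on file
123456789
EOF

# -i ignores case, so "SSN" and "ssn" are treated the same.
# [^0-9]{0,10} allows up to ten non-digit characters (spaces, colons, words)
# between the keyword and the number. Because grep works line by line,
# the keyword and the number must appear on the same line.
matches=$(grep -o -i -E '(ssn|number|#)[^0-9]{0,10}[0-9]{3}-?[0-9]{2}-?[0-9]{4}' sample.txt)
echo "$matches"
```

With this sample data, only the three lines that have a keyword shortly before a nine-digit number are printed; the bare ZIP+4 and the bare 123456789 are skipped. One caveat: because both hyphens are optional, a ZIP+4 such as 12345-6789 that directly follows a keyword would still match this pattern.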

cacipeggy commented 6 years ago

Thank you so much for responding, Jeff. I'm going to reflect on what you've sent so that I can answer your follow-up questions coherently. I will also give you a couple of truly wacky FPs (my affectionate nickname for the little buggers) from our system. I wanted to reply just as soon as I saw your email, because I truly appreciate the help, but it's the end of a long day for me, and I'm too tired to think. Best wishes, Peggy
