Closed by @lifeforms 8 years ago.
@lifeforms, this is one hell of a post! I took a few days to let it sink in. Sorry for the long delay.
But, I return with a clear and simple resolution: KISS
We group the function names into multiple data files according to group / paranoia level.
- Group I: ignore, as you suggest (-> no data file)
- Group II: separate data file, rule at paranoia level 3 (or 2?)
- Group III: separate data file, rule at paranoia level 2 (or 1?), with a special chained rule to suppress FPs
- Group IV: separate data file, rule at paranoia level 1
The special chained rule could work as follows (tested):

SecRule ARGS|ARGS_NAMES|REQUEST_HEADERS "@pmFromFile php-function-names-group-3.data" \
    "phase:2,capture,chain,id:10002,deny,msg:'Blacklist hit %{MATCHED_VAR_NAME}: %{MATCHED_VAR}'"
    SecRule MATCHED_VARS "@contains ("
This allows us to build 3 rules in a very similar fashion (-> strict siblings). We can go with the fast @pmf operator and avoid complex regexes. The chained rule is straightforward. If the @contains operator does not weed out enough FPs, then we can fall back on a variant of the rule you are proposing. I just think it is worthwhile to test with @pmf first, as it is more readable, and all function names are listed next to each other in 3 data files.
Personally, I think everything should run with action:block and with anomaly score 5.
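For illustration, the matching logic of the @pmf + chained @contains pair above can be sketched in Python. This is only a rough model, not how ModSecurity evaluates rules internally, and the term list is a made-up stand-in for the group-3 data file:

```python
# Rough sketch of the two-stage check: a set lookup stands in for
# @pmFromFile, and a substring test stands in for the chained "@contains (".
GROUP3_TERMS = {"system", "exec", "eval", "passthru"}  # stand-in data file

def chained_match(value: str) -> bool:
    """True if any blacklisted name occurs anywhere AND a '(' occurs anywhere."""
    v = value.lower()
    return any(term in v for term in GROUP3_TERMS) and "(" in v

print(chained_match("foo=system('id')"))                       # True
print(chained_match("please eval my essay (draft attached)"))  # True (a FP)
print(chained_match("just eval, no parens"))                   # False
```

Note that the parenthesis may occur anywhere in the value, not adjacent to the function name; this is the looseness the chained variant trades for simplicity.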
I'm adding some data to this discussion, since I think we should also concern ourselves with false positives. I calculated the Levenshtein distance (LD) against the standard dictionary. Based on this, I highlighted functions that match words present in the dictionary with an LD score of 0, 1 or 2. Words with an LD of zero are themselves dictionary words; a score of 1 means a HIGH chance of false positives; a score of 2 probably a mild chance. Words without any LD score represent a low chance of false positives.
The first column is the function name, the second is words from the dictionary with an LD score of 0, etc.
Attached are just the values with an LD of 0; these will very likely cause FPs, as they are English words. I will continue to split these into our two groups.
I sorted my list of high-FP elements from today's table. In general there MAY be more when we look at higher LD values (especially with the shorter words) if we use @pm. highFP.csv.txt
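For reference, the Levenshtein comparison described above can be reproduced with a few lines of Python (the sample word pairs are illustrative, not the actual data set):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# LD 0 = the function name is itself a dictionary word (near-certain FP),
# LD 1 = high FP chance, LD 2 = probably a mild chance.
print(levenshtein("sleep", "sleep"))    # 0
print(levenshtein("exec", "exes"))      # 1
print(levenshtein("strpos", "strops"))  # 2
```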
@dune73, sorry for taking so much time to respond.
I appreciate that you are trying to keep the rules as simple as possible. I don't fully agree with the outcome yet, but maybe we can find a compromise! :)
- `sleep`, `time`, `abs`: We both agree to ignore these functions because of FP risk, so unless @csanders-git thinks otherwise, we'll consider that settled.
- `strpos`, `setlocale`: I proposed anomaly severity at PL1 and a critical sibling at PL2; you prefer critical severity at PL2 or 3. The distinction between those two can be subtle. It's hard for me to say which is more useful. I thought this group would be nice as a signal in PL1. But if we let the severity depend on paranoia level, we introduce complexity, and I appreciate that you want to reduce this, so I agree to use a single rule. As for paranoia level 2 or 3: we will curate Group II to have low FP anyway. I estimate that this rule will definitely have fewer FPs than some of the XSS rules we brought back in PL2. So, let's pick PL2. I propose to take one of your options: make a single rule, at critical severity, at PL2, using @pmf.
- `system`, `exec`, `eval`: I proposed critical severity at PL1 with a regexp to find the adjoining `(` character. You proposed PL2 with a chained rule. As someone who protects PHP apps most of the time, I think this rule is highly necessary and should be enabled by default. A similar in-house rule gives me lots of matches from botnets attacking Wordpress plugins, and has never given me any FP. In my opinion, the lack of FPs is almost certainly due to the regexp. If we just used a chained rule to check for `(` at any location in a matched variable, it would almost certainly raise FPs, since the search terms themselves are, by design, very common, and the same goes for the use of parentheses. It's easy to think of legitimate plaintext containing the sequence `(` as well as `chr` or `eval` somewhere. So, I am against the chained check, as I predict that people would hate the rule, or worse, disable it, while it could be very useful. I also think the complexity of the regexp is not too onerous for such a useful rule. My experience in production with it has been positive, and we'll document it with tests. So, I propose to reconsider having this critical rule at PL1, using a regexp and not a chained rule.
- `gzinflate`, `base64_decode`: I proposed critical severity at PL1 with a regexp. You proposed the same but using @pmf. I agree with you; the regexp has little advantage here, so let's use @pmf. So I copy your proposal: run at critical severity, at PL1, using @pmf.

If you think my Group III regexp proposal would be too complex to maintain, I think the added tests will make it pretty easy to see how it works. Here are some example tests used to develop my Group III regexp:
- test: [{url: '/', code: 200}]
- test: [{url: '/?foo=system%28', code: 403}]
- test: [{url: '/?foo=System%28', code: 403}]
- test: [{url: '/?foo=system%0D%28', code: 403}]
- test: [{url: '/?foo=system%0A%28', code: 403}]
- test: [{url: '/?foo=system%0D%0A%28', code: 403}]
- test: [{url: '/?foo=system%20%28', code: 403}]
- test: [{url: '/?foo=system%20%20%28', code: 403}]
- test: [{url: '/?foo=%40system%28', code: 403}] # @system(
- test: [{url: '/?foo=system%09%28', code: 403}] # system\t(
- test: [{url: '/?foo=system%2F%2Fcomment%0A%20%28', code: 403}] # system//comment\n (
- test: [{url: '/?foo=system%20%23%23comment%0A%20%28', code: 403}] # system #comment\n (
- test: [{url: '/?foo=system%20%23%23%20%28', code: 403}] # system #\n (
- test: [{url: '/?foo=system%2F%2Acomment%2A%2F%28', code: 403}] # system/*comment*/(
- test: [{url: '/?foo=system%20%2F%2Acomment%2A%2F%20%28', code: 403}] # system /*comment*/ (
- test: [{url: '/?foo=system%20%09%2F%2A%2A%2F%09%20%28', code: 403}] # system \t/**/\t (
- test: [{url: '/?foo=system%20%09%2F%2Amulti%0D%0Aline%2A%2F%09%20%28', code: 403}] # system \t/*multi\r\nline*/\t (
# prevent false positives:
- test: [{url: '/?foo=eval', code: 200}]
- test: [{url: '/?foo=cheval', code: 200}]
- test: [{url: '/?foo=the%20system', code: 200}]
- test: [{url: '/?foo=ecosystem%28', code: 200}]
- test: [{url: '/?foo=systems%28', code: 200}]
- test: [{url: '/?foo=system%20something%28', code: 200}]
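A candidate regexp in this spirit can be verified against the decoded payloads from the tests above. The pattern below is my own illustrative reconstruction (function name, then optional whitespace or PHP comments, then a parenthesis), not the exact Group III regexp:

```python
import re

# Function name, then any mix of whitespace, // or # line comments,
# or /* */ block comments, then an opening parenthesis.
FUNC = r"\b(?:system|exec|eval)"
SEP = r"(?:\s|/\*.*?\*/|(?://|#)[^\n]*)*"
GROUP3_RE = re.compile(FUNC + SEP + r"\(", re.I | re.S)

blocked = [  # decoded variants of the 403 tests above
    "system(", "System(", "system\r(", "system\n(", "system\r\n(",
    "system (", "system  (", "@system(", "system\t(",
    "system//comment\n (", "system ##comment\n (", "system ## (",
    "system/*comment*/(", "system /*comment*/ (",
    "system \t/**/\t (", "system \t/*multi\r\nline*/\t (",
]
passed = [  # decoded variants of the 200 (no-FP) tests above
    "eval", "cheval", "the system", "ecosystem(", "systems(",
    "system something(",
]
assert all(GROUP3_RE.search(s) for s in blocked)
assert not any(GROUP3_RE.search(s) for s in passed)
print("all test payloads behave as expected")
```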
@csanders-git These statistics are very interesting. I am a computer linguistics n00b, but I want to learn this. The data you provide look like the very thing to define the groups.
@lifeforms Thank you for taking the time to explain things and sorting out the differences. Yes, I am sure we can find a compromise. We are very close, actually.
So we consider group I, group II and group IV as settled. Hope @csanders-git chimes in.
I agree with your arguments to run this in PL1. Makes sense, and your experience underlines that there are going to be few FPs. Few FPs if we go with the regex, that is, and not with the "@pmf + chained @contains (" approach. I am glad you shed some light on this. I feared that the @contains was no equivalent of the regex. But please give me a moment to come up with a pmf+chain variant that equals the regex variant. I really want to try to work with three files in the same format. If it does not work, then at least we tried.
OK, I tried this.
If you do @pmf and then capture TX.0, you can save the file entry that matched to a temp variable, together with "(", as the new pattern:
SecRule ARGS "@pmf /tmp/terms-group-3" "id:1006,phase:2,chain,pass,log,capture,setvar:TX.pattern=%{TX.0}("
This pattern can then be used as the pattern in the chained rule and that's almost a win. BUT unfortunately, evasion is easy, as TX.0 will only be filled once and filled with the first occurrence of a pattern in the group-file.
Consider:
foo=bar+define+and+eval(
TX.0 will now be "define" and "eval(" will fly under the radar. So as far as I can see this is impossible with ModSec2 and we need to wait for ModSec3. Hopefully.
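The bypass can be illustrated in Python. This is a deliberate simplification of the ModSecurity behaviour described above (the real @pmf match order may differ), but the single-capture limitation is the same:

```python
# Simulate the setvar:TX.pattern=%{TX.0}( approach: TX.0 only holds the
# FIRST term that matches, so the chained check looks for that one term
# followed by "(" and nothing else.
GROUP3_FILE = ["define", "eval", "system"]  # stand-in for terms-group-3

def capture_chain(value: str) -> bool:
    for term in GROUP3_FILE:
        if term in value:            # @pmf: first match wins -> TX.0
            pattern = term + "("     # setvar:TX.pattern=%{TX.0}(
            return pattern in value  # chained @contains %{TX.pattern}
    return False

print(capture_chain("foo=bar define and eval("))  # False: bypassed!
print(capture_chain("foo=eval("))                 # True
```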
Accessing the same parameter in the chained rule as in the initial rule is a similar problem. ModSec tends to overwrite MATCHED_VAR and even MATCHED_VARS and especially in connection with @pmf.
This brings us back to your regex variant, @lifeforms:
Can we work with a group file, and inside the group file have a comment with a recipe which can be used to build the regex from the group file? The rules file with the regex would then point to the group file with a comment explaining how to build the regex, and noting that the master copy of the functions resides in the group file, even if the group file is not referenced directly from the rules file. (Is this explanation understandable at all?)
@dune73 Yeah, I came to the same conclusion with the data file vs. the regexp. It can't be done with our current engine. Okay, seems we have consensus about the approach!
I like your idea about storing the sources of regexps in original format. This particular regexp is probably not so complex and has few terms, but it might be interesting to store a source list that can be used to rebuild the regexp. To build regexps, the CRS has used the Regexp::Assemble perl module described in "Optimizing regular expressions". The module works and is still actively maintained.
# sudo cpan install Regexp::Assemble
# cat assemble.pl
use strict;
use warnings;
use Regexp::Assemble;

my $ra = Regexp::Assemble->new;
while (<>) {
    chomp;    # strip the newline so it doesn't end up in the pattern
    $ra->add($_);
}
print $ra->as_string() . "\n";
# cat example.txt
base64_decode
convert_uudecode
file_get_contents
gzdecode
gzinflate
gzuncompress
hex2bin
proc_open
shell_exec
str_rot13
zlib_decode
# perl assemble.pl < example.txt
(?:(?:(?:base64|zlib)_|convert_uu)decode|gz(?:(?:inflat|decod)e|uncompress)|s(?:hell_exec|tr_rot13)|(?:proc_ope|hex2bi)n|file_get_contents)
It looks like it would also work for a source list of regexps, so I can use it to build the Group III regexp out of a somewhat more readable list if you think it's worth it.
Regexp::Assemble does generate regexps which are hard on the eye, but that's not a big problem. After all, people should never edit these regexps directly. We should just ensure full test coverage to guard against possible bugs.
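For quick experiments without Perl, a naive stand-in can be written in Python: it just escapes and joins the terms, longest first so prefixes don't shadow longer names, without the common-prefix factoring Regexp::Assemble performs:

```python
import re

def assemble(terms):
    """Build a simple alternation regex from a list of literal terms.

    Unlike Regexp::Assemble this does no prefix extraction, but the
    resulting pattern matches the same set of strings.
    """
    ordered = sorted(terms, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(t) for t in ordered) + ")"

terms = ["base64_decode", "gzinflate", "shell_exec", "str_rot13"]
pattern = re.compile(assemble(terms))
print(pattern.search("eval(gzinflate(base64_decode('...')))"))
```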
As a last note, I would store the sources for regexps in a separate place and not among the regular .data files used by @pmf rules. This prevents adding to the cognitive load when examining the files (which ones are used in production and which ones are mere sources?). We could keep them in the repo by rule ID, for example: util/regexp-assemble/933113.data
I've done some work on the function names to detect.
I combined Chaim's function list (posted in #290) with a list scraped from the PHP functions page. That page also lists function names from various PHP extensions. To keep the size of the function list from exploding, I propose to include only the global functions from that page, and ignore the following lesser-used function prefixes: bbcode_
bcompiler_
cairo_
cubrid_
db2_
dbase_
dba_
dbplus_
dbx_
eio_
enchant_
event_
fam_
fann_
fbsql_
fdf_
geoip_
gmp_
gnupg_
grapheme_
gupnp_
ibase_
id3_
ifx_
ingres_
kadm5_
mailparse_
maxdb_
msession_
m_
ncurses_
newt_
oci
openal_
PDF_
pspell_
ps_
px_
radius_
rrd_
stats_
svn_
trader_
udm_
vpopmail_
xdiff_
Which PHP extensions to ignore or include is open for discussion. Anything on this ignored list to re-add?
Chaim's list of functions has 886 entries. The combined function list has 2350 entries. I can't say what the performance impact of such a data list size is. Is this a feasible list size, @csanders-git ?
Of the combined list, there are 103 entries that match the English dictionary. I used Webster's 2nd International for this; BSD systems have it as /usr/share/dict/web2. These are a good first start for false-positive prone entries to put into group I (common benign words) or group III (common abused functions).
This is just the autogenerated stuff. I'll also check the combined list manually, before partitioning the functions into groups.
Any comments on the function list are appreciated.
functions-all.txt (combined function list), dictionary-overlap.txt (functions from the list above, plus the number of times they appear as a substring in a dictionary word)
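The dictionary overlap can be reproduced along these lines. The in-line word set is a tiny stand-in so the sketch is self-contained; on BSD one would load the real web2 file:

```python
# Count, for each function name, how often it appears as a substring of
# a dictionary word; exact dictionary matches are the worst FP candidates.
# On BSD one would load the real dictionary instead:
#   words = set(open("/usr/share/dict/web2").read().lower().split())
words = {"sleep", "ecosystem", "medieval", "abstract", "upclose"}  # stand-in

functions = ["sleep", "system", "eval", "gzinflate", "pclose"]
for fn in functions:
    exact = fn in words
    substr = sum(1 for w in words if fn in w)
    print(f"{fn}: exact={exact} substring-hits={substr}")
```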
I've continued work on the function lists and I can present the first version of the Master Function Group List™.
To recap, default users (paranoia level 1) only block on groups 3 and 4, so these groups should contain a robust 'core' of PHP functions, described by a limited number of low-FP strings. Group 3 will require an adjoining `(` character (regexp rule), group 4 will match instantly (pmf rule). Group 2 is the rest (only for PL2+ users), and group 1 we ignore.
I've arrived at the groups as follows:
`pclose` is tossed if the dictionary contains "upclose"
Group sizes:
Miscellaneous musings:
Requiring an adjoining `(` character could possibly relieve a bit of FP. I'm up for it. It's a very big list even if we have removed a lot of cruft.

`file` (group 3): It's conceivable that in some natural text, the sequence `file (` will appear: "can u send me that file (blah.doc)". This will likely lead to some low frequency FP... not totally uncommon for the CRS but still. Anyway, it's a shame to not match on this function, so I've left it for now.

Comments are welcome.
I've generated the rules. My work can be reviewed in PR #347 .
Some changes from my previous comment:
Please review the variables, rules (I used the old PHP rule as a template but it's not gospel) and regexps. Spot anything in the regexp which might cause a bypass?
Any comments are welcome. I'll do some more testing. (Please merge only after review)
Now that the PR is merged, I've tested the CRS3 at paranoia level 1 on 412 known PHP malware samples.
The result is better than I expected!
10 samples scored 0 on PHP rules. After investigating manually, these samples were all true negatives (don't contain PHP code but were incorrectly in the test set).
15 samples scored only 5 on PHP rules. I went through these samples for bypasses. Most of them used variable functions, which will be detected once I finish #294. I've also got some more inspiration for more rules.
387 samples scored 10 or higher on PHP rules which is a robust detection.
Here is a histogram which shows the distribution of PHP rule scores. As you can see, it's easy for PHP code to hit many rules; most malware gets 20 or 25 points just from PHP rules only.
And a jittered scatterplot of PHP scores versus TOTAL CRS scores. It shows that a lot of other rules are being hit as well!
I'll close this issue and start work on bypasses and further rules soon. Thanks to everybody for the input!
Very cool. Can you share your malware samples? I would love to add them to my collection. Can trade payloads extracted from exploit-db.org. ;)
I think the 4-group method could be a general pattern for grouping strings. #327 for example. What do you think, @lifeforms?
@dune73 Samples here: 1, 2, 3. They are not necessarily representative of modern threats though; most of them are huge file managers that score really high, in production I see more shorter oneliners. This is a cool one. There's a lot of perl and ASP too so not all of them are PHP.
Yeah, maybe the 4-group is a nice way to divide other strings too. I'm definitely up for curating and splitting data files according to frequency and FP risk. But the high-FP/low-FP partition in the PHP function rules really depends on the PHP function call syntax. We probably won't be able to make such a nice split in #327. But I'll take a good look at it!
Thanks for the links!
So the attacker uploads these shells as files (?), and we try to stop the upload depending on REQUEST_BODY?
I agree, with the php function names, it was particularly neat.
@dune73 In my experience attackers usually use a small PHP one-liner as the actual RCE exploit payload, and then try to save a bigger webshell to disk for persistence and ease of use. The big webshell applications aren't really necessary (you could also just have the client execute any command desired) and I would be surprised if we will see a lot of inputs with such very high scores. Often they are sent in encoded form anyway. I'm more interested in detecting the initial one-liners. But it's still important to test on lots of webshells because some of them contain interesting obfuscation techniques which can be detected with new rules.
As for files uploaded with HTTP (multipart/form-data): IIRC, we can't directly inspect HTTP-uploaded files from these rules without using ctl:forceRequestBodyVariable, which probably has performance problems. (I have another rule to protect unrestricted uploads against PHP uploads; I'll create a separate issue for that.)
So, I got this right then.
I guess we would need to work on FILES_TMP_CONTENT, which depends on SecUploadKeepFiles and SecRequestBody. So yes, there is a performance issue. Besides, I think you are totally right: we need to catch the one-liner. Do you also have a list of these payloads? Or is that the exploit-db stuff?
@dune73 I distill the essential parts of the attacks and create minimal test cases when working on the relevant rules. To publish bigger request dumps would require me to sanitize them, so I haven't done that. Once the dust settles on the test tool work with Fastly, it would be cool to create a shared repository of HTTP attack dumps. I don't think there's a good one yet. Could start one if there is interest! It would be pretty cool if we'd just have a site where people can really easily submit modsec audit log or access_log entries (optionally with some additional information like suspected CVE, application etc). Having a diverse set with many sources would really help us expand the CRS rules in new areas, and maybe in the future it could even get big enough for machine learning...
@foospidy is trying to build such a database at https://github.com/foospidy/payloads. A collaboration would be very beneficial. Problem I see with his stuff so far: It's mostly attacks, but not all of his payloads are attacks. So you get requests without an alert and you think it's a false negative, but then it's not (and spoils your statistics). Also: Have you looked into the @fuzzyHash operator? Trustwave commercial rules make use of it AFAIK and gotroot/atomicorp rules have a fairly extensive ssdeep DB included as well. We may want to do this as well sooner or later. The problem is assembling the DB of course. We could build one on popular collections of web shells for a start.
@dune73 Thanks for the @foospidy hint, I think I'll contact him some time. These data sets are not yet in standardized format to fire at a WAF yet though. I'm hopeful Chaim/Fastly's tool could become the standard for such a thing.
Fuzzyhash is pretty cool, and I would not be against feeding a lot of malware into it and adding it to the CRS. Tracking actual instances of malware (instead of attack classes) is at this point not something that the CRS does. I don't have competing interests so I would always be interested in adding as much as possible to the CRS ;) I'd love to brainstorm about what we'd like the CRS to be in 12 months from now. I have lots of other ideas brewing too. Maybe arrange an IRC meet soonish and try to include some of the major users too? We also really should have a chat with them before CRS3 release.
Say hello to @foospidy from me. I've been in touch with him before.
Yes, FuzzyHash is a bit of a different philosophy than the CRS. But we could argue that the matching percentage gives it a generic blacklist touch. Have you looked at speed? Otherwise, if it's only file uploads, then maybe speed is not that much of an issue.
@dune73 Let's discuss it in #363.
PHP function name detection is useful to detect PHP code injection. We have several issues open about this (#290, #291, #326), but I think they are best addressed at once.
Challenges with detecting PHP functions are false positives (due to overlap with English words), false negatives (need to be strict on dangerous functions), and performance (there are many PHP functions).
Grouping the PHP functions
I think the issue becomes most clear if we separate the PHP functions on two dimensions: the abuse potential, and the expected false positive rate. When graphing the space, you can think of something like this:
'Dangerous' high-abuse functions are often seen in actual code injection exploits. From my experience, they are centered around decoding (unpacking a compressed or obfuscated payload), file/URL access (for instance a remote payload or a local file) and execution (interpreting code or starting some other process).
However, any other PHP function may still be used in a PHP payload, so it's useful to detect these as well. Many attackers encode their payload, however, if not to evade WAFs then to comfortably inject code with newlines.
We must pay close attention to false positives. Many PHP function names are used in English words.
Which actions to take for each group
Edited 2016-05-30:
- `sleep`, `time`, `abs`: *Create a regexp-based rule to look for these words plus a trailing parenthesis. At PL3, block as critical.*
- `strpos`, `setlocale`: *Add to data file php-function-names.data. Do a @pmf check on this file. At paranoia level 2, block as critical.* These functions are arguably used in real-life PHP code, so we should detect them. There are a LOT of functions in this category, so for performance reasons, we should do @pmf on such a big file just once.
- `system`, `exec`, `eval`: *Create a regexp-based rule to look for these words plus a trailing parenthesis. At PL1, block as critical.* These few functions are absolutely vital to block, and at the same time very FP prone. We can't simply block on a data file with @pmf, because we need regexp magic to ensure that we see real PHP function syntax, e.g. `eval(` and not `medieval (500-1500)`. There are not too many of these functions, so a regexp rule would be neither hard to maintain nor too slow in my opinion.
- `gzinflate`, `base64_decode`: *Add a data file. At PL1, block as critical.* Since these strings are less common, we don't need a regexp to confirm the function call, and we can just block as we find them. Why not insert them into the regexp of Group III? Well, the regexp might be evaded by a clever ruse, so we should keep Group III as small as possible. If you see `base64_decode` it's bad news: don't look for the parenthesis and assign a very high score.
Example payloads
This approach has the advantage that there will be a diversity of rules, so a common PHP payload will trigger multiple rules and rack up an interesting score. For example, here are some random examples from audit logs today, to demonstrate in which way these functions are injected and chained by attackers. Especially `eval` is popular:

eval(chr(112).chr(104).chr(112).chr(105).chr(110).chr(102).chr(111)...
eval(gzinflate(str_rot13(base64_decode('vUl6QttVE...
eval(base64_decode('JGNoZWNrID...
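As an aside, the chr() chain in the first payload is easy to unpack; a quick Python equivalent of PHP's chr concatenation shows what the visible prefix spells out:

```python
# Decode the visible part of eval(chr(112).chr(104).chr(112)... above.
codes = [112, 104, 112, 105, 110, 102, 111]
decoded = "".join(chr(c) for c in codes)
print(decoded)  # phpinfo
```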
Rules
A regexp for Group III could look like the following (simplified example):
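A minimal sketch of such a pattern, in my own simplified form (a dangerous function name, optional whitespace, then an opening parenthesis), can be checked like this:

```python
import re

# Simplified Group III idea: dangerous function name, optional
# whitespace, then an opening parenthesis. Illustration only; the real
# rule also has to handle comments and other separators.
GROUP3_SIMPLE = re.compile(r"\b(?:system|exec|eval)\s*\(", re.I)

print(bool(GROUP3_SIMPLE.search("eval(base64_decode('JGNoZWNr'))")))  # True
print(bool(GROUP3_SIMPLE.search("medieval (500-1500)")))              # False
```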
Todos