SpiderLabs / owasp-modsecurity-crs

OWASP ModSecurity Core Rule Set (CRS) Project (Official Repository)
https://modsecurity.org/crs
Apache License 2.0

PHP function name detection #338

Closed lifeforms closed 8 years ago

lifeforms commented 8 years ago

PHP function name detection is useful to detect PHP code injection. We have several issues open about this (#290, #291, #326), but I think they are best addressed at once.

Challenges with detecting PHP functions are false positives (due to overlap with English words), false negatives (need to be strict on dangerous functions), and performance (there are many PHP functions).

Grouping the PHP functions

I think the issue becomes most clear if we separate the PHP functions on two dimensions: the abuse potential, and the expected false positive rate. When graphing the space, you can think of something like this:

[Figure: PHP functions, arranged by abuse potential vs. expected false-positive rate]

'Dangerous' high-abuse functions are often seen in actual code injection exploits. In my experience, they center around decoding (unpacking a compressed or obfuscated payload), file/URL access (for instance, a remote payload or a local file), and execution (interpreting code or starting some other process).

Still, any other PHP function may appear in a PHP payload, so it's useful to detect these as well. However, many attackers encode their payload, if not to evade WAFs, then to comfortably inject code that contains newlines.

We must pay close attention to false positives. Many PHP function names are also English words or occur inside English words.

Which actions to take for each group

Edited 2016-05-30:

This approach has the advantage that there will be a diversity of rules, so a common PHP payload will trigger multiple rules and rack up a significant score. To illustrate, here are some random samples from today's audit logs, showing how attackers inject and chain these functions. eval is especially popular:

A regexp for Group III could look like the following (simplified example):

# We want to match on:
# system(
# system (
# system\t(
# system /*comment*/ (
# system /*multiline \n comment*/ (
# system //comment \n (
# system #comment \n (
#
# We don't want to match on:
# the system is down
# ecosystem(
# systematic(
#
# ModSecurity does a multi-line regexp as of 2.9, so multiline comments are no problem.
SecRule ARGS "\b(eval|system)(\s|/\*.*\*/|//.*|#.*)*\(" \
    "id:123456,phase:1,t:none,t:urlDecodeUni,t:lowercase,deny,status:406,tag:'TEST/GROUP3'"

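The pattern can be sanity-checked outside ModSecurity. Below is a minimal Python sketch, assuming (as the comment above says) that ModSecurity lets `.` match newlines, which `re.DOTALL` models here, and emulating `t:lowercase` with `str.lower()`. The input lists come from the comment block; `GROUP3_PLAIN` is an illustrative comparison without `DOTALL`, showing that only the multiline-comment case depends on that behaviour.

```python
import re

# Simplified Group III pattern from the SecRule above.
PATTERN = r"\b(eval|system)(\s|/\*.*\*/|//.*|#.*)*\("

# Assumption: ModSecurity lets "." span newlines; re.DOTALL models that.
GROUP3_DOTALL = re.compile(PATTERN, re.DOTALL)
# For comparison only: here "." stops at a newline.
GROUP3_PLAIN = re.compile(PATTERN)

SHOULD_MATCH = [
    "system(",
    "system (",
    "system\t(",
    "system /*comment*/ (",
    "system /*multiline \n comment*/ (",
    "system //comment \n (",
    "system #comment \n (",
]
SHOULD_NOT_MATCH = ["the system is down", "ecosystem(", "systematic("]


def hits(regex, value):
    """Emulate t:lowercase, then search anywhere in the value."""
    return regex.search(value.lower()) is not None


if __name__ == "__main__":
    for value in SHOULD_MATCH + SHOULD_NOT_MATCH:
        print(f"{value!r:40} dotall={hits(GROUP3_DOTALL, value)} plain={hits(GROUP3_PLAIN, value)}")
```

The word boundary `\b` is what rejects `ecosystem(`, and the required `\(` is what rejects `the system is down` and `systematic(`.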
Todos

dune73 commented 8 years ago

@lifeforms, this is one hell of a post! I took a few days to let it sink in. Sorry for the long delay.

But, I return with a clear and simple resolution: KISS

We group the function names into multiple data files according to group / paranoia level.

Group I: ignore, as you suggested (-> no data file)
Group II: separate data file, rule on paranoia level 3 (or 2?)
Group III: separate data file, rule on paranoia level 2 (or 1?), special chained rule to suppress FPs
Group IV: separate data file, rule on paranoia level 1

The special chained rule could work as follows (tested):

SecRule ARGS|ARGS_NAMES|REQUEST_HEADERS "@pmFromFile php-function-names-group-3.data" \
    "phase:2,capture,chain,id:10002,deny,msg:'Blacklist hit %{MATCHED_VAR_NAME}: %{MATCHED_VAR}'"
    SecRule MATCHED_VARS "@contains ("

This allows us to build 3 rules in a very similar fashion (-> strict siblings). We can go with the fast @pmf operator and avoid complex regexes. The chained rule is straightforward. If the @contains operator does not weed out enough FPs, then we can fall back on a variant of the rule you are proposing. I just think it is worthwhile to test with @pmf first, as it is more readable and all function names are listed next to each other in 3 data files.

Personally, I think everything should run with action:block and with anomaly score 5.
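The difference between the chained rule and the regexp can be sketched in Python. This is an approximation under stated assumptions: `@pmFromFile` is modelled as a case-insensitive substring test over a hypothetical two-entry term list, and the chained `@contains (` is modelled as "the same value contains ( anywhere". The regexp is the simplified Group III pattern from the opening post.

```python
import re

# Hypothetical excerpt of php-function-names-group-3.data.
GROUP3_TERMS = ["eval", "system"]

# Simplified Group III regexp from the opening post ("." spans newlines).
GROUP3_REGEX = re.compile(r"\b(eval|system)(\s|/\*.*\*/|//.*|#.*)*\(", re.DOTALL)


def pm_then_contains(value: str) -> bool:
    """Model of the chain: a term occurs anywhere (case-insensitive, like @pm),
    and the same value contains "(" anywhere (the chained @contains)."""
    lowered = value.lower()
    return any(term in lowered for term in GROUP3_TERMS) and "(" in value


def regex_rule(value: str) -> bool:
    """Model of the regexp rule: the "(" must follow the function name."""
    return GROUP3_REGEX.search(value.lower()) is not None


if __name__ == "__main__":
    for value in ["system(", "ecosystem(", "the system is down (again)", "cheval (horse)"]:
        print(f"{value!r:30} chain={pm_then_contains(value)} regex={regex_rule(value)}")
```

In this model, any value that mentions a listed term and happens to contain a parenthesis anywhere passes the chain, which is the gap the regexp closes by requiring the `(` to follow the function name.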

csanders-git commented 8 years ago

I'm adding some data to this discussion, since I think we should also concern ourselves with false positives. I calculated the Levenshtein distance (LD) between each function name and the words of a normal dictionary. Based on this, I highlighted functions that have dictionary words at an LD of 0, 1, or 2. An LD of 0 means the function name is itself a dictionary word; an LD of 1 indicates a HIGH chance of false positives; an LD of 2, probably a mild chance. Functions without any LD score have a low chance of false positives.

outputResults.csv.txt

The first column is the function name, the second lists dictionary words at an LD of 0, and so on.

NearestWords.txt phpOut.txt
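A sketch of the kind of computation described above, in Python: a textbook dynamic-programming Levenshtein distance, plus a helper that buckets a function name by its closest dictionary match. The toy dictionaries and the risk labels (mirroring the LD 0/1/2 interpretation above) are illustrative assumptions; this is not the script used to produce the attachments.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions), O(len(a)*len(b))."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def nearest_words(function_name, dictionary, max_distance=2):
    """Group dictionary words by their LD (0..max_distance) to function_name."""
    buckets = {d: [] for d in range(max_distance + 1)}
    for word in dictionary:
        d = levenshtein(function_name.lower(), word.lower())
        if d <= max_distance:
            buckets[d].append(word)
    return buckets


def fp_risk(function_name, dictionary):
    """Hypothetical labels: LD 0 -> dictionary word, 1 -> high, 2 -> mild, else low."""
    buckets = nearest_words(function_name, dictionary)
    if buckets[0]:
        return "dictionary word"
    if buckets[1]:
        return "high"
    if buckets[2]:
        return "mild"
    return "low"


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"), fp_risk("chmod", ["chord"]))
```

Against a real dictionary this compares every function with every word, so it is slow but simple; that is acceptable for a one-off classification.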

csanders-git commented 8 years ago

Attached are just the functions with an LD of 0. These will very likely cause FPs, as they are English words. I will continue splitting these into our two groups.

highFP.txt

csanders-git commented 8 years ago

I sorted my list of high-FP elements from today's table. In general, there MAY be more when we look at higher LD values (especially with the shorter words, if we use @pm).

highFP.csv.txt

lifeforms commented 8 years ago

@dune73, sorry for taking so much time to respond.

I appreciate that you are trying to keep the rules as simple as possible, although I don't fully agree with the outcome yet. Maybe we can find a compromise! :)

If you think my Group III regexp proposal would be complex to maintain, I think the added tests will make it pretty easy to see how it works. Here are some example tests used to develop my Group III regexp:

  - test: [{url: '/', code: 200}]
  - test: [{url: '/?foo=system%28', code: 403}]
  - test: [{url: '/?foo=System%28', code: 403}]
  - test: [{url: '/?foo=system%0D%28', code: 403}]
  - test: [{url: '/?foo=system%0A%28', code: 403}]
  - test: [{url: '/?foo=system%0D%0A%28', code: 403}]
  - test: [{url: '/?foo=system%20%28', code: 403}]
  - test: [{url: '/?foo=system%20%20%28', code: 403}]
  - test: [{url: '/?foo=%40system%28', code: 403}]                       # @system(
  - test: [{url: '/?foo=system%09%28', code: 403}]                       # system\t(
  - test: [{url: '/?foo=system%2F%2Fcomment%0A%20%28', code: 403}]       # system//comment\n (
  - test: [{url: '/?foo=system%20%23%23comment%0A%20%28', code: 403}]    # system ##comment\n (
  - test: [{url: '/?foo=system%20%23%23%20%28', code: 403}]              # system ## (
  - test: [{url: '/?foo=system%2F%2Acomment%2A%2F%28', code: 403}]       # system/*comment*/(
  - test: [{url: '/?foo=system%20%2F%2Acomment%2A%2F%20%28', code: 403}] # system /*comment*/ (
  - test: [{url: '/?foo=system%20%09%2F%2A%2A%2F%09%20%28', code: 403}]  # system \t/**/\t (
  - test: [{url: '/?foo=system%20%09%2F%2Amulti%0D%0Aline%2A%2F%09%20%28', code: 403}] # system \t/*multi\r\nline*/\t (
  # prevent false positives:
  - test: [{url: '/?foo=eval', code: 200}]
  - test: [{url: '/?foo=cheval', code: 200}]
  - test: [{url: '/?foo=the%20system', code: 200}]
  - test: [{url: '/?foo=ecosystem%28', code: 200}]
  - test: [{url: '/?foo=systems%28', code: 200}]
  - test: [{url: '/?foo=system%20something%28', code: 200}]
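The YAML cases can also be replayed against a pattern without a WAF. A minimal Python sketch, assuming the simplified Group III regexp from the opening post (the regexp actually developed with these tests may differ), `.` matching newlines (`re.DOTALL`), query arguments decoded once (as the server does for ARGS) and then lowercased, and a blocked request answering 403. `expected_status` is a hypothetical helper, not part of any test tool.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumption: simplified Group III regexp, "." spanning newlines as in ModSecurity.
GROUP3 = re.compile(r"\b(eval|system)(\s|/\*.*\*/|//.*|#.*)*\(", re.DOTALL)

CASES = [
    ("/", 200),
    ("/?foo=system%28", 403),
    ("/?foo=System%28", 403),
    ("/?foo=system%0D%28", 403),
    ("/?foo=system%0A%28", 403),
    ("/?foo=system%0D%0A%28", 403),
    ("/?foo=system%20%28", 403),
    ("/?foo=system%20%20%28", 403),
    ("/?foo=%40system%28", 403),
    ("/?foo=system%09%28", 403),
    ("/?foo=system%2F%2Fcomment%0A%20%28", 403),
    ("/?foo=system%20%23%23comment%0A%20%28", 403),
    ("/?foo=system%20%23%23%20%28", 403),
    ("/?foo=system%2F%2Acomment%2A%2F%28", 403),
    ("/?foo=system%20%2F%2Acomment%2A%2F%20%28", 403),
    ("/?foo=system%20%09%2F%2A%2A%2F%09%20%28", 403),
    ("/?foo=system%20%09%2F%2Amulti%0D%0Aline%2A%2F%09%20%28", 403),
    # prevent false positives
    ("/?foo=eval", 200),
    ("/?foo=cheval", 200),
    ("/?foo=the%20system", 200),
    ("/?foo=ecosystem%28", 200),
    ("/?foo=systems%28", 200),
    ("/?foo=system%20something%28", 200),
]


def expected_status(url: str) -> int:
    """Hypothetical helper: 403 if any decoded, lowercased query argument matches."""
    for values in parse_qs(urlsplit(url).query).values():
        for value in values:
            if GROUP3.search(value.lower()):
                return 403
    return 200


if __name__ == "__main__":
    for url, code in CASES:
        status = expected_status(url)
        print("ok  " if status == code else "FAIL", url, status)
```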

dune73 commented 8 years ago

@csanders-git These statistics are very interesting. I am a computer linguistics n00b, but I want to learn this. The data you provide look like the very thing to define the groups.

@lifeforms Thank you for taking the time to explain things and sorting out the differences. Yes, I am sure we can find a compromise. We are very close, actually.

So we consider group I, group II and group IV as settled. Hope @csanders-git chimes in.

I agree with your arguments to run this at PL1. It makes sense, and your experience underlines that there are going to be few FPs, at least if we go with the regex and not with "@pmf + chained @contains (". I am glad you shed some light on this; I feared that @contains was not equivalent to the regex. But please give me a moment to come up with a pmf+chain variant that is equivalent to the regex variant. I really want to be able to work with three files in the same format. If it does not work, then at least we tried.

dune73 commented 8 years ago

OK, I tried this.

If you do @pmf with capture, the file entry that matched ends up in TX.0, and you can save it, together with "(", in a temporary variable to use as the new pattern:

SecRule ARGS "@pmf /tmp/terms-group-3"  "id:1006,phase:2,chain,pass,log,capture,setvar:TX.pattern=%{TX.0}("

This pattern can then be used as the pattern in the chained rule and that's almost a win. BUT unfortunately, evasion is easy, as TX.0 will only be filled once and filled with the first occurrence of a pattern in the group-file.

Consider:

foo=bar+define+and+eval(

TX.0 will now be "define" and "eval(" will fly under the radar. So as far as I can see this is impossible with ModSec2 and we need to wait for ModSec3. Hopefully.

Accessing the same parameter in the chained rule as in the initial rule is a similar problem. ModSec tends to overwrite MATCHED_VAR and even MATCHED_VARS and especially in connection with @pmf.
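The evasion can be modelled in a few lines of Python. Assumptions: TX.0 holds only the hit that starts earliest in the input, matching is plain substring search, and the term list is a hypothetical two-entry group file. `pmf_tx0_chain` mimics the pmf + TX.0 chain; `any_term_rule` is a simplified version of what the rule should express.

```python
# Hypothetical group file entries.
GROUP3_TERMS = ["define", "eval"]


def first_hit(value, terms):
    """Assumption: TX.0 receives only the hit that starts earliest in the input."""
    best_term, best_pos = None, None
    for term in terms:
        pos = value.find(term)
        if pos != -1 and (best_pos is None or pos < best_pos):
            best_term, best_pos = term, pos
    return best_term


def pmf_tx0_chain(value, terms=GROUP3_TERMS):
    """Model of '@pmf + capture' followed by a chained check for TX.0 + '('."""
    tx0 = first_hit(value, terms)
    return tx0 is not None and (tx0 + "(") in value


def any_term_rule(value, terms=GROUP3_TERMS):
    """What we actually want (simplified): some listed term directly followed by '('."""
    return any((term + "(") in value for term in terms)


if __name__ == "__main__":
    payload = "bar define and eval("  # foo=bar+define+and+eval( after decoding
    print(first_hit(payload, GROUP3_TERMS), pmf_tx0_chain(payload), any_term_rule(payload))
```

Because only one term survives into TX.0, a harmless earlier term shields the dangerous later one.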

This brings us back to your regex variant, @lifeforms:

Can we work with a groupfile and inside the group file we have a comment with a recipe which can be used to build the regex on the base of the groupfile? The rules file with the regex would then point to the groupfile with a comment which explains how to build the regex and that the master copy of the functions resides in the group file, even if the group file is not referenced directly from the rules file. (Is this explanation understandable at all?)

lifeforms commented 8 years ago

@dune73 Yeah, I came to the same conclusion with the data file vs. the regexp. It can't be done with our current engine. Okay, seems we have consensus about the approach!

I like your idea about storing the source of regexps in their original format. This particular regexp is probably not very complex and has few terms, but it might be interesting to store a source list that can be used to rebuild the regexp. To build regexps, the CRS has used the Regexp::Assemble Perl module described in Optimizing regular expressions. The Perl module works and is still actively maintained.

# sudo cpan install Regexp::Assemble

# cat assemble.pl 
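use strict;
use warnings;
use Regexp::Assemble;

# Read one pattern per line from STDIN (or from files named on the command line)
my $ra = Regexp::Assemble->new;
while (my $line = <>) {
    chomp $line;                # don't let the trailing newline become part of the pattern
    next unless length $line;   # skip blank lines
    $ra->add($line);
}
# Print one optimized alternation covering all patterns
print $ra->as_string(), "\n";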

# cat example.txt 
base64_decode
convert_uudecode
file_get_contents
gzdecode
gzinflate
gzuncompress
hex2bin
proc_open
shell_exec
str_rot13
zlib_decode

# perl assemble.pl < example.txt
(?:(?:(?:base64|zlib)_|convert_uu)decode|gz(?:(?:inflat|decod)e|uncompress)|s(?:hell_exec|tr_rot13)|(?:proc_ope|hex2bi)n|file_get_contents)

It looks like it would also work for a source list of regexps, so I can use it to build the Group III regexp out of a somewhat more readable list if you think it's worth it.
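For comparison, a much simpler assembler can be sketched in Python with a prefix trie. This is a swapped-in technique, not Regexp::Assemble's algorithm: it treats entries as literal strings and only merges common prefixes (no suffix merging like `(?:proc_ope|hex2bi)n`), so its output is longer, but by construction it fully matches exactly the listed strings. `build_trie`, `trie_to_regex` and `assemble` are hypothetical names.

```python
import re


def build_trie(words):
    """Nested dicts keyed by character; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_to_regex(node):
    """Turn a trie node into a regex matching exactly the word endings below it."""
    ends_here = "" in node
    branches = [re.escape(ch) + trie_to_regex(child)
                for ch, child in sorted(node.items()) if ch]
    if not branches:
        return ""
    if len(branches) == 1 and not ends_here:
        return branches[0]
    group = "(?:" + "|".join(branches) + ")"
    return group + "?" if ends_here else group


def assemble(words):
    return trie_to_regex(build_trie(words))


if __name__ == "__main__":
    print(assemble(["gzdecode", "gzinflate", "gzuncompress"]))
```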

Regexp::Assemble does generate regexps which are hard on the eye, but that's not a big problem. After all, people should never edit these regexps directly. We should just ensure full test coverage to guard against possible bugs.
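In that spirit, a coverage check can be as simple as asserting that the generated regexp fully matches every source entry and none of a few near misses. A Python sketch, using the example.txt list and the assembled output above (Python's `re` accepts this particular pattern as-is); the near-miss samples are illustrative.

```python
import re

# Source list (example.txt above) and the regexp Regexp::Assemble printed for it.
SOURCE = """
base64_decode
convert_uudecode
file_get_contents
gzdecode
gzinflate
gzuncompress
hex2bin
proc_open
shell_exec
str_rot13
zlib_decode
""".split()

ASSEMBLED = (
    r"(?:(?:(?:base64|zlib)_|convert_uu)decode|gz(?:(?:inflat|decod)e|uncompress)"
    r"|s(?:hell_exec|tr_rot13)|(?:proc_ope|hex2bi)n|file_get_contents)"
)


def uncovered(source, assembled):
    """Source entries the assembled regexp does NOT match in full."""
    rx = re.compile(assembled)
    return [entry for entry in source if rx.fullmatch(entry) is None]


def unexpected(samples, assembled):
    """Samples the regexp fully matches although they are not in the source."""
    rx = re.compile(assembled)
    return [s for s in samples if rx.fullmatch(s) is not None]


if __name__ == "__main__":
    print(uncovered(SOURCE, ASSEMBLED))
    print(unexpected(["gzencode", "shell", "hex2binx", "base64_encode"], ASSEMBLED))
```

Both lists should print empty; a non-empty result points at a bug in the generated regexp.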

As a last note, I would store the sources for regexps in a separate place and not among the regular .data files used by @pmf rules. This avoids adding to the cognitive load when examining the files (which ones are used in production, and which ones are mere sources?). We could keep them in the repo by rule ID, for example: util/regexp-assemble/933113.data

lifeforms commented 8 years ago

I've done some work on the function names to detect.

I combined Chaim's function list (posted in #290) with a list scraped from the PHP functions page. That page also lists function names from various PHP extensions. To keep the size of the function list from exploding, I propose to include only the global functions from that page, and ignore the following lesser-used functions: bbcode_ bcompiler_ cairo_ cubrid_ db2_ dbase_ dba_ dbplus_ dbx_ eio_ enchant_ event_ fam_ fann_ fbsql_ fdf_ geoip_ gmp_ gnupg_ grapheme_ gupnp_ ibase_ id3_ ifx_ ingres_ kadm5_ mailparse_ maxdb_ msession_ m_ ncurses_ newt_ oci openal_ PDF_ pspell_ ps_ px_ radius_ rrd_ stats_ svn_ trader_ udm_ vpopmail_ xdiff_. Which PHP extensions to ignore or include is open for discussion. Should anything on this ignore list be re-added?

Chaim's list of functions has 886 entries. The combined function list has 2350 entries. I can't say what the performance impact of a data list this size is. Is this a feasible list size, @csanders-git?

Of the combined list, there are 103 entries that match the English dictionary. I used Webster's 2nd International for this; BSD systems have it as /usr/share/dict/web2. These are a good first start for false-positive prone entries to put into group I (common benign words) or group III (common abused functions).

This is just the autogenerated stuff. I'll also check the combined list manually, before partitioning the functions into groups.

Any comments on the function list are appreciated.

functions-all.txt (combined function list)
dictionary-overlap.txt (functions from the list above, plus the number of times each appears as a substring in a dictionary word)
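The overlap check can be sketched in Python. This is an assumed reconstruction, not the script used for the attachments: it reports function names that are dictionary words outright and, for each name, how many dictionary words contain it as a substring. `load_words` defaults to the web2 path mentioned above, which is an assumption about the local system; the comparison is naive (every name against every word), slow on large lists but simple.

```python
def load_words(path="/usr/share/dict/web2"):
    """Read a newline-separated word list (path is an assumption; adjust per system)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [line.strip() for line in fh if line.strip()]


def dictionary_overlap(functions, words):
    """Return (exact, counts): names that are dictionary words, and for every
    name the number of dictionary words containing it as a substring."""
    lowered = [w.lower() for w in words]
    word_set = set(lowered)
    exact = sorted(f for f in functions if f.lower() in word_set)
    counts = {f: sum(f.lower() in w for w in lowered) for f in functions}
    return exact, counts


if __name__ == "__main__":
    exact, counts = dictionary_overlap(
        ["system", "eval", "each", "str_rot13"],
        ["system", "ecosystem", "systematic", "cheval", "each", "reach", "beach"],
    )
    print(exact, counts)
```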

lifeforms commented 8 years ago

I've continued work on the function lists and I can present the first version of the Master Function Group List™.

To recap, default users (paranoia level 1) only block on groups 3 and 4, so these groups should contain a robust 'core' of PHP functions, described by a limited number of low-FP strings. Group 3 will require an adjoining ( character (regexp rule), group 4 will match instantly (pmf rule). Group 2 is the rest (only for PL2+ users), group 1 we ignore.
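The group-to-rule mapping can be summarized as a small Python model. Everything here is an illustrative assumption: the group memberships are hypothetical samples (not the actual CRS data files), phrase rules are modelled as substring tests, the Group III rule as the simplified regexp from the opening post, and each rule hit counts as 5 anomaly points, the score dune73 suggested.

```python
import re

# Hypothetical sample memberships, for illustration only.
GROUP2_TERMS = ["str_repeat"]                   # "the rest": PL2+ only, phrase match
GROUP3_NAMES = ["system", "eval"]               # FP-prone: need an adjoining "("
GROUP4_TERMS = ["base64_decode", "shell_exec"]  # low-FP: match instantly
# Group 1 (common benign words) is ignored entirely.

GROUP3_REGEX = re.compile(
    r"\b(?:" + "|".join(map(re.escape, GROUP3_NAMES)) + r")(?:\s|/\*.*\*/|//.*|#.*)*\(",
    re.DOTALL,
)
ANOMALY_SCORE = 5  # assumed points per rule hit


def php_rule_hits(value, paranoia_level):
    """Return which group rules fire for one (lowercased) input value."""
    v = value.lower()
    hits = []
    if any(t in v for t in GROUP4_TERMS):
        hits.append("group4")
    if GROUP3_REGEX.search(v):
        hits.append("group3")
    if paranoia_level >= 2 and any(t in v for t in GROUP2_TERMS):
        hits.append("group2")
    return hits


def php_score(value, paranoia_level):
    return ANOMALY_SCORE * len(php_rule_hits(value, paranoia_level))


if __name__ == "__main__":
    print(php_rule_hits("eval(base64_decode($x))", 1), php_score("eval(base64_decode($x))", 1))
```

A typical encoded one-liner hits both the Group IV and Group III rules at PL1, which is how the scores add up across rules.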

I've arrived at the groups as follows:

Group sizes:

Miscellaneous musings:

Comments are welcome.

groups.csv.txt

lifeforms commented 8 years ago

I've generated the rules. My work can be reviewed in PR #347.

Some changes from my previous comment:

Please review the variables, rules (I used the old PHP rule as a template but it's not gospel) and regexps. Spot anything in the regexp which might cause a bypass?

Any comments are welcome. I'll do some more testing. (Please merge only after review)

groups.csv.txt

lifeforms commented 8 years ago

Now that the PR is merged, I've tested the CRS3 at paranoia level 1 on 412 known PHP malware samples.

The result is better than I expected!

10 samples scored 0 on PHP rules. After investigating manually, these samples were all true negatives (don't contain PHP code but were incorrectly in the test set).

15 samples scored only 5 on PHP rules. I went through these samples for bypasses. Most of them used variable functions, which will be detected once I finish #294. I've also got some more inspiration for more rules.

387 samples scored 10 or higher on PHP rules, which is robust detection.
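The three bands cover the whole test set (10 + 15 + 387 = 412). A small Python sketch of the bucketing and the resulting shares; the band names are illustrative, and the only data used are the counts reported above.

```python
def bucket_scores(scores):
    """Count samples by PHP-rule score band: 0, exactly 5, 10 or more, anything else."""
    buckets = {"0": 0, "5": 0, ">=10": 0, "other": 0}
    for score in scores:
        if score == 0:
            buckets["0"] += 1
        elif score == 5:
            buckets["5"] += 1
        elif score >= 10:
            buckets[">=10"] += 1
        else:
            buckets["other"] += 1
    return buckets


def shares(buckets):
    """Percentage of the total per band, rounded to one decimal."""
    total = sum(buckets.values())
    return {band: round(100 * count / total, 1) for band, count in buckets.items()}


if __name__ == "__main__":
    # The counts reported above.
    reported = {"0": 10, "5": 15, ">=10": 387, "other": 0}
    print(sum(reported.values()), shares(reported))
```

With the reported counts, roughly 93.9% of the samples reach 10 points or more.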

Here is a histogram which shows the distribution of PHP rule scores. As you can see, it's easy for PHP code to hit many rules; most malware gets 20 or 25 points just from PHP rules only.

[Figure: histogram of PHP rule scores across the malware samples]

And a jittered scatterplot of PHP scores versus total CRS scores. It shows that a lot of other rules are being hit as well!

[Figure: jittered scatterplot of PHP rule scores vs. total CRS scores]

I'll close this issue and start work on bypasses and further rules soon. Thanks to everybody for the input!

dune73 commented 8 years ago

Very cool. Can you share your malware samples? I would love to add them to my collection. Can trade payloads extracted from exploit-db.org. ;)

I think the 4-group method could be a general pattern for grouping strings, #327 for example. What do you think, @lifeforms?

lifeforms commented 8 years ago

@dune73 Samples here: 1, 2, 3. They are not necessarily representative of modern threats, though; most of them are huge file managers that score really high, while in production I see more short one-liners. This is a cool one. There's a lot of Perl and ASP too, so not all of them are PHP.

Yeah, maybe the 4-group is a nice way to divide other strings too. I'm definitely up for curating and splitting data files according to frequency and FP risk. But the high-FP/low-FP partition in the PHP function rules really depends on the PHP function call syntax. We probably won't be able to make such a nice split in #327. But I'll take a good look at it!

dune73 commented 8 years ago

Thanks for the links!

So the attacker uploads these shells as files (?) and we try and stop the upload depending on REQUEST_BODY?

I agree, with the php function names, it was particularly neat.

lifeforms commented 8 years ago

@dune73 In my experience, attackers usually use a small PHP one-liner as the actual RCE exploit payload, and then try to save a bigger webshell to disk for persistence and ease of use. The big webshell applications aren't really necessary (you could also just have the client execute any command desired), and I would be surprised to see a lot of inputs with such very high scores. Often they are sent in encoded form anyway. I'm more interested in detecting the initial one-liners. But it's still important to test on lots of webshells, because some of them contain interesting obfuscation techniques which can be detected with new rules.

As for files uploaded over HTTP (multipart/form-data): IIRC, we can't directly inspect uploaded files from these rules without using ctl:forceRequestBodyVariable, which probably has performance problems. (I have another rule to protect unrestricted upload forms against PHP uploads; I'll create a separate issue for that.)

dune73 commented 8 years ago

So I got this right.

I guess we would need to work on FILES_TMP_CONTENT, which depends on SecUploadKeepFiles and SecRequestBodyAccess. So yes, there is a performance issue. Besides, I think you are totally right: we need to catch the one-liner. Do you also have a list of these payloads? Or is that exploit-db stuff?

lifeforms commented 8 years ago

@dune73 I distill the essential parts of the attacks and create minimal test cases when working on the relevant rules. To publish bigger request dumps would require me to sanitize them, so I haven't done that. Once the dust settles on the test tool work with Fastly, it would be cool to create a shared repository of HTTP attack dumps. I don't think there's a good one yet. Could start one if there is interest! It would be pretty cool if we'd just have a site where people can really easily submit modsec audit log or access_log entries (optionally with some additional information like suspected CVE, application etc). Having a diverse set with many sources would really help us expand the CRS rules in new areas, and maybe in the future it could even get big enough for machine learning...

dune73 commented 8 years ago

@foospidy is trying to build such a database at https://github.com/foospidy/payloads. A collaboration would be very beneficial. The problem I see with his collection so far: it's mostly attacks, but not all of his payloads are attacks. So you get requests without an alert and think it's a false negative, but it isn't (and it spoils your statistics). Also: have you looked into the @fuzzyHash operator? Trustwave commercial rules make use of it AFAIK, and gotroot/atomicorp rules include a fairly extensive ssdeep DB as well. We may want to do this too sooner or later. The problem is assembling the DB, of course. We could build one from popular collections of web shells for a start.

lifeforms commented 8 years ago

@dune73 Thanks for the @foospidy hint, I think I'll contact him some time. These data sets are not yet in a standardized format that can be fired at a WAF, though. I'm hopeful Chaim/Fastly's tool could become the standard for such a thing.

Fuzzyhash is pretty cool, and I would not be against feeding a lot of malware into it and adding it to the CRS. Tracking actual instances of malware (instead of attack classes) is at this point not something that the CRS does. I don't have competing interests so I would always be interested in adding as much as possible to the CRS ;) I'd love to brainstorm about what we'd like the CRS to be in 12 months from now. I have lots of other ideas brewing too. Maybe arrange an IRC meet soonish and try to include some of the major users too? We also really should have a chat with them before CRS3 release.

dune73 commented 8 years ago

Say hello to @foospidy from me. I've been in touch with him before.

Yes, FuzzyHash follows a somewhat different philosophy than the CRS. But we could argue that the similarity percentage gives it a generic blacklist touch. Have you looked at speed? Then again, if it's only file uploads, maybe speed is not that much of an issue.

lifeforms commented 8 years ago

@dune73 Let's discuss it in #363.