dune73 / test-4


Simplify parsing of logs #85

Closed studersi closed 6 years ago

studersi commented 6 years ago

The file .apache-modsec.alias contains many convenient aliases for parsing log files. However, the aliases are hard to understand and, because of their complexity, probably hard to maintain as well.

It seems to me that most of the complexity stems from the fact that each alias has to isolate a single element from the log line without taking the element's position in the line into account. If the line were first split into its individual elements by position, as an intermediate step, the code would be much simpler and easier to maintain.

Here is an example of what a simplified version could look like. The result can then easily be processed further. (Copy to a file and execute.)

#!/bin/bash
# from tutorial-8-example-access.log
x='127.0.0.1 - - [2016-11-03 22:54:45.837804] "GET /drupal/sites/default/files/css/css_XaQw0q7OIpLOf9_Qapv6iSC8OU98v9mwOV5QZyp1CLo.css?0 HTTP/1.1" 304 - "http://localhost/drupal/index.php/search/node?keys=union+select+*+from+users" "curl (local test)" localhost 127.0.0.1 80 - - + "-" WBuyJX8AAQEAAEdWTgIAAACI - - 451 166 -% 2096 1459 40 207 0 0'
echo "$x"
echo
echo "$x" | \
perl -pe 's/([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ (\[.*\])\ (\".*\")\ (-|[0-9]{3})\ (-|[0-9]{3})\ (\".*\")\ (\".*\")\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ (\".*\")\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)\ ([^\ ]*)/\
1=$1\
2=$2\
3=$3\
4=$4\
5=$5\
6=$6\
7=$7\
8=$8\
9=$9\
10=$10\
11=$11\
12=$12\
13=$13\
14=$14\
15=$15\
16=$16\
17=$17\
18=$18\
19=$19\
20=$20\
21=$21\
22=$22\
23=$23\
24=$24\
25=$25\
26=$26\
27=$27\
28=$28\
/'

The result looks as follows:

127.0.0.1 - - [2016-11-03 22:54:45.837804] "GET /drupal/sites/default/files/css/css_XaQw0q7OIpLOf9_Qapv6iSC8OU98v9mwOV5QZyp1CLo.css?0 HTTP/1.1" 304 - "http://localhost/drupal/index.php/search/node?keys=union+select+*+from+users" "curl (local test)" localhost 127.0.0.1 80 - - + "-" WBuyJX8AAQEAAEdWTgIAAACI - - 451 166 -% 2096 1459 40 207 0 0

1=127.0.0.1
2=-
3=-
4=[2016-11-03 22:54:45.837804]
5="GET /drupal/sites/default/files/css/css_XaQw0q7OIpLOf9_Qapv6iSC8OU98v9mwOV5QZyp1CLo.css?0 HTTP/1.1"
6=304
7=-
8="http://localhost/drupal/index.php/search/node?keys=union+select+*+from+users"
9="curl (local test)"
10=localhost
11=127.0.0.1
12=80
13=-
14=-
15=+
16="-"
17=WBuyJX8AAQEAAEdWTgIAAACI
18=-
19=-
20=451
21=166
22=-%
23=2096
24=1459
25=40
26=207
27=0
28=0

This is of course just an example and has only been tested with this single log entry. But it would be relatively easy to adapt it to other log formats.
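One way to make the splitting step less tied to a fixed number of columns is to consume the line one token at a time instead of writing out all 28 capture groups. The following is only a sketch under assumptions: the function name split_log_line is hypothetical (it is not part of .apache-modsec.alias), fields are assumed to be separated by exactly one space, and quoted fields are assumed not to contain escaped quotes.

```shell
#!/bin/bash
# Hypothetical helper, not part of .apache-modsec.alias.
# Splits one access-log line into numbered fields. A field is either
# a [bracketed] value, a "quoted" string, or a run of non-space characters.
# Assumptions: single-space separators, no escaped quotes inside quoted fields.
split_log_line() {
  local rest=$1 n=0
  # Group 1: one token. Group 2: the remainder of the line after one space.
  local re='^(\[[^]]*\]|"[^"]*"|[^ ["][^ ]*) ?(.*)$'
  while [[ -n $rest && $rest =~ $re ]]; do
    n=$((n + 1))
    printf '%d=%s\n' "$n" "${BASH_REMATCH[1]}"
    rest=${BASH_REMATCH[2]}
  done
}

# from tutorial-8-example-access.log
x='127.0.0.1 - - [2016-11-03 22:54:45.837804] "GET /drupal/sites/default/files/css/css_XaQw0q7OIpLOf9_Qapv6iSC8OU98v9mwOV5QZyp1CLo.css?0 HTTP/1.1" 304 - "http://localhost/drupal/index.php/search/node?keys=union+select+*+from+users" "curl (local test)" localhost 127.0.0.1 80 - - + "-" WBuyJX8AAQEAAEdWTgIAAACI - - 451 166 -% 2096 1459 40 207 0 0'
split_log_line "$x"
```

For the sample line this should produce the same 28 numbered fields as the Perl version. A single field can then be picked by position, for example with split_log_line "$x" | grep '^6=' for the status code. The loop stops at the first token it cannot match (such as an unterminated quote), so malformed lines yield fewer fields rather than wrong ones.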

dune73 commented 6 years ago

How about this script: https://github.com/Apache-Labor/labor/blob/master/bin/parse-apache-logs.rb