Closed MirkoDziadzka closed 4 years ago
Hi @MirkoDziadzka,
many thanks for your detailed feedback!
After a quick review, I can confirm your remarks, and I'll try to fix them soon.
Anyway, please let me show you some tip, how can you debug the rules. The package contains (and the documentation mentions it), there are two "helper" script, which show its own results. If you run those tools with arguments (eg. you create a file with only one line, which contains the affected rule), you can see the parsed symbols (test_lexer.py
), or the flow of building of AST (test_parser.py
), just add a 2nd argument with name debug
, eg:
./examples/test_lexer.py test.conf debug
In case of these rules:
SecRule ARGS "@contains :" "id:42,phase:2,block,t:none"
SecRule ARGS "@rx :\/" "id:42,phase:2,block,t:none"
SecRule ARGS "@pm |" "id:42,phase:2,block,t:none"
the problem is that the special symbols in operator argument evaluated as its meaning (eg. COLON or PIPE), not a regular character. I think this is a valid request to fix it. :).
In this case:
SecRule ARGS:'/^(foo|bar)$/' "@rx 42" "id:42,phase:2,block,t:none"
the |
is a separator, similar than above - it's a PIPE, not a regular character, and the next expected token should be an another variable. Also a valid remark.
See this:
SecRule ARGS,REQUEST_HEADERS "@rx 42" "id:42,phase:2,block,t:none"
As I know, the separator character in SecLang
or ModSecurity Language
is the |
, this is the first case where I met with this form. I changed the ,
by a |
, and the script parsed the rule. Anyway, I'll try to modify the tool for this expression. After a quick view, I think it's easy to extend the parser that it can capable to parse successfully this form.
Finally, this rule
SecRule ARGS foobar "id:42,phase:2,block,t:none"
is a "classic" form :). This parser follows the CRS syntax, therefore the operator and its arguments need to quoted by a double quote mark. I changed the syntax with this:
SecRule ARGS "foobar" "id:42,phase:2,block,t:none"
and the parse also could be produced a YAML result. I think this will be the last on the list of fixes :).
Sorry for the inconvenience, I'll back soon with fixes.
Hi @airween
Thanks for the feedback and the hints for debugging. I know that these scripts where mainly written from CRS, but I think the would be useful for generic modsec rules. I'm currently analyse a 15'000 rules database and it really helps to see the data in JSON ;-)
Hi @MirkoDziadzka,
sorry for the late reply.
In this branch, you can check the fixes for items of your list: 1, 3, 4, 6, 7. The 2nd and the 5th issues are left, that's not so trivials :).
Please check that if you can, and let me know the results - thank you.
The 2nd example also fixed:
SecRule ARGS "@rx ^\\\\" "id:42,phase:2,block,t:none"
Some notes for the fixes: please note, that when you write back the parsed data to SecLang format (if you want), there are two cases where the result will mismatch from the origin content. The parsed structure doesn't contains the separator of SecRule
arguments, it assumes that you use |
. If you write back, then the |
will be used. So, from this:
SecRule ARGS,REQUEST_HEADERS "@rx 42" "id:42,phase:2,block,t:none"
will write back like this:
SecRule ARGS|REQUEST_HEADERS "@rx 42" "id:42,phase:2,block,t:none"
The another case it the operatorless rule:
SecRule ARGS foobar "id:42,phase:2,block,t:none"
Here the parsed structure also doesn't contain any "metadata", eg. the quoted state of source, it assumes that the operator and its argument were quoted with a "
. When you write back, then the output will contains the quoted argument (but as you didn't used explicit @rx
operator, you will not see that). In this case, you will get:
SecRule ARGS "foobar" "id:42,phase:2,block,t:none"
Hey @airween
Thanks for the fix, looks good so far.
However, I found a new problem.
In summary: The following is working fine
SecRule ARGS "@rx ^\\\\" "id:42,phase:2,block,t:none"
SecRule ARGS "@contains :" "id:42,phase:2,block,t:none"
SecRule ARGS "@pm |" "id:42,phase:2,block,t:none"
SecRule ARGS "@rx :\/" "id:42,phase:2,block,t:none"
SecRule ARGS,REQUEST_HEADERS "@rx 42" "id:42,phase:2,block,t:none"
SecRule ARGS foobar "id:42,phase:2,block,t:none"
SecRule REQUEST_URI|REQUEST_HEADERS "@rx 42" "id:42,phase:2,block,t:none"
The following is not. The first of the 2 is new
SecRule REQUEST_URI,REQUEST_HEADERS "@rx 42" "id:42,phase:2,block,t:none"
SecRule ARGS:'/^(foo|bar)$/' "@rx 42" "id:42,phase:2,block,t:none"
I currently do not the the write back functionality, and I think normalisation is a good thing.
Hi @MirkoDziadzka,
thanks again, the 47f939 fixes the new COMMA related bug:
SecRule REQUEST_URI,REQUEST_HEADERS "@rx 42" "id:42,phase:2,block,t:none"
The other will fixed soon. Check this if you can - thanks.
Hi @MirkoDziadzka,
now the issue_09 branch contains all fixes of your mentioned examples.
Please update your local copy, and run the tests again - let me know the results :).
@airween, I added some comments to https://github.com/digitalwave/msc_pyparser/commit/a66a4525c18e5cd91d22f671b1a3f1716f77a7b9
With the current issue_09 and the following change, I can parse all of my ~ 15'000 rules.
diff --git a/msc_pyparser.py b/msc_pyparser.py
index 32c5404..e2411d0 100644
--- a/msc_pyparser.py
+++ b/msc_pyparser.py
@@ -140,7 +140,7 @@ class MSCLexer(object):
return t
def t_secrulesecvararg_SECRULE_VARIABLE_ARG(self, t):
- r'([^ \s\t\n\|\'|]{1,})|(\'.*\')'
+ r"([^\s|']+)|('(\.|[^'\\])*')"
t.lexer.pop_state()
return t
Hi @MirkoDziadzka,
I merged all commits into the master branch.
Referring to your note, that everything works as well, I close this issue.
Many thanks for your report and help!
Just one note for testing: you wrote that you have 15000 parsed rule. If you want to be sure that your result is correct, please convert back your json/yaml structure to SecLang format, and make a diff. Here is how I use that:
mkdir testexp/
mkdir testimp/
./crs_read.py /home/airween/src/owasp-modsecurity-crs/rules testexp
./crs_write.py testexp testimp
for f in `ls -1 /home/airween/src/owasp-modsecurity-crs/rules/*.conf`; do b=`basename ${f}`; echo "diff ${f} testimp/${b}"; diff ${f} testimp/${b}; done
As I wrote above, the crs_write.py follow the strict format of CRS, I assume you will have some difference between the original source and generated output - but you can compare them.
Hi, thanks for the fix.
2 observations
$ time ./crs_write.py yyy zzz
Parsing CRS structure: yyy/Combined.yaml
Exception catched - (<class 'KeyError'>, KeyError('oplineno'), <traceback object at 0x10375b230>)
real 0m20.290s
user 0m19.916s
sys 0m0.261s
with full stack trace
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pdb.py", line 1699, in main
pdb._runscript(mainpyfile)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pdb.py", line 1568, in _runscript
self.run(statement)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/bdb.py", line 585, in run
exec(cmd, globals, locals)
File "<string>", line 1, in <module>
File "/Users/mirko/Trustwave/crs_write.py", line 3, in <module>
import sys
File "/Users/mirko/.local/share/virtualenvs/mirko-jYbmoHav/lib/python3.7/site-packages/msc_pyparser-0.2.3-py3.7.egg/msc_pyparser.py", line 721, in generate
self.make_secrule(i)
File "/Users/mirko/.local/share/virtualenvs/mirko-jYbmoHav/lib/python3.7/site-packages/msc_pyparser-0.2.3-py3.7.egg/msc_pyparser.py", line 699, in make_secrule
if r['lineno'] < r['oplineno']:
KeyError: 'oplineno'
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /Users/mirko/.local/share/virtualenvs/mirko-jYbmoHav/lib/python3.7/site-packages/msc_pyparser-0.2.3-py3.7.egg/msc_pyparser.py(699)make_secrule()
-> if r['lineno'] < r['oplineno']:
(Pdb) p r
{'actions': [{'act_arg': 'none', 'act_name': 't', 'act_quote': 'no_quote', 'lineno': 28305}], 'chained': False, 'lineno': 28305, 'operator': '', 'operator_argument': '.*', 'type': 'SecRule', 'variables': ['ARGS:c']}
I will extract a simplified example and will open another issue
Hi,
just a quick note for the parsing speed,
* parsing feels slower as before, but I do not have numbers to prove this. I now see 150 seconds for 15'000 rules, so 100 rules/second.
I have a bit worse value:
$ time ./crs_read.py /home/airween/src/owasp-modsecurity-crs/rules/ testexp
...
real 0m8.658s
user 0m8.612s
sys 0m0.012s
CRS contains about 600 SecRule
objects (plus chained rules, they are about 170), so the total value is 770. It's about 90 rules/second, but my test environment runs on a very old machine (4x X3440 CPU with SATA2 disks). May be we can optimize the code, but I'm not worried about the speed, I think this isn't an operation which performed too often. But you're right, the 2.5 minutes is a bit painful :).
Note, that actually I'm working on an another project: I'ld like to implement this parser in C, I'm curious how fast will be that (oh, sure, Python isn't a performance champion :))
Ok, I have a faster machine here. When I talk about 15'000 rules, I mean 15'000 chains. At the end, this are more than 30'000 SecRule commands.
Btw, It is not the parser which is this slow, it is yaml.
When I change crs_read from yaml to json, I get 20 seconds instead of 200.
Therefore my impression that it got slower, I worked with JSON before and switched to yaml to test the crs_write part.
how to reproduce
expected result
actual result
remarks
On a bigger modsec ruleset, I see around 1 parsing errors per 1000 rules. Here are simplified examples:
And 2 errors which where new to me (that this is allowed in modsec)
Detail