andresriancho / w3af

w3af: web application attack and audit framework, the open source web vulnerability scanner.
http://w3af.org/

Move grep plugins to a different process to improve performance #28

Open andresriancho opened 11 years ago

andresriancho commented 11 years ago

Introduction

In order to achieve this EPIC task, many things need to be analyzed.

History

We already tried to do this, and failed: https://github.com/andresriancho/w3af/commits/multiprocessing

Measure performance

Before even starting, I need to define a good way of measuring the performance impact of this change.

A good idea would be to:

Run the same scan before and after the change and compare the results.
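One hedged way to do that before/after comparison (the helper names and timing approach here are my own, not existing w3af code) is to time the same scan a few times with each build and compare the best runs:

```python
import time


def benchmark(scan_callable, runs=3):
    """Run scan_callable several times and return the best wall-clock time.

    Taking the minimum of a few runs reduces noise from whatever else
    is running on the machine.
    """
    timings = []
    for _ in range(runs):
        start = time.time()
        scan_callable()
        timings.append(time.time() - start)
    return min(timings)


def speedup(seconds_before, seconds_after):
    """Relative improvement: 2.0 means the new code is twice as fast."""
    return seconds_before / seconds_after
```

With something like this in place, the grep-in-a-separate-process branch just has to beat the trunk's `benchmark()` number on a fixed target application.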

Output manager refactoring

Grep plugins call the output manager to print information about newly identified vulnerabilities. Calling the output manager from another process is an already solved problem; see how this was done in the multiprocessing document parser.

The same ideas could be applied to communication with cf and kb from the main thread.
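A minimal sketch of that idea, with all class and method names invented for illustration (this is not the real w3af output manager API): workers put `(method_name, message)` tuples on a queue, and only the main process ever touches the real output manager instance.

```python
import multiprocessing


def grep_worker(message_queue):
    # Worker-side stand-in for a grep plugin reporting a finding:
    # instead of calling the output manager directly, it enqueues a
    # (method_name, message) tuple for the main process to replay.
    message_queue.put(('information', 'Insecure cookie found'))


class FakeOutputManager(object):
    # Illustrative stand-in for the real output manager.
    def __init__(self):
        self.messages = []

    def information(self, message):
        self.messages.append(message)


def drain(message_queue, output_manager, count):
    # Main-process side: replay the queued calls on the real instance.
    for _ in range(count):
        method_name, message = message_queue.get()
        getattr(output_manager, method_name)(message)


if __name__ == '__main__':
    queue = multiprocessing.Queue()
    worker = multiprocessing.Process(target=grep_worker, args=(queue,))
    worker.start()
    worker.join()

    om = FakeOutputManager()
    drain(queue, om, count=1)
    print(om.messages)  # ['Insecure cookie found']
```

The same queue-and-replay shape would work for writes to cf and kb, as long as the arguments are picklable.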

Grep worker refactoring

We can reuse a lot of what we learned from the multiprocessing document parser.

Serialization is completely transparent when using pebble; if all attributes of the request and response are serializable, we shouldn't have any issues. The only worry I have is the re-work: for example, having to parse the same HTTP response in each process, since I had to remove that attribute from the HTTP response instance before sending it over the wire.
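To make that concrete, here is a toy sketch of why the re-parsing happens (`HTTPResponse` below is a stand-in, not w3af's real class): the cached parsed document is dropped in `__getstate__` so pickling works, which means every receiving process starts with a cold cache and parses again.

```python
import pickle


class HTTPResponse(object):
    # Illustrative stand-in for w3af's HTTP response class.
    def __init__(self, body):
        self.body = body
        self._parsed_document = None  # built lazily, not picklable

    def get_parsed_document(self):
        # Parse lazily and cache. After crossing a process boundary
        # the cache is empty, so the body is parsed again: this is
        # the re-work described above.
        if self._parsed_document is None:
            self._parsed_document = self.body.split()  # stand-in "parser"
        return self._parsed_document

    def __getstate__(self):
        # Drop the heavy / unpicklable cached attribute before the
        # instance is serialized and sent to another process.
        state = self.__dict__.copy()
        state['_parsed_document'] = None
        return state


resp = HTTPResponse('<html> hello </html>')
resp.get_parsed_document()               # cache filled in this process
clone = pickle.loads(pickle.dumps(resp))
print(clone._parsed_document)            # None: must be re-parsed
```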

The main thread would create N grep consumer processes, each with its own queue. Each process would have a subset of the enabled grep plugins. Each enabled grep plugin would have only one instance, living in one of the grep consumer processes.

When the main thread receives a request / response, it has to send it to all grep consumer processes.
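A sketch of that fan-out, with invented helper names (this is not existing w3af code): the enabled plugins are partitioned once across the consumers, and every request/response pair is put on every consumer's queue.

```python
import multiprocessing


def partition(plugin_names, n):
    # Round-robin assignment of the enabled grep plugins to N
    # consumers; each plugin instance ends up in exactly one process.
    return [plugin_names[i::n] for i in range(n)]


def fanout(queues, request, response):
    # The main thread must hand every request/response pair to every
    # consumer process, since each one holds a different plugin subset.
    for queue in queues:
        queue.put((request, response))
```

Each consumer process would then loop on `queue.get()`, run its plugin subset on the pair, and exit when it reads a sentinel value.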

Having multiple processes for the grep consumer means that the multiprocessing document parser cache will have N instances and be only 1/N as effective. A lot of re-work would happen parsing the same response multiple times. This is something to solve.

KnowledgeBase and Configuration refactoring

Grep plugins query the KnowledgeBase and cf objects; how are we going to "proxy" (?) those calls to the parent process / main thread?
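One possible answer, sketched here with a toy KnowledgeBase (this is not the real kb API), is the stdlib's multiprocessing.managers: registering a class with a BaseManager gives every child process a picklable proxy whose method calls are forwarded over a pipe to a single authoritative instance living in the manager process.

```python
from multiprocessing import Process
from multiprocessing.managers import BaseManager


class KnowledgeBase(object):
    # Toy stand-in for w3af's kb: findings stored per plugin name.
    def __init__(self):
        self._data = {}

    def append(self, plugin_name, info):
        self._data.setdefault(plugin_name, []).append(info)

    def get(self, plugin_name):
        return self._data.get(plugin_name, [])


class KBManager(BaseManager):
    pass


# Children receive a proxy; calling kb_proxy.append() forwards the
# call to the one real KnowledgeBase owned by the manager process.
KBManager.register('KnowledgeBase', KnowledgeBase)


def worker(kb_proxy):
    kb_proxy.append('grep.cookies', 'insecure cookie')


if __name__ == '__main__':
    manager = KBManager()
    manager.start()
    kb = manager.KnowledgeBase()

    p = Process(target=worker, args=(kb,))
    p.start()
    p.join()

    print(kb.get('grep.cookies'))  # ['insecure cookie']
    manager.shutdown()
```

The same pattern would cover the cf object; the open question is whether a per-call round trip is cheap enough for chatty grep plugins, or whether the calls need to be batched.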

andresriancho commented 10 years ago

Or maybe work with something like https://code.jd.com/tommybao/msgpack-rpc-python, which would allow me to use the network in the future? It also includes some nice integration with futures.

andresriancho commented 10 years ago

http://zerorpc.dotcloud.com/ is a more serious RPC implementation.

The mailing list shows some ugly stuff:

In addition, ZeroRPC itself is not thread safe, meaning that you cannot call any function of the same zerorpc context from different thread. You have to use one zerorpc context per thread. Else, shit will hit the fan hard.

https://groups.google.com/forum/#!searchin/zerorpc/thread/zerorpc/BZpzdpll6gk/HvJn19jndqoJ
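The quoted advice boils down to "one client per thread". A thread-local factory enforces that automatically; `RPCClient` and the endpoint below are stand-ins (the real thing would be a zerorpc client), so only the pattern is the point here.

```python
import threading


class RPCClient(object):
    # Stand-in for a non-thread-safe client such as zerorpc's.
    def __init__(self, endpoint):
        self.endpoint = endpoint
        self.owning_thread = threading.current_thread().name


_thread_local = threading.local()


def get_client(endpoint='tcp://127.0.0.1:4242'):
    # The first call from each thread builds that thread's private
    # client; later calls from the same thread reuse it, so no
    # client instance is ever shared across threads.
    if getattr(_thread_local, 'client', None) is None:
        _thread_local.client = RPCClient(endpoint)
    return _thread_local.client
```

Plugins would then call `get_client()` instead of holding a client reference, and the "one context per thread" rule is satisfied without any plugin-side bookkeeping.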

andresriancho commented 9 years ago

https://github.com/dotcloud/zerorpc-python was last updated almost 6 months ago; maybe they are not improving/supporting it anymore?