andresriancho / w3af

w3af: web application attack and audit framework, the open source web vulnerability scanner.
http://w3af.org/

Migrate dbms to real ORM #1274

Open andresriancho opened 10 years ago

andresriancho commented 10 years ago

https://github.com/andresriancho/w3af/blob/feature/module/w3af/core/data/db/dbms.py and https://github.com/andresriancho/w3af/blob/feature/module/w3af/core/data/db/history.py are a mess. They basically re-implement something that others have done much better: an ORM.

The dbms main feature is to handle queries which come from different threads in an ordered manner. It seems that SQLAlchemy can do something very similar too (at least it can wrap around sqlite and provide access to the file from different threads - don't know about the order thing).
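To make the "ordered access from different threads" idea concrete, here is a minimal sketch using only the standard library. It is not the code in dbms.py and not how SQLAlchemy does it; the class name `SerializedDB`, the `findings` table and the `demo` function are hypothetical. One worker thread owns the sqlite connection and runs queries in the order they were queued.

```python
import queue
import sqlite3
import threading


class SerializedDB:
    """Hypothetical helper: a single worker thread owns the sqlite connection."""

    _STOP = object()

    def __init__(self, path=":memory:"):
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._run, args=(path,), daemon=True)
        self._worker.start()

    def _run(self, path):
        # The connection is created and used only in this thread, so sqlite3's
        # default same-thread check is never violated.
        conn = sqlite3.connect(path)
        while True:
            item = self._requests.get()
            if item is self._STOP:
                break
            sql, params, reply = item
            try:
                rows = conn.execute(sql, params).fetchall()
                conn.commit()
                reply.put((rows, None))
            except sqlite3.Error as exc:
                reply.put((None, exc))
        conn.close()

    def execute(self, sql, params=()):
        # Callable from any thread; blocks until the worker has run the query.
        reply = queue.Queue(maxsize=1)
        self._requests.put((sql, params, reply))
        rows, error = reply.get()
        if error is not None:
            raise error
        return rows

    def close(self):
        self._requests.put(self._STOP)
        self._worker.join()


def demo():
    db = SerializedDB()
    db.execute("CREATE TABLE findings (id INTEGER PRIMARY KEY, name TEXT)")

    def writer(n):
        db.execute("INSERT INTO findings (name) VALUES (?)", ("vuln-%d" % n,))

    threads = [threading.Thread(target=writer, args=(n,)) for n in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    count = db.execute("SELECT COUNT(*) FROM findings")[0][0]
    db.close()
    return count
```

Queries run in the order they reach the queue, which is the ordering guarantee that still has to be checked for whichever ORM gets picked.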

I believe it would be a great improvement to move to SQLAlchemy and remove all our custom/buggy ORM code.

Research needs to be done to analyze what will happen with:

- http://stackoverflow.com/questions/15973481/sqlalchemy-pool-size-and-sqlite
- http://docs.sqlalchemy.org/en/rel_0_9/core/pooling.html

I've been mentioning SQLAlchemy, but I could easily do this with Django's ORM, which I'm much more used to and which is well tested. Django can also be installed easily using pip. Also, if I expose a REST API / web UI in the future, Django will be used there and I'll already have some nice models to start with.

One more thing that would be nice to include in this refactoring is the storage of the "traces" (see history.py). HTTP requests and responses are currently stored in multiple files, one for each pair. It would be better to store those in a DB too: this would potentially speed up reads/writes and allow me to perform searches over the response body.
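A rough sketch of what storing traces in a DB could look like, using plain sqlite3. The table name `http_trace`, its columns and the function names are assumptions made for illustration and are not taken from history.py. Bodies are stored as text here for simplicity; binary responses would need a BLOB column and different search handling.

```python
import sqlite3


def open_trace_db(path=":memory:"):
    # Hypothetical schema: one row per request/response pair.
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS http_trace (
            id            INTEGER PRIMARY KEY,
            method        TEXT NOT NULL,
            url           TEXT NOT NULL,
            request_raw   TEXT NOT NULL,
            status_code   INTEGER NOT NULL,
            response_body TEXT NOT NULL
        )
        """
    )
    return conn


def save_trace(conn, method, url, request_raw, status_code, response_body):
    # "with conn" commits on success and rolls back on error.
    with conn:
        cur = conn.execute(
            "INSERT INTO http_trace "
            "(method, url, request_raw, status_code, response_body) "
            "VALUES (?, ?, ?, ?, ?)",
            (method, url, request_raw, status_code, response_body),
        )
    return cur.lastrowid


def search_body(conn, needle):
    # instr() does a case-sensitive substring match, so the search term does
    # not need LIKE wildcard escaping.
    cur = conn.execute(
        "SELECT id, url FROM http_trace "
        "WHERE instr(response_body, ?) > 0 ORDER BY id",
        (needle,),
    )
    return cur.fetchall()
```

A substring scan like this reads every row, so whether it actually beats one file per pair is something the performance tests would have to show.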

The knowledge base would be a nice thing to migrate to an ORM, and since it is well abstracted I could start with it. Looking at the code in knowledge_base.py you'll find SELECT, DELETE and other SQL statements, yuk!

Some options for storing HTTP requests and responses are:

Another option would be to migrate everything to a NoSQL database and not use SQLAlchemy?

andresriancho commented 10 years ago

Some of the issues we could fix by using a real ORM are:

Which is very related to all the issues with storing HTTP requests and responses.

andresriancho commented 10 years ago

The bad thing about Codernity is that it doesn't have an ORM, so I would have to write that... maybe SQLAlchemy is not that bad after all :+1:

Should write some performance tests for this...

andresriancho commented 10 years ago

One of the good things I've discovered about Codernity is that it can run as a server:

This sounds interesting because we would be using a different process for handling the DB memory/CPU load and connecting to it via TCP/IP.

It also sounds interesting for exposing the knowledge base through the REST API; see #1415, "Research: Knowledge base should also be accessible via REST API".

andresriancho commented 10 years ago

Peewee looks like a good option (http://peewee.readthedocs.org/en/latest/peewee/cookbook.html), and it has some tricks for using sqlite safely from multiple threads.

andresriancho commented 10 years ago

The Django ORM documentation has some interesting notes on sqlite and threads: https://docs.djangoproject.com/en/dev/ref/databases/