danwent / Perspectives-Server

network notary implementation for the Perspectives project
http://perspectives-project.org
GNU General Public License v3.0

Refactoring threaded_scanner.py #7

Closed. Tie-fighter closed this issue 13 years ago.

Tie-fighter commented 13 years ago

I would like to do the following things:

danwent commented 13 years ago

Great to have you participating. A couple questions.

What is the goal of combining the list_service_ids.py functionality with threaded_scanner.py? It's pretty easy to run them in combination, if you like, as described in the README:

python utilities/list_service_ids.py notary.sqlite all | python threaded_scanner.py notary.sqlite - 10 10

The current design follows a kind of "unix-like" philosophy: loosely coupled, simple utilities that can be chained together to do more advanced things. That makes it easy to write a new scanner and add it to the mix, or to write a new way of generating a list of service-ids to scan.
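Under that design, a scanner only has to read service-ids from its input. A minimal sketch of that input side, assuming one service-id per line and that `-` means "read from standard input" (a common Unix convention; check the README for how threaded_scanner.py actually handles its arguments). `open_input` and `read_service_ids` are hypothetical names, not functions from this repository.

```python
import sys
from typing import Iterator, TextIO


def open_input(arg: str) -> TextIO:
    """Return stdin for "-" (the usual Unix convention), else open the named file.

    The caller is responsible for closing a real file; stdin is left open.
    """
    return sys.stdin if arg == "-" else open(arg)


def read_service_ids(stream: TextIO) -> Iterator[str]:
    """Yield one service-id per non-blank line, with surrounding whitespace stripped."""
    for line in stream:
        sid = line.strip()
        if sid:  # skip blank lines so an upstream tool's trailing newline is harmless
            yield sid
```

Because the reader only depends on a line-oriented stream, any tool that prints one service-id per line (list_service_ids.py, a `grep` over a log, a hand-written file) can feed it without changes to the scanner.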

On the second item, a thread pool approach is nice: it lets us strictly bound the number of threads that will be created (and thus know that you won't bump into a static limit on the system). The downside, based on my experience running large notaries, is that it actually makes the rate at which a scan progresses less deterministic. I find that most scans complete very quickly, while a minority take a long time waiting on I/O from the remote host. With a static pool of, say, 20 threads, those 20 threads quickly get dominated by slow scans, which consume little CPU because they are just waiting on remote I/O. I'm not sure of the best way to marry both of those concerns, but the current approach seems to do a decent job, so I've just left it like that for now :)
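To make the "pool dominated by slow threads" effect concrete, here is a small simulation (not the project's code) of a fixed-size FIFO pool such as Python's `concurrent.futures.ThreadPoolExecutor`: each task, in submission order, starts on whichever worker becomes free first. It assumes tasks are pure I/O waits with no CPU contention; the durations are made-up numbers, and `pool_completion_times` is a hypothetical helper.

```python
import heapq
from typing import List


def pool_completion_times(durations: List[float], workers: int) -> List[float]:
    """Simulate a fixed-size FIFO worker pool and return each task's finish time.

    Tasks are taken in submission order; each one starts on the worker that
    becomes idle earliest (list scheduling), which is how a shared work queue
    with N idle-polling threads behaves under ideal conditions.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    free_at = [0.0] * workers  # min-heap: time at which each worker is next idle
    heapq.heapify(free_at)
    finish = []
    for d in durations:
        start = heapq.heappop(free_at)  # earliest-available worker picks up the task
        end = start + d
        finish.append(end)
        heapq.heappush(free_at, end)  # that worker is busy until `end`
    return finish
```

With three workers and three slow (10-second) scans submitted first, `pool_completion_times([10, 10, 10, 1, 1], 3)` returns `[10.0, 10.0, 10.0, 11.0, 11.0]`: the two fast scans sit behind slow hosts even though no CPU is being used. With five workers (effectively one thread per scan) the same call returns `[10.0, 10.0, 10.0, 1.0, 1.0]`, which is the throughput advantage of not capping threads too tightly.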

Tie-fighter commented 13 years ago

In fact I totally agree with you :) Like I said: it was late and it made perfectly good sense in my head :D Never mind!