Closed — Tie-fighter closed this issue 13 years ago
Great to have you participating. A couple questions.
What is the goal of combining the list_service_ids.py functionality with threaded_scanner.py? It's pretty easy to run them in combination if you like, as described in the README:
python utilities/list_service_ids.py notary.sqlite all | python threaded_scanner.py notary.sqlite - 10 10
The current approach follows a kind of "Unix-like" philosophy: loosely coupled, simple utilities that can be chained together to do more advanced things. That makes it easy to write a new scanner and add it to the mix, or to write a new way of generating the list of service-ids to scan.
On the second item: a thread pool approach is nice because it lets us strictly bound the number of threads that will be created (and thus know we won't bump into a static limit on the system). The downside, based on my experience running large notaries, is that it actually makes the rate at which a scan progresses less deterministic. Most scans complete very quickly, while a minority take a long time waiting on I/O from the remote host. With a static pool of, say, 20 threads, that set of 20 quickly becomes dominated by slow threads, which consume little CPU because they are just waiting on remote I/O. I'm not sure of the best way to marry both of those concerns, but the current approach seems to do a decent job, so I've left it like that for now :)
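One middle ground between an unbounded thread-per-scan model and a small fixed pool is to spawn a thread per scan but cap concurrency with a semaphore set generously high, so fast scans keep flowing past the few slow, I/O-bound ones. This is a hedged sketch of that idea, not the project's implementation; `scan_service` is a hypothetical stand-in for a real probe:

```python
import threading
import time

def scan_service(service_id):
    # Hypothetical placeholder for a real network probe; a real scan
    # would connect to the remote host and fetch its certificate.
    time.sleep(0.01)
    return service_id

def bounded_scan(service_ids, max_threads=50):
    """Spawn one thread per scan, capped at max_threads concurrent threads.

    Because threads blocked on remote I/O are cheap, the cap can be much
    larger than a CPU-sized pool, so a handful of slow hosts do not
    monopolize the workers the way they would in a 20-thread pool.
    """
    sem = threading.Semaphore(max_threads)
    results, lock, threads = [], threading.Lock(), []

    def worker(sid):
        try:
            r = scan_service(sid)
            with lock:
                results.append(r)
        finally:
            sem.release()

    for sid in service_ids:
        sem.acquire()  # blocks only once max_threads scans are in flight
        t = threading.Thread(target=worker, args=(sid,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results
```

The semaphore gives the "strictly bounded thread count" property of a pool, while the per-scan threads avoid a small worker set getting saturated by slow hosts; the right cap value is workload-dependent.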
In fact I totally agree with you :) Like I said: it was late, and it made perfectly good sense in my head :D Never mind!
I would like to do the following things: