What steps will reproduce the problem?
1. I ran this script in China:
> metagoofil.py -d swu.edu.cn -t doc -l 20 -n 20 -o test -f test.html
output:
******************************************************
* /\/\ ___| |_ __ _ __ _ ___ ___ / _(_) | *
* / \ / _ \ __/ _` |/ _` |/ _ \ / _ \| |_| | | *
* / /\/\ \ __/ || (_| | (_| | (_) | (_) | _| | | *
* \/ \/\___|\__\__,_|\__, |\___/ \___/|_| |_|_| *
* |___/ *
* Metagoofil Ver 2.2 *
* Christian Martorella *
* Edge-Security.com *
* cmartorella_at_edge-security.com *
******************************************************
['doc']
[-] Starting online search...
[-] Searching for doc files, with a limit of 20
Searching 100 results...
Results: 0 files found
Starting to download 20 of them:
----------------------------------------
processing
user
email
[+] List of users found:
--------------------------
[+] List of software found:
-----------------------------
[+] List of paths and servers found:
---------------------------------------
[+] List of e-mails found:
----------------------------
2. I modified the file discovery/googlesearch.py.
Changed:
self.server="www.google.com"
self.hostname="www.google.com"
to:
self.server="www.google.com.hk"
self.hostname="www.google.com.hk"
I re-ran step 1.
output:
....
['doc']
[-] Starting online search...
[-] Searching for doc files, with a limit of 20
_
This time the output stops here and the script does not continue.
I debugged the code and found that execution blocks at the following line, but I
do not know why:
discovery/googlesearch.py:27 self.results = h.getfile().read()
It looks like Google may be returning too many results.
3. So I adjusted the page size parameter:
discovery/googlesearch.py:16
self.quantity="100" ===> self.quantity="10"
discovery/googlesearch.py:46
self.counter+=100 ===> self.counter+=10
I also modified this line:
discovery/googlesearch.py:27
self.results = h.getfile().read()
h.close() # adding this line seems to help
I re-ran step 1. Sometimes it works, sometimes it hangs as before.
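The two changes in step 3 (a smaller page size with a matching counter step, and closing the connection after each read) could be sketched roughly as below. This is not the metagoofil code: it is written for Python 3 with http.client (httplib in Python 2), and the names PAGE_SIZE, page_offsets and fetch_page, the use of HTTPS, and the 10-second timeout are all assumptions made for illustration. The timeout is an extra idea not tried in the report; it makes a stalled read raise socket.timeout instead of blocking forever.

```python
import http.client

# Hypothetical constant: one value drives both the number of results
# requested per page and the step added to the counter.
PAGE_SIZE = 10


def page_offsets(limit, page_size=PAGE_SIZE):
    """Return the start offsets needed to cover `limit` results."""
    return list(range(0, limit, page_size))


def fetch_page(host, path, timeout=10.0):
    """Fetch one page and always close the connection afterwards.

    The timeout value is an assumption; with it, a read that stalls
    raises socket.timeout rather than hanging indefinitely.
    """
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()
```

With a page size of 10, page_offsets(30) gives [0, 10, 20], so the request size and the counter step cannot drift apart the way two separately edited literals can.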
What is the expected output? What do you see instead?
The script should find and download the doc files. Instead it finds 0 files, hangs, or works only intermittently.
What version of the product are you using? On what operating system?
metagoofil 2.2, Windows 7, Python 2.6
Please provide any additional information below.
When the script does run successfully, the list of result file paths contains
some bad entries that fail to download,
e.g.:
[1/20] /webhp?hl=en-HK
[x] Error downloading /webhp?hl=en-HK
[12/20] /support/websearch/bin/answer.py?answer=134479
[x] Error downloading /support/websearch/bin/answer.py?answer=134479
....
My solution is:
myparser.py:43
#reg_urls = re.compile('<a href="(.*?)"')
reg_urls = re.compile('<a href="[^">]*?/url\?q=([^">]*?)&sa=U.*?"')
The results now look correct. I do not know of another way to fix this, and I do not want to change it further.
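To show what the replacement pattern does, here is a small standalone check. The pattern is the one given above; the sample markup, the example.com URL and the helper name extract_result_urls are made up for illustration, and real result pages may be formatted differently (for example with "&amp;" instead of "&", which this pattern would not match).

```python
import re

# Pattern proposed above for myparser.py: capture only the target of
# links that go through /url?q=... and stop before the &sa=U parameter.
REG_URLS = re.compile(r'<a href="[^">]*?/url\?q=([^">]*?)&sa=U.*?"')

# Made-up sample markup mixing navigation links with one result link.
SAMPLE_HTML = (
    '<a href="/webhp?hl=en-HK">Home</a>'
    '<a href="/url?q=http://www.example.com/files/report.doc&sa=U&ei=abc">Report</a>'
    '<a href="/support/websearch/bin/answer.py?answer=134479">Help</a>'
)


def extract_result_urls(html):
    """Return the captured result URLs, skipping site navigation links."""
    return REG_URLS.findall(html)


print(extract_result_urls(SAMPLE_HTML))
```

On this sample only http://www.example.com/files/report.doc is returned; the /webhp and /support links that caused the download errors are skipped.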
Original issue reported on code.google.com by c13...@gmail.com on 23 Sep 2013 at 2:45