amrdipo / theharvester

Automatically exported from code.google.com/p/theharvester
GNU General Public License v2.0

Email addresses inside [] brackets will be listed as starting with [ (left bracket) #10

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. theharvester.py -d gfong.com -l 50 -b google

What is the expected output?

gifford@gfong.com
christine@gfong.com
jessie.zhang@gfong.com
steve.fong@gfong.com
gifford.fong@gfong.com
rong@gfong.com

What do you see instead?

gifford@gfong.com
[christine@gfong.com
[jessie.zhang@gfong.com
steve.fong@gfong.com
gifford.fong@gfong.com
rong@gfong.com

What version of the product are you using? On what operating system?

2.2a on Windows 7

Please provide any additional information below.

The problem occurs when an email address is enclosed in square brackets.
This is common in mail exports, like this:
From: John Doe [jdoe@live.com]
Sent: Fri 13, 1337
To: D. Evil [devil@yahoo.com]

I saw an earlier bug fix that removed a preceding @ from addresses; this is
probably the same issue. In any case, it should be easy to filter out the
square brackets.
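
A minimal sketch of that post-filtering idea, assuming the harvested addresses are available as a plain list of strings. The helper name strip_leading_brackets is hypothetical and not part of theHarvester:

```python
def strip_leading_brackets(addresses):
    """Remove stray opening brackets and drop duplicates, keeping order.

    Hypothetical helper for illustration; not part of theHarvester.
    """
    cleaned = []
    for addr in addresses:
        # lstrip removes any run of these characters from the left side only.
        addr = addr.lstrip("[(<")
        if addr and addr not in cleaned:
            cleaned.append(addr)
    return cleaned


found = ["gifford@gfong.com", "[christine@gfong.com", "[jessie.zhang@gfong.com"]
print(strip_leading_brackets(found))
```

This only hides the symptom after extraction; it does not explain why the bracket is captured in the first place.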

Original issue reported on code.google.com by hakon.kr...@gmail.com on 10 Apr 2013 at 8:15

GoogleCodeExporter commented 8 years ago
I believe there's an issue in the email extraction regex in myparser.py:

def emails(self):
    self.genericClean()
    reg_emails = re.compile('[a-zA-Z0-9.-_]*' + '@' + '[a-zA-Z0-9.-]*' + self.word)
    self.temp = reg_emails.findall(self.results)
    emails=self.unique()
    return emails

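The regex contains [a-zA-Z0-9.-_]* - note the .-_ at the end. Unescaped, the
hyphen turns .-_ into a character range from . to _, and that range includes
[ (as well as @, which may explain the earlier bug). Shouldn't the hyphen be
escaped?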

http://stackoverflow.com/questions/9589074/regex-should-hyphens-be-escaped
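
A small standalone check of this, as a sketch rather than the actual patch: the variable word stands in for self.word, the sample text is made up, and the only change between the two patterns is the escaped hyphen in the first character class.

```python
import re

word = "gfong.com"  # stands in for self.word (assumption)
text = "From: Christine [christine@gfong.com] and gifford@gfong.com"  # made-up sample

# Original pattern: ".-_" is read as a range from "." (0x2E) to "_" (0x5F).
buggy = re.compile('[a-zA-Z0-9.-_]*' + '@' + '[a-zA-Z0-9.-]*' + word)

# Same pattern with the hyphen escaped, so ".", "-" and "_" are literals.
fixed = re.compile(r'[a-zA-Z0-9.\-_]*' + '@' + '[a-zA-Z0-9.-]*' + word)

# "[" falls inside the accidental range, which is why it gets captured.
bracket_in_range = ord('.') <= ord('[') <= ord('_')

buggy_hits = buggy.findall(text)
fixed_hits = fixed.findall(text)

print(bracket_in_range)  # True
print(buggy_hits)        # ['[christine@gfong.com', 'gifford@gfong.com']
print(fixed_hits)        # ['christine@gfong.com', 'gifford@gfong.com']
```

Placing the hyphen last in the class ([a-zA-Z0-9._-]) would have the same effect as escaping it.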

I tested this against your issue and it seemed to fix it:

[+] Emails found:
------------------
gifford@gfong.com
clientservices@gfong.com
steve.fong@gfong.com
jessie.zhang@gfong.com
gfacareers@gfong.com
christine@gfong.com
gifford.fong@gfong.com
mohini@gfong.com
rong@gfong.com
gary@gfong.com
rfan@gfong.com
jquinn@gfong.com

Original comment by marcus.w...@loumiaconsulting.com on 4 Jan 2014 at 11:13