Open GoogleCodeExporter opened 8 years ago
I believe there's an issue in the email extraction regex:
myparser.py
def emails(self):
    self.genericClean()
    reg_emails = re.compile('[a-zA-Z0-9.-_]*' + '@' + '[a-zA-Z0-9.-]*' + self.word)
    self.temp = reg_emails.findall(self.results)
    emails = self.unique()
    return emails
The regex contains [a-zA-Z0-9.-_]*; note the .-_ at the end. Unescaped, the
hyphen turns .-_ into a range from . to _ rather than three literal
characters. Shouldn't the hyphen be escaped?
http://stackoverflow.com/questions/9589074/regex-should-hyphens-be-escaped
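To make the difference concrete, here is a small sketch that compares the character class from myparser.py with a variant where the hyphen is escaped. The variable names are only for illustration and are not part of the project; the check simply asks Python's re module which printable ASCII characters each class accepts.

```python
import re

# The class as written in myparser.py: ".-_" is parsed as a range
# from "." (0x2E) to "_" (0x5F), not as three literal characters.
unescaped = re.compile(r'[a-zA-Z0-9.-_]')
# The same class with the hyphen escaped, so ".", "-" and "_" are literals.
escaped = re.compile(r'[a-zA-Z0-9.\-_]')

# Compare both classes over the printable ASCII characters.
printable = [chr(c) for c in range(0x20, 0x7F)]
extra = [ch for ch in printable
         if unescaped.fullmatch(ch) and not escaped.fullmatch(ch)]
missing = [ch for ch in printable
           if escaped.fullmatch(ch) and not unescaped.fullmatch(ch)]

print("matched only by the unescaped class:", "".join(extra))
print("matched only by the escaped class:", "".join(missing))
```

The unescaped class accepts punctuation such as < : and @ that should not appear in the local part, and it does not accept a literal hyphen at all, so addresses containing a hyphen would be cut short.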
I tested escaping the hyphen against the case from your issue, and it seemed to fix it:
[+] Emails found:
------------------
gifford@gfong.com
clientservices@gfong.com
steve.fong@gfong.com
jessie.zhang@gfong.com
gfacareers@gfong.com
christine@gfong.com
gifford.fong@gfong.com
mohini@gfong.com
rong@gfong.com
gary@gfong.com
rfan@gfong.com
jquinn@gfong.com
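For reference, a standalone sketch of the extraction with the hyphen escaped might look like the following. The function name extract_emails, the sample text and the example.com addresses are hypothetical and not taken from the project. Two further changes here are my own assumptions rather than part of the proposed fix: the domain is passed through re.escape, and the local part uses + instead of * so that it cannot be empty.

```python
import re

def extract_emails(text, domain):
    """Return unique addresses ending in `domain`, in first-seen order."""
    # Hyphen escaped in both classes so it is always a literal character.
    # re.escape(domain) and the "+" quantifier are assumptions of this sketch.
    pattern = re.compile(
        r'[a-zA-Z0-9.\-_]+' + '@' + r'[a-zA-Z0-9.\-]*' + re.escape(domain)
    )
    seen = []
    for match in pattern.findall(text):
        if match not in seen:
            seen.append(match)
    return seen

# Hypothetical input: markup around one address and a repeated address.
sample = ("Contact <mailto:first-name@example.com> or "
          "jane_doe@example.com; first-name@example.com")
print(extract_emails(sample, "example.com"))
```

With the hyphen escaped, the leading <mailto: is left out of the match and the hyphenated local part is kept whole.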
Original comment by marcus.w...@loumiaconsulting.com on 4 Jan 2014 at 11:13
Original issue reported on code.google.com by hakon.kr...@gmail.com on 10 Apr 2013 at 8:15