CEHFIDA / theharvester

Automatically exported from code.google.com/p/theharvester
GNU General Public License v2.0

Emails appear incorrect when Google results are truncated #19

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Run a theHarvester query for a known organisation.
2. Modify the email regex to:
(' ' + '[a-zA-Z0-9.-_]*' + '.' + '[a-zA-Z0-9.-_]*' + '@' + '[a-zA-Z0-9.-]*' + self.word)
Some results then appear as "... TEST@domain.co.uk".
3. These results are parsed incorrectly because the emails are extracted from 
Google's truncated result snippets rather than from the pages themselves.
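
The behaviour in the steps above can be reproduced outside the tool with a minimal sketch. It assumes the pattern from step 2 is compiled with Python's `re` module and applied with `findall`; the helper name `find_emails` and both sample strings are hypothetical. Two details of the pattern are relevant: the unescaped '.' matches any character, including a space, and `.-_` inside the character class is a range that covers the literal dot, so the snippet's "..." is swallowed into the match.

```python
import re


def find_emails(text, word):
    # Pattern copied from step 2 of the report; `word` plays the role of self.word.
    pattern = (' ' + '[a-zA-Z0-9.-_]*' + '.' + '[a-zA-Z0-9.-_]*' + '@'
               + '[a-zA-Z0-9.-]*' + word)
    return re.findall(pattern, text)


# Hypothetical inputs: the same address as seen in a truncated search
# snippet and as seen on the page itself.
snippet = "Contact ... TEST@domain.co.uk for details"
page = "Contact Test.TEST@domain.co.uk for details"

snippet_hits = [m.strip() for m in find_emails(snippet, "domain.co.uk")]
page_hits = [m.strip() for m in find_emails(page, "domain.co.uk")]

print(snippet_hits)
print(page_hits)
```

Running the same pattern over the page text rather than the snippet yields the full address, which is the difference between the expected and actual output reported below.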

What is the expected output? What do you see instead?

Expected: "Test.TEST@domain.co.uk" - as viewed on webpage.
Actual: "... TEST@domain.co.uk" - From Truncated google result.

What version of the product are you using? On what operating system?
2.2a - Mac OS X

Please provide any additional information below.

I cannot see a fix for this unless a future release adds a command-line switch, 
e.g. -IF (investigate further), that fetches the page (curl/grep) and searches 
it for the corresponding result.
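
A rough sketch of what such an "investigate further" step might look like follows. It is not part of theHarvester: the function names `fetch_page` and `resolve_truncated` are hypothetical, and it assumes the URL of the search result is available alongside the truncated match. Fetching uses the standard library's `urllib.request`.

```python
import re
import urllib.request


def fetch_page(url, timeout=10):
    # Download the page behind a search result (hypothetical helper).
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def resolve_truncated(truncated, page_text):
    # Drop the leading ellipsis and spaces left over from the snippet.
    tail = truncated.lstrip(". \u2026")
    # Look for addresses on the page that end with the surviving tail.
    pattern = r"[A-Za-z0-9._-]*" + re.escape(tail)
    seen = []
    for match in re.findall(pattern, page_text):
        if match not in seen:
            seen.append(match)
    return seen


# Offline example with hypothetical page text; in real use page_text
# would come from fetch_page(result_url).
page_text = "Contact Test.TEST@domain.co.uk today"
resolved = resolve_truncated("... TEST@domain.co.uk", page_text)
print(resolved)
```

Keeping the fetch separate from the matching means the matching can be checked without network access, and a failed download simply leaves the truncated result unresolved.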

Original issue reported on code.google.com by Fletcher...@gmail.com on 7 Sep 2014 at 12:22