asepaprianto / crawler4j

Automatically exported from code.google.com/p/crawler4j

All links on a page are not recognized #340

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I am also facing a problem where crawler4j does not recognize all links 
on the page.

For example, if there are 5 links on the page, only 3 of them get 
recognized and fetched. The remaining two are not recognized at all.

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?
All the links on a page should be recognized so that they can be fetched.

What version of the product are you using?
4.1

Please provide any additional information below.
The only difference I found in the links that are not recognized is that these 
links contain angle brackets.

Example:

<a title="some text" 
href="http://www.example.com/abc/xyz-<near>-abc-xyz/abc_xyz" >some text</a>
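
To illustrate why such a link might be dropped, here is a minimal sketch using only the JDK's `java.net.URI`. It does not show crawler4j's own parsing or normalization code, and it is only an assumption that the angle brackets are the cause. The class and method names (`AngleBracketUrlCheck`, `isStrictlyValid`, `encode`) are hypothetical. The sketch shows that `<` and `>` are rejected by a strict URI parser, and that building the URI from its components percent-encodes them.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of crawler4j.
class AngleBracketUrlCheck {

    // Returns true if java.net.URI parses the string as-is.
    static boolean isStrictlyValid(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            // '<' and '>' are illegal characters in a URI path
            return false;
        }
    }

    // Builds the URI from components. The multi-argument constructor
    // percent-encodes characters that are illegal in the path.
    static String encode(String scheme, String host, String path)
            throws URISyntaxException {
        return new URI(scheme, host, path, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        String raw = "http://www.example.com/abc/xyz-<near>-abc-xyz/abc_xyz";

        System.out.println(isStrictlyValid(raw)); // prints "false"

        String encoded = encode("http", "www.example.com",
                "/abc/xyz-<near>-abc-xyz/abc_xyz");
        // prints "http://www.example.com/abc/xyz-%3Cnear%3E-abc-xyz/abc_xyz"
        System.out.println(encoded);

        System.out.println(isStrictlyValid(encoded)); // prints "true"
    }
}
```

If the encoded form of the link is fetched while the raw form is not, that would support the angle brackets being the cause.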

Original issue reported on code.google.com by amarvyaw...@gmail.com on 9 May 2015 at 12:47