What steps will reproduce the problem?
1. Create a web page containing a malformed URL (or a link with a non-HTTP protocol such as mailto:).
2. Run the crawler on that page.
3. The crawler crashes at line 89 in WebURL.java: the resulting IndexOutOfBoundsException
completely breaks the crawl. The exception should instead be caught and
logged inside the parser. I would strongly suggest using the java.net.URL parser
instead of the custom parsing solution; see the sketch after this list.
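A minimal sketch of the suggested approach, assuming a hypothetical helper (isCrawlable is not part of crawler4j): java.net.URL reports malformed input with a checked MalformedURLException, and a protocol check filters out non-HTTP schemes like mailto: before they ever reach WebURL.setURL.

    import java.net.MalformedURLException;
    import java.net.URL;

    // Hypothetical helper for illustration only, not crawler4j code.
    public class UrlCheck {
        static boolean isCrawlable(String candidate) {
            try {
                URL url = new URL(candidate);
                String protocol = url.getProtocol();
                // Skip mailto:, javascript:, ftp:, and other non-HTTP schemes.
                return protocol.equals("http") || protocol.equals("https");
            } catch (MalformedURLException e) {
                // Malformed input surfaces as a checked exception here
                // instead of an IndexOutOfBoundsException deep in the crawl.
                return false;
            }
        }

        public static void main(String[] args) {
            System.out.println(isCrawlable("http://example.com/page"));    // true
            System.out.println(isCrawlable("mailto:someone@example.com")); // false
            System.out.println(isCrawlable("http//broken"));               // false
        }
    }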
What is the expected output? What do you see instead?
I would expect any parse failure to be thrown and handled inside public void
setURL(String url), so that the crawl does not fail completely. Instead, the
unhandled IndexOutOfBoundsException aborts the entire crawl.
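A rough sketch of the kind of defensive handling described above; the class layout and field names here are assumptions, not the actual WebURL implementation:

    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.logging.Logger;

    // Sketch only: fields and parsing details are assumptions.
    // The point is that a parse failure is caught and logged inside setURL
    // rather than escaping as an unchecked exception.
    public class WebURLSketch {
        private static final Logger logger =
                Logger.getLogger(WebURLSketch.class.getName());

        private String url;
        private String domain;

        public void setURL(String url) {
            this.url = url;
            try {
                URL parsed = new URL(url);
                this.domain = parsed.getHost();
            } catch (MalformedURLException e) {
                // Log and fall back to empty values so a single bad link
                // cannot abort the whole crawl.
                logger.warning("Skipping malformed URL: " + url);
                this.domain = "";
            }
        }
    }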
What version of the product are you using?
3.3
Please provide any additional information below.
Original issue reported on code.google.com by david.titarenco on 5 Jul 2012 at 10:51