asepaprianto / crawler4j

Automatically exported from code.google.com/p/crawler4j

How to get http response #162

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
How can I get the HTTP response status of the links being crawled, so that I 
can tell during the crawl when a link is broken?

What is the expected output? What do you see instead?
For a broken link: 404

What version of the product are you using?
crawler4j 3.3

Original issue reported on code.google.com by chhabraa...@gmail.com on 26 Jun 2012 at 5:39

GoogleCodeExporter commented 9 years ago
Got it. Overriding the function "handlePageStatusCode" in WebCrawler.java gives 
the status code.
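To make the idea concrete, here is a minimal sketch of what such an override might hand its values to. It deliberately uses only the Java standard library: `StatusCodeLog`, `record` and `brokenLinks` are hypothetical names, not part of crawler4j, and the exact parameters of "handlePageStatusCode" in your crawler4j version are an assumption you should check against WebCrawler.java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of crawler4j): remembers which URLs looked broken.
class StatusCodeLog {
    // URL -> status code, kept in the order the URLs were seen.
    private final Map<String, Integer> broken = new LinkedHashMap<>();

    // Intended to be called from your handlePageStatusCode override,
    // passing along the URL string and the status code you were given.
    void record(String url, int statusCode) {
        // Assumption: anything at or above 400 is treated as a broken link.
        if (statusCode >= 400) {
            broken.put(url, statusCode);
        }
    }

    Map<String, Integer> brokenLinks() {
        return broken;
    }

    public static void main(String[] args) {
        StatusCodeLog log = new StatusCodeLog();
        log.record("http://example.com/", 200);
        log.record("http://example.com/missing", 404);
        // Only the second URL is reported as broken.
        System.out.println(log.brokenLinks());
    }
}
```

If several crawler threads share one log, the map would need to be a concurrent one; the sketch above assumes a single thread.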

Thank you for great software!

Original comment by chhabraa...@gmail.com on 27 Jun 2012 at 5:55

GoogleCodeExporter commented 9 years ago
I am running the crawler to get the status codes of the URLs. For that I am 
overriding the function "handlePageStatusCode" in WebCrawler.java. But for some 
URLs I am getting a status code like 1005. If I run the same crawl again, I do 
not get 1005; it comes back as 200 instead. I want to avoid the 1005 status 
code. Please help me.
I am using version 3.5.

Original comment by suresh.a...@gmail.com on 22 Oct 2013 at 9:43

GoogleCodeExporter commented 9 years ago
Please supply example URLs so I can test it myself.

Original comment by avrah...@gmail.com on 11 Aug 2014 at 2:07

GoogleCodeExporter commented 9 years ago
Status code 1005 is a general HTTP error which sometimes indicates that the 
server is down.

I assume that when you get 200 as the status code the server is up, and when 
you get 1005 (did you set politeness to 0?) the server is down.
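Since the same URL reportedly returns 200 on a later run, one way to cope is to set such URLs aside and crawl them again afterwards. The sketch below is only an illustration in plain Java: `RetryQueue`, `isHttpStatus` and `urlsToRetry` are hypothetical names, not crawler4j API, and it assumes that any value outside the standard three-digit HTTP range (100 to 599), such as 1005, marks a fetch worth retrying.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of crawler4j): collects URLs to crawl again.
class RetryQueue {
    private final List<String> retry = new ArrayList<>();

    // Standard HTTP status codes are three digits, 100 through 599.
    static boolean isHttpStatus(int statusCode) {
        return statusCode >= 100 && statusCode <= 599;
    }

    // Intended to be called from a handlePageStatusCode override.
    void record(String url, int statusCode) {
        // Assumption: a non-HTTP value such as 1005 is a temporary failure.
        if (!isHttpStatus(statusCode)) {
            retry.add(url);
        }
    }

    List<String> urlsToRetry() {
        return retry;
    }

    public static void main(String[] args) {
        RetryQueue queue = new RetryQueue();
        queue.record("http://example.com/a", 200);
        queue.record("http://example.com/b", 1005);
        queue.record("http://example.com/c", 404);
        // Only the URL that produced 1005 is queued for a second attempt.
        System.out.println(queue.urlsToRetry());
    }
}
```

A 404 is left out of the retry list on purpose, because it is a real answer from the server rather than a failed fetch.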

Original comment by avrah...@gmail.com on 19 Aug 2014 at 9:18

GoogleCodeExporter commented 9 years ago
This issue is solved.

For any further discussion please open a thread in the forum.

Original comment by avrah...@gmail.com on 19 Aug 2014 at 9:19