dkm05midhra / crawler4j

Automatically exported from code.google.com/p/crawler4j

JVM crash when running crawler on Centos 6.2 #136

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

Running the crawler sometimes crashes the JVM. I regularly crawl around 10 web
sites, with pages between 1K and 50K. The crash strikes at random points, but it
recurs reliably.
I was able to reproduce it with 50 threads, 5 threads, and even 1 thread.
The crash report below is from a 64-bit CentOS 6.2 server running 64-bit
OpenJDK 1.6.

What is the expected output? What do you see instead?
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f92392bef6a, pid=32726, tid=140265909081856
#
# JRE version: 6.0_22-b22
# Java VM: OpenJDK 64-Bit Server VM (20.0-b11 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea6 1.10.4
# Distribution: CentOS release 6.2 (Final), package rhel-1.42.1.10.4.el6_2-x86_64
# Problematic frame:
# J  org.apache.http.client.protocol.RequestAddCookies.process(Lorg/apache/http/HttpRequest;Lorg/apache/http/protocol/HttpContext;)V
#
# An error report file with more information is saved as:
# /root/agent/hs_err_pid32726.log
#
# If you would like to submit a bug report, please include
# instructions how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla
#

What version of the product are you using?
3.1

Please provide any additional information below.
I was able to reproduce this on another machine running Fedora 16 with Oracle JDK
7 (version 7.0_03-b04), so it does not look like a hardware problem. Note
that all the guest OSes and JVMs are 64-bit.

Original issue reported on code.google.com by mrajesh...@gmail.com on 13 Mar 2012 at 9:34

GoogleCodeExporter commented 9 years ago
I got the same crash on AWS:

18:12:53,767 INFO  ~ Indexing page http://www.kidsbooks.com/ [221]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ffc4555fb01, pid=2555, tid=140720758245120
#
# JRE version: 6.0_22-b22
# Java VM: OpenJDK 64-Bit Server VM (20.0-b11 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea6 1.10.6
# Distribution: Amazon Linux BASE release 2012.03, package amazon-52.1.10.6.44.amzn1-x86_64
# Problematic frame:
# J  org.apache.http.client.protocol.RequestAddCookies.process(Lorg/apache/http/HttpRequest;Lorg/apache/http/protocol/HttpContext;)V
#
# An error report file with more information is saved as:
# /opt/projects/daemons/hs_err_pid2555.log
#
# If you would like to submit a bug report, please include
# instructions how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla
#

Original comment by tahs...@trademango.com on 4 Apr 2012 at 10:10

GoogleCodeExporter commented 9 years ago
Upgrading my JVM from OpenJDK 1.6.0 to Oracle JDK 1.7 seems to have solved this 
problem. This might be a JVM bug?

Original comment by tahs...@trademango.com on 6 Apr 2012 at 12:34

GoogleCodeExporter commented 9 years ago
Actually, I will have to eat my own words. The upgrade does not solve the issue;
the JVM is still crashing randomly.

Original comment by tahs...@trademango.com on 6 Apr 2012 at 1:00

GoogleCodeExporter commented 9 years ago
I'm getting the same error on AWS. Any pointers on how to resolve it would be great.
We are also seeing the crash in this method:
org.apache.http.impl.cookie.BestMatchSpec.formatCookies

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f8718aa4b4e, pid=18414, tid=140217624725248
#
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J  org.apache.http.impl.cookie.BestMatchSpec.formatCookies(Ljava/util/List;)Ljava/util/List;

Original comment by ipremya...@gmail.com on 1 May 2012 at 10:12

GoogleCodeExporter commented 9 years ago
I got the same crash on AWS. Could anybody help us?

Original comment by boaz.tri...@gmail.com on 5 Jun 2012 at 12:16

GoogleCodeExporter commented 9 years ago
I have this issue but I don't use crawler4j anymore. What is your depth level 
for crawling?

Original comment by sb1...@cs.ship.edu on 6 Jul 2012 at 1:10

GoogleCodeExporter commented 9 years ago
I emailed the HttpClient mailing list and pointed to this thread for reference.
I think the fix for this bug will be in or near the HttpClient usage in the
PageFetcher class.

Original comment by sb1...@cs.ship.edu on 6 Jul 2012 at 2:25

GoogleCodeExporter commented 9 years ago
Reference: http://www.mail-archive.com/dev@nutch.apache.org/msg06169.html

Original comment by sb1...@cs.ship.edu on 6 Jul 2012 at 2:43

GoogleCodeExporter commented 9 years ago
I solved this problem by setting COOKIE_POLICY to 
CookiePolicy.BROWSER_COMPATIBILITY when creating an instance of HttpClient:

/////////////////////////////////////////////////////////////
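import java.util.ArrayList;
import java.util.List;
import org.apache.http.auth.params.AuthPNames;
import org.apache.http.client.params.AuthPolicy;
import org.apache.http.client.params.ClientPNames;
import org.apache.http.client.params.CookiePolicy;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.CoreConnectionPNames;
import org.apache.http.params.HttpParams;

// Body of the method that builds the client. ccm (the connection manager),
// timeout (an int, in milliseconds) and the two My* handler classes are
// defined elsewhere.
DefaultHttpClient client = new DefaultHttpClient(ccm);
client.setHttpRequestRetryHandler(new MyHttpRequestRetryHandler());
client.setKeepAliveStrategy(new MyConnectionKeepAliveStrategy());

HttpParams params = client.getParams();
// The workaround: select the browser-compatibility cookie policy
params.setParameter(ClientPNames.COOKIE_POLICY, CookiePolicy.BROWSER_COMPATIBILITY);
params.setBooleanParameter(CoreConnectionPNames.TCP_NODELAY, true);
params.setIntParameter(CoreConnectionPNames.SO_TIMEOUT, timeout);
params.setIntParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, timeout);

// Choose BASIC over DIGEST for proxy authentication
List<String> authpref = new ArrayList<String>();
authpref.add(AuthPolicy.BASIC);
authpref.add(AuthPolicy.DIGEST);
authpref.add(AuthPolicy.NTLM);
authpref.add(AuthPolicy.SPNEGO);
params.setParameter(AuthPNames.PROXY_AUTH_PREF, authpref);

return client;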
/////////////////////////////////////////////////////////////

Original comment by crazyr...@126.com on 18 Jul 2012 at 4:56

GoogleCodeExporter commented 9 years ago
For what it's worth, I have the exact same crash on OS X Lion with the Oracle
JVM.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000010e176f07, pid=88677, tid=23811
#
# JRE version: 7.0_07-b10
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23.3-b01 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# J  org.apache.http.client.protocol.RequestAddCookies.process(Lorg/apache/http/HttpRequest;Lorg/apache/http/protocol/HttpContext;)V

I wrote my own crawler using HttpClient 4.2.1, and it randomly crashes the JVM in org.apache.http.client.protocol.RequestAddCookies.process.

I'm trying the cookie policy fix and so far it seems to work.

Original comment by jillesva...@gmail.com on 13 Oct 2012 at 2:27

GoogleCodeExporter commented 9 years ago
I'm closing this issue as it should be resolved after this changelist:
https://code.google.com/p/crawler4j/source/detail?r=3df4ae16409c00d2927880832e4fe6ae0550a89c

If you can still reproduce the crash after this update, let me know with details.

-Yasser

Original comment by ganjisaffar@gmail.com on 3 Mar 2013 at 5:54

GoogleCodeExporter commented 9 years ago
This is a JIT compiler bug. Apparently, running with -XX:-LoopUnswitching avoids
the problem.

http://bugs.sun.com/view_bug.do?bug_id=8021898

Original comment by d...@ayre.ca on 28 Nov 2013 at 9:14
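
For anyone applying the -XX:-LoopUnswitching workaround mentioned above, it can be easy to lose the flag in a wrapper script or service definition. The following is a minimal sketch, not part of crawler4j, of a startup check that reports whether the running JVM actually received the flag. The class name JitFlagCheck and the helper hasFlag are hypothetical names chosen for illustration; the only library calls used are the standard java.lang.management ones.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not part of crawler4j): reports whether the JVM
// was started with the workaround flag discussed in this thread.
class JitFlagCheck {
    static final String WORKAROUND_FLAG = "-XX:-LoopUnswitching";

    // Returns true if the given JVM argument list contains the exact flag.
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (hasFlag(jvmArgs, WORKAROUND_FLAG)) {
            System.out.println("Loop unswitching is disabled for this JVM.");
        } else {
            System.out.println("Flag not set; start the JVM with " + WORKAROUND_FLAG
                    + " to apply the workaround.");
        }
    }
}
```

The flag goes on the java command line before the main class, for example `java -XX:-LoopUnswitching -cp <classpath> <MainClass>`, where the classpath and main class are placeholders for your own crawler.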