eclipse-ee4j / glassfish

Eclipse GlassFish
https://eclipse-ee4j.github.io/glassfish/

Simulating load traffic onto port 4848 or 8080 hangs PE #677

Closed glassfishrobot closed 17 years ago

glassfishrobot commented 18 years ago

Hi,

This happens consistently with both 9.1 distributions (GlassFish and Sun).

If I run ApacheBench (ab):

ab -c 100 -n 10000 http://:4848/ or

ab -c 10 -n 1000 http://:8080/HelloImpl/HelloImplService?Tester (the tutorial pojo web service)

to simulate traffic load, then after a few of these ab runs the server hangs, and the only way to recover is to shut it down and start it up again.
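For readers without ab at hand, the same load pattern (a fixed request budget shared by a fixed number of concurrent clients, as in `-c 10 -n 1000`) can be approximated in Java. This is a minimal sketch, not ab's implementation: the class `MiniAb` and its `run` method are hypothetical names, and the request is passed in as a `Callable<Boolean>` so the driver can issue real HTTP GETs or a stub.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ab-like driver: `concurrency` workers share `total` requests.
final class MiniAb {
    /** Returns {completed, failed}. */
    static int[] run(int concurrency, int total, Callable<Boolean> request)
            throws InterruptedException {
        AtomicInteger remaining = new AtomicInteger(total);
        AtomicInteger completed = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        for (int i = 0; i < concurrency; i++) {
            pool.execute(() -> {
                // Each worker claims requests until the budget is used up, so at
                // most `concurrency` requests are in flight at any moment and
                // exactly `total` requests are issued overall.
                while (remaining.getAndDecrement() > 0) {
                    try {
                        if (request.call()) completed.incrementAndGet();
                        else failed.incrementAndGet();
                    } catch (Exception e) {
                        failed.incrementAndGet(); // connection errors count as failures
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        return new int[] {completed.get(), failed.get()};
    }
}
```

To point it at a server, the `Callable` can send a GET with `java.net.http.HttpClient.send(...)` (Java 11+) and return whether the status code was 200.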

Before the kill/restart, the server shows:

robi@6[bin]$ netstat -na | grep CLOSE
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32958 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32959 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32956 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32957 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32954 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32955 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32952 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32953 CLOSE_WAIT
tcp6 1 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32950 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32951 CLOSE_WAIT
tcp6 1 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32948 CLOSE_WAIT
tcp6 1 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32949 CLOSE_WAIT
tcp6 1 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32946 CLOSE_WAIT
tcp6 1 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32947 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32964 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32965 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32962 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32963 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32960 CLOSE_WAIT
tcp6 127 0 ::ffff:192.168.1.2:8080 ::ffff:192.168.1.:32961 CLOSE_WAIT
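CLOSE_WAIT means the client has already closed its side of the TCP connection but the server process has not yet called close() on its socket, so a pile of CLOSE_WAIT entries on port 8080 points at the server side. The second column is Recv-Q, the bytes received but not yet read by the application (127 or 1 in the listing). A minimal Java sketch for summarizing such output, assuming the Linux `netstat -na` column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); `CloseWaitSummary` is a hypothetical helper, not part of GlassFish:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts CLOSE_WAIT sockets per local address and sums
// their Recv-Q (bytes the application has not read yet).
final class CloseWaitSummary {
    /** Maps local address -> {socket count, total Recv-Q bytes}. */
    static Map<String, long[]> summarize(List<String> netstatLines) {
        Map<String, long[]> byLocal = new TreeMap<>();
        for (String line : netstatLines) {
            String[] f = line.trim().split("\\s+");
            // Assumed columns: Proto Recv-Q Send-Q Local Foreign State.
            // Header lines and other states (LISTEN, ESTABLISHED, ...) are skipped.
            if (f.length < 6 || !f[5].equals("CLOSE_WAIT")) continue;
            long[] s = byLocal.computeIfAbsent(f[3], k -> new long[2]);
            s[0]++;                       // one more socket stuck in CLOSE_WAIT
            s[1] += Long.parseLong(f[1]); // unread bytes left in the receive queue
        }
        return byLocal;
    }
}
```

The lines can come from a saved log or from running `netstat -na` through `ProcessBuilder`; a count that keeps growing between runs for the same local port is the pattern the report describes.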

I searched for related bugs and found two:

Bug ID: 6415256 and Bug ID: 6324680

which tracked a similar problem and are now closed.

Java version:
java version "1.5.0_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
Java HotSpot(TM) Client VM (build 1.5.0_06-b05, mixed mode)

Kernel is 2.6.10:
robi@6[bin]$ uname -a
Linux xp2012 2.6.10 #1 Wed Feb 23 16:54:53 EST 2005 i686 GNU/Linux

I don't think it's a JDK problem (Tomcat works nicely), and I don't think it's a problem with my Linux box (ACE/TAO, omniORB and other CORBA implementations work nicely, with no persistent CLOSE_WAITs).

Hope to be of help,

Until next time, see you,

Roberto M

Environment

Operating System: Linux
Platform: Linux

Affected Versions

[9.1pe]

glassfishrobot commented 18 years ago

@glassfishrobot Commented jluehe said: Reassigning ...

glassfishrobot commented 18 years ago

@glassfishrobot Commented jfarcand said: Hmm... my setup is:

[ja120114@localhost ja120114]$ uname -an
Linux localhost.localdomain 2.4.20-18.8 #1 Thu May 29 07:40:27 EDT 2003 i686 i686 i386 GNU/Linux

My VM is:

[ja120114@localhost ja120114]$ java -version
java version "1.5.0_04"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_04-b05)
Java HotSpot(TM) Client VM (build 1.5.0_04-b05, mixed mode, sharing)

And the result I'm getting is:

[ja120114@localhost bootstrap]$ ab -c 100 -n 10000 http://localhost:4848/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.116 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Finished 10000 requests

Server Software:        Sun
Server Hostname:        localhost
Server Port:            4848

Document Path:          /
Document Length:        2227 bytes

Concurrency Level:      100
Time taken for tests:   573.75173 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      25170000 bytes
HTML transferred:       22270000 bytes
Requests per second:    17.45 /sec (mean)
Time per request:       5730.752 [ms] (mean)
Time per request:       57.308 [ms] (mean, across all concurrent requests)
Transfer rate:          42.89 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd]  median    max
Connect:        0    57  416.6       0   8997
Processing:     4  5630 1125.5    5999  11999
Waiting:        3  3041  499.4    3001   9002
Total:          4  5687 1074.3    5999  12000

Percentage of the requests served within a certain time (ms)
  50%   5999
  66%   5999
  75%   5999
  80%   6000
  90%   6000
  95%   6000
  98%   6000
  99%   6000
 100%  12000 (longest request)

and running netstat -an | grep CLOSE gives:

[ja120114@localhost ja120114]$ netstat -an | grep CLOSE | wc -l
0

Now you can add the following JVM option and see whether it makes a difference:

-Dcom.sun.enterprise.server.ss.ASQuickStartup=false

I'm not closing the bug yet, as I will try with the same VM and the same kernel, but I doubt there is a bug here; you most probably have a configuration problem. GlassFish uses more memory than Tomcat (unfortunately, but we are working on it). Note that by default GlassFish uses 5 threads. You might want to increase that value in domain.xml by raising the thread-count attribute of:

<request-processing header-buffer-length-in-bytes="4096" initial-thread-count="2" request-timeout-in-seconds="30" thread-count="5" thread-increment="1"/>

to 20 or 30.
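To see why a small thread-count matters under `ab -c 100`, consider a pool capped at 5 workers: at most 5 requests are serviced at once and the other clients wait in the queue. Once the pool is saturated, and assuming clients send their next request immediately, each request spends roughly 100 / 5 = 20 service times in the server. The sketch below models only this queueing effect with a plain `java.util.concurrent` fixed pool; it is an illustration, not GlassFish's request-processing code, and `PoolModel` is a hypothetical name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: `clients` simultaneous requests against a pool of
// `threadCount` workers (compare thread-count="5" in domain.xml).
final class PoolModel {
    /** Returns the highest number of requests observed running at the same time. */
    static int peakConcurrency(int threadCount, int clients, long serviceMillis)
            throws InterruptedException {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < clients; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // record the high-water mark
                try {
                    Thread.sleep(serviceMillis);       // simulated request processing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();                               // queued requests still run
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }
}
```

However many clients arrive, the peak never exceeds the pool size; the excess simply waits, which is why raising thread-count to 20 or 30 shortens the queue under a 100-client flood.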

glassfishrobot commented 18 years ago

@glassfishrobot Commented jfarcand said: OK, after weeks of email exchanges, I'm bumping the priority to P3 because the submitter showed logs that clearly indicate a problem. I strongly suspect a VM bug triggered by an HTTP flood. I wasn't able to reproduce the problem with kernel:

Linux localhost.localdomain 2.4.20-18.8 #1 Thu May 29 07:40:27 EDT 2003 i686 i686 i386 GNU/Linux

and thread library:

% getconf GNU_LIBPTHREAD_VERSION
linuxthreads-0.10

using JDK 5.0_04 and later, and 6.0 beta2 and later.

We need to find a machine, internal or external, that can reproduce the problem.

If there is a problem, the fix must be backported to 9.0 UR1.

glassfishrobot commented 18 years ago

@glassfishrobot Commented jfarcand said: Here is an explanation, from Alan Bateman, that might account for part of the problem:

On Linux, the dup2 system call does not close the connection if the file descriptor is registered in a poll array and a thread is blocked in poll. The connection will be closed when the polling thread wakes up, but until then it can look like a leak. There is no solution to this issue other than updating to Java SE 6 and using the Linux 2.6 kernel. We also have a fix in 5.0u9, but again it requires the 2.6 kernel.

But it seems the problem is also reproducible with 6.0.

Still looking for a reproducible test case. Make sure this is evaluated for the 9.0 UR1 release.
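In Java NIO terms, the effect Alan Bateman describes resembles the documented deregistration delay of selectable channels: closing a channel that is registered with a `Selector` cancels its key, but the channel is only deregistered during the selector's next selection operation, and the JDK documentation notes that a channel "may also remain registered for some time after it is closed." The sketch below is not GlassFish or Grizzly code; it only shows that a closed channel stays registered until a selection operation such as `selectNow()` flushes the cancelled key. `DeregDemo` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Hypothetical demo of NIO's deferred deregistration after close().
final class DeregDemo {
    /** Returns {keyValidAfterClose, keyInSetAfterSelect, registeredAfterSelect}. */
    static boolean[] run() throws IOException {
        try (Selector selector = Selector.open()) {
            SocketChannel ch = SocketChannel.open();
            ch.configureBlocking(false);   // channels must be non-blocking to register
            SelectionKey key = ch.register(selector, SelectionKey.OP_CONNECT);

            ch.close();                    // cancels the key, but does not deregister yet
            boolean validAfterClose = key.isValid();

            selector.selectNow();          // cancelled keys are processed here
            boolean inKeySet = selector.keys().contains(key);
            boolean registered = ch.isRegistered();
            return new boolean[] {validAfterClose, inKeySet, registered};
        }
    }
}
```

In a server loop the selector thread normally calls `select()` again soon, so the cleanup happens on its next wakeup; a selector thread that stays blocked is the case where closed connections can linger, and calling `selector.wakeup()` after closing a channel from another thread is one way to make a blocked `select()` return sooner.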

glassfishrobot commented 18 years ago

@glassfishrobot Commented jfarcand said: Our SQE team isn't able to reproduce it, so I'm removing the 9.0 UR1 target for now.

glassfishrobot commented 17 years ago

@glassfishrobot Commented jfarcand said:

Checking in uport/PortUnificationPipeline.java;
/cvs/glassfish/appserv-http-engine/src/java/com/sun/enterprise/web/portunif/PortUnificationPipeline.java,v <-- PortUnificationPipeline.java
new revision: 1.10; previous revision: 1.9
done
Checking in grizzly/GrizzlyHttpProtocol.java;
/cvs/glassfish/appserv-http-engine/src/java/com/sun/enterprise/web/connector/grizzly/GrizzlyHttpProtocol.java,v <-- GrizzlyHttpProtocol.java
new revision: 1.37; previous revision: 1.36
done

glassfishrobot commented 17 years ago

@glassfishrobot Commented jfarcand said: Wrong bug.

glassfishrobot commented 17 years ago

@glassfishrobot Commented jfarcand said: As of 01/04/07, I wasn't able to reproduce the problem with ab, JMeter, or our internal stress-testing tool. Closing the bug as not reproducible. Tested with build 30. Please reopen if you are still seeing the problem.

glassfishrobot commented 18 years ago

@glassfishrobot Commented Was assigned to jfarcand

glassfishrobot commented 7 years ago

@glassfishrobot Commented This issue was imported from java.net JIRA GLASSFISH-677

glassfishrobot commented 18 years ago

@glassfishrobot Commented Reported by robertom

glassfishrobot commented 17 years ago

@glassfishrobot Commented Marked as cannot reproduce on Thursday, January 4th 2007, 2:10:13 am