eclipse-ee4j / glassfish

Eclipse GlassFish
https://eclipse-ee4j.github.io/glassfish/

[BLOCKING] AIX: Too many open files exception #16672

Closed glassfishrobot closed 12 years ago

glassfishrobot commented 13 years ago

This may be a matter of documenting AIX tuning for the GlassFish application server; however, I'm logging it first under the Admin GUI component to verify that we are not leaking file descriptors.

Here is the issue I encountered:

After clicking the Launch link on the Applications page, the following exception was displayed in a new window:

HTTP Status 500 -

type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

javax.servlet.ServletException: java.lang.reflect.InvocationTargetException while attempting to process a 'beforeCreate' event for 'page1'.

root cause

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException while attempting to process a 'beforeCreate' event for 'page1'.

root cause

java.lang.reflect.InvocationTargetException

root cause

java.lang.RuntimeException: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketException: Connection reset

root cause

com.sun.jersey.api.client.ClientHandlerException: java.net.SocketException: Connection reset

root cause

java.net.SocketException: Connection reset

The server.log had an additional exception:

[#|2011-05-17T15:09:59.672-0700|WARNING|oracle-glassfish3.1|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=5;_ThreadName=Thread-6;|GRIZZLY0006: Exception accepting channel java.io.IOException: Too many open files

I continued clicking around the Administration Console, and after clicking the Applications link, got the following exception in the right frame of the Admin Console window:

HTTP Status 500 -

type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

javax.servlet.ServletException: java.lang.RuntimeException while attempting to process a 'beforeEncode' event for 'event157'.

root cause

java.lang.RuntimeException: java.lang.RuntimeException while attempting to process a 'beforeEncode' event for 'event157'.

root cause

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException while attempting to process a 'beforeEncode' event for 'event157'.

root cause

java.lang.reflect.InvocationTargetException

root cause

com.sun.jersey.api.client.ClientHandlerException: java.net.SocketException: Too many open files

root cause

java.net.SocketException: Too many open files

I found the following information on tuning AIX (at http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.express.doc/info/exp/ae/tprf_tuneaix.html):

1. AIX file descriptors (ulimit):

   nofiles = 10000
   nofiles_hard = 10000

2. Save the changes.

3. Restart your AIX system.

4. To verify the result, type the ulimit -a command on the command line. For example, type # ulimit -a.

Increasing the ulimit file descriptor limits might improve performance. Increasing some of the other limits might be needed depending on your application. Any changes to the data or stack ulimits should ensure that data+stack < 256MB (for 32-bit WebSphere Application Server only).

It is recommended that you change the ulimit for data to "unlimited".

Some of the above may be applicable to Glassfish.

Environment

Glassfish server 3.1.1 b4 and b5 on AIX 6.1

Affected Versions

[3.1]

glassfishrobot commented 13 years ago

@glassfishrobot Commented sirajg said: Cannot reproduce this issue.

glassfishrobot commented 13 years ago

@glassfishrobot Commented anilam said: Pasting the email discussion so it won't get lost. We don't know what exact sequence will reproduce this problem, or how documenting it would help users. Transferring this back to Lidia; she can close it as not reproducible, or transfer it to doc with what she believes should be documented.

On 5/18/11 1:07 PM, Lidia Marchioni wrote:

Hi Siraj

What are different actions that can cause this? I restarted domain2, and I'm guessing that the issue is no longer seen because of this. Any idea how I can reproduce this condition? It was not pretty once it happened.

I don't know what specific sequence you need to go through to reproduce this. In general, going through OS settings and documentation should give an idea on the file descriptor limit.

I think that at the least we should document changing the file descriptor limit as mentioned in the WebSphere doc. Do you agree?

Maybe. I don't have a lot of experience with AIX. Lidia

On 5/18/2011 9:40 AM, Siraj Ghaffar wrote:

Hi Lidia, I cannot reproduce the issue. Things work fine on the machine you pointed me to. In general, the behavior you saw might be caused by different factors, not necessarily related to the sequence you reported in the issue.

Thanks --Siraj

glassfishrobot commented 13 years ago

@glassfishrobot Commented lidiam said: I am again seeing this issue. I have 2 domains on the machine with the following processes running since Friday afternoon:

1. domain1
   a. DAS
   b. standalone instance
   c. cluster with 2 instances

2. domain2
   a. DAS
   b. cluster with 2 instances

I brought up the Admin Console for domain1, clicked Standalone Instances, then in1, and got the following error displayed:

HTTP Status 500 -

type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

javax.servlet.ServletException: java.lang.RuntimeException while attempting to process a 'beforeEncode' event for 'event155'.

root cause

java.lang.RuntimeException: java.lang.RuntimeException while attempting to process a 'beforeEncode' event for 'event155'.

root cause

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException while attempting to process a 'beforeEncode' event for 'event155'.

root cause

java.lang.reflect.InvocationTargetException

root cause

com.sun.jersey.api.client.ClientHandlerException: java.net.SocketException: Too many open files

root cause

java.net.SocketException: Too many open files

note The full stack traces of the exception and its root causes are available in the Oracle GlassFish Server 3.1.1 logs.

It seems to me that in order to reproduce this issue, one needs to start a few GlassFish server instances, preferably across two domains, deploy a couple of apps, and leave the system running for a day. However, I cannot determine why we are getting into this situation. It would be great to talk to an AIX expert regarding this issue. Following is the information I gathered.

I checked the GlassFish processes and the number of file descriptors they use (via ls /proc/<pid>/fd); here is the output:

1. domain1

a. DAS bash-3.2# ls -l /proc/266422/fd | wc 1946 17507 121462

b. in1 bash-3.2# ls -l /proc/327882/fd | wc 526 4727 32453

c. cl1in1 bash-3.2# ls -l /proc/331950/fd | wc 527 4736 32515

d. cl1in2 bash-3.2# ls -l /proc/323772/fd | wc 527 4736 32515

2. domain2

a. DAS bash-3.2# ls -l /proc/290972/fd | wc 1573 14150 97968

b. cl1in1 bash-3.2# ls -l /proc/307386/fd | wc 527 4736 32515

c. cl1in2 bash-3.2# ls -l /proc/311488/fd | wc 527 4736 32515

I've researched it a little further, and we are running out of TCP sockets, not file handles:

Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketException: Too many open files
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:131)
    at com.sun.jersey.api.client.Client.handle(Client.java:629)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:601)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:459)
    at org.glassfish.admingui.common.util.RestUtil.get(RestUtil.java:668)
    at org.glassfish.admingui.common.util.RestUtil.restRequest(RestUtil.java:185)
    at org.glassfish.admingui.common.handlers.RestApiHandlers.restRequest(RestApiHandlers.java:210)
    ... 50 more
Caused by: java.net.SocketException: Too many open files
    at java.net.Socket.createImpl(Socket.java:407)
    at java.net.Socket.connect(Socket.java:537)
    at java.net.Socket.connect(Socket.java:488)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:407)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:542)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:246)
    at sun.net.www.http.HttpClient.New(HttpClient.java:319)
    at sun.net.www.http.HttpClient.New(HttpClient.java:336)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:980)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:865)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:846)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1182)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:390)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:217)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:129)
    ... 57 more

I looked at port use on the system and found the following:

total for localhost: bash-3.2# netstat -a | grep localhost | wc 350 2100 27294

total in close wait state: bash-3.2# netstat -a | grep CLOSE_WAIT | wc 346 2076 26988

total for DAS2 (port 5858): bash-3.2# netstat -a | grep 5858 |wc 311 1866 24254

Out of the above domain2 DAS ports, 310 are in CLOSE_WAIT state: bash-3.2# netstat -a | grep 5858 | grep CLOSE_WAIT |wc 310 1860 24180

Thus, it looks to me that in order to reproduce this issue, it should be enough to create a second domain and start it. I'm not certain whether accessing it via the Admin Console is needed. I'll attach a file with the netstat output.
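As background on the state seen above: a socket sits in CLOSE_WAIT when the remote side has already closed the connection but the local process has not yet called close() on its end, so a client that never closes its finished connections accumulates them until the file descriptor limit is hit. The following is a minimal, hypothetical Java demo (not GlassFish code) that reproduces the same FIN_WAIT_2/CLOSE_WAIT pairing reported by netstat here:

```java
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical standalone demo of the FIN_WAIT_2 / CLOSE_WAIT pairing.
public class CloseWaitDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);                // listen on an ephemeral port
        Socket client = new Socket("localhost", server.getLocalPort());
        Socket accepted = server.accept();

        accepted.close();   // the server endpoint closes first and moves to FIN_WAIT_2
        // The client never calls close(), so its endpoint stays in CLOSE_WAIT
        // until this process exits or close() is finally called.

        System.out.println("Run: netstat -an | grep " + server.getLocalPort());
        Thread.sleep(60000);                                      // keep the leaked socket visible
        client.close();
    }
}
```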

glassfishrobot commented 13 years ago

@glassfishrobot Commented lidiam said: After restarting domain2, all CLOSE_WAIT ports got released and only one was in use, as expected:

bash-3.2# netstat -a | grep 5858 tcp 0 0 *.5858 .** LISTEN

Loading Admin Console for domain2 from a Windows XP machine:

bash-3.2# netstat -a | grep 5858 tcp4 0 0 aixas12.us.oracl.5858 aixas12.us.oracl.40176 ESTABLISHED tcp 0 0 aixas12.us.oracl.40176 aixas12.us.oracl.5858 ESTABLISHED tcp 0 0 *.5858 .** LISTEN tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.semap ESTABLISHED tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.cpqrp ESTABLISHED tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.cpqrp ESTABLISHED

After Admin Console is loaded:

bash-3.2# netstat -a | grep 5858 tcp4 0 0 localhost.5858 localhost.40196 ESTABLISHED tcp 0 0 localhost.40196 localhost.5858 ESTABLISHED tcp 0 0 *.5858 .** LISTEN tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.semap ESTABLISHED tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.cpqrp ESTABLISHED tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.cpqrp ESTABLISHED tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.msft- ESTABLISHED tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.veris ESTABLISHED tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.csoft ESTABLISHED

From here on, the use of sockets goes up quickly; after a couple of clicks in the Admin Console:

bash-3.2# netstat -a | grep 5858 tcp4 0 0 localhost.5858 localhost.40196 FIN_WAIT_2 tcp 0 0 localhost.40196 localhost.5858 CLOSE_WAIT tcp4 0 0 localhost.5858 localhost.40223 ESTABLISHED tcp 0 0 localhost.40223 localhost.5858 ESTABLISHED tcp4 0 0 localhost.5858 localhost.40224 ESTABLISHED tcp 0 0 localhost.40224 localhost.5858 ESTABLISHED tcp4 0 0 localhost.5858 localhost.40225 ESTABLISHED tcp 0 0 localhost.40225 localhost.5858 ESTABLISHED tcp4 0 0 localhost.5858 localhost.40227 ESTABLISHED tcp 0 0 localhost.40227 localhost.5858 ESTABLISHED tcp 0 0 localhost.40228 localhost.5858 TIME_WAIT tcp 0 0 *.5858 .** LISTEN tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.aicc- ESTABLISHED tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.vnsst ESTABLISHED tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.3322 ESTABLISHED tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.3323 ESTABLISHED tcp4 0 0 aixas12.us.oracl.5858 jed-asqe-42.us.o.3324 ESTABLISHED

here is the count for the above:

bash-3.2# netstat -a | grep 5858 | wc 11 66 854

Now, I expanded Standalone Instances icon (nothing loaded) and clicked on standalone instance name (new page loaded). The count increased to 21:

bash-3.2# netstat -a | grep 5858 | wc 21 126 1644

This does not happen for domain1. Here are sockets open during Admin Console loading for domain1:

bash-3.2# netstat -a | grep 4848 tcp 0 0 *.24848 .** LISTEN bash-3.2#

during loading of Applications page:

bash-3.2# netstat -a | grep 4848 tcp 0 0 *.24848 .** LISTEN tcp4 0 0 localhost.24848 localhost.40549 ESTABLISHED tcp 0 0 localhost.40549 localhost.24848 ESTABLISHED

once the page is loaded:

bash-3.2# netstat -a | grep 4848 tcp 0 0 *.24848 .** LISTEN bash-3.2#

loading standalone instance page:

bash-3.2# netstat -a | grep 4848 tcp 0 0 *.24848 .** LISTEN tcp 0 0 localhost.40618 localhost.24848 TIME_WAIT

loading done:

bash-3.2# netstat -a | grep 4848 tcp 0 0 *.24848 .** LISTEN bash-3.2#

In the meantime (after some 10 - 15 minutes of idle time for Admin Console on domain2) the following sockets are still in use by domain2:

bash-3.2# netstat -a | grep 5858 tcp 0 0 localhost.40196 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.40223 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.40224 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.40225 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.40227 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.40263 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.40265 localhost.5858 CLOSE_WAIT tcp 0 0 *.5858 .** LISTEN bash-3.2#

One more note: after restarting domain2, I still could not get the Admin Console working properly for domain1 - it was still displaying the "Too many open files" exception for every screen I clicked. This issue only went away after I restarted domain1 as well.

glassfishrobot commented 13 years ago

@glassfishrobot Commented lidiam said: Anissa, I'm reassigning this issue to you for investigation. Ports are not released after the Admin Console is accessed for domain2. Consequently, the user runs out of file descriptors on the AIX system if another non-default domain is in use.

I also tried the CLI list-applications command, but it did not increase socket use the way the Admin Console does:

before making call to asadmin:

bash-3.2# netstat -a | grep 5858 | wc 8 48 620

Executed: bash-3.2# asadmin --port 5858 list-applications hello clusterjsp <ear, web> Command list-applications executed successfully.

Now netstat shows: bash-3.2# netstat -a | grep 5858 | wc 9 54 697

and shortly after: bash-3.2# netstat -a | grep 5858 |wc 8 48 620

So the additional socket got released properly.

glassfishrobot commented 13 years ago

@glassfishrobot Commented anilam said: Thanks, Lidia, for providing all the data and information. Let me seek help from Siraj again. Maybe we need the admin team to take a look too.

glassfishrobot commented 13 years ago

@glassfishrobot Commented sirajg said: Are you able to reproduce this by running other web applications?

glassfishrobot commented 13 years ago

@glassfishrobot Commented lidiam said: I'm not sure I understand the question. I did not see excessive socket use reported by netstat for any other port number but 5858, the domain2 DAS. I have applications (war, ear) deployed to both instances and clusters on domain1 and domain2. The only port that had hundreds of CLOSE_WAIT connections open was 5858.

I have just reloaded the Admin Console for domain2 and went to the Standalone Instance page, the Applications page, and the Targets and General pages. I checked netstat and socket use is already at 88:

bash-3.2# netstat -a | grep 5858 |wc 88 528 6940

Now I clicked the Launch button for hello.war and checked socket use; it went up to 138:

bash-3.2# netstat -a | grep 5858 |wc 138 828 10766

In the meantime I'm checking standalone instance socket use, where hello.war is deployed:

bash-3.2# netstat -a | grep 28083 tcp 0 0 *.28083 .** LISTEN

After accessing the hello app on this instance:

bash-3.2# netstat -a | grep 28083 tcp 0 0 *.28083 .** LISTEN tcp4 0 0 aixas12.us.oracl.28083 jed-asqe-42.us.o.sf-lm ESTABLISHED bash-3.2#

after playing with the app:

bash-3.2# netstat -a | grep 28083 tcp 0 0 *.28083 .** LISTEN tcp4 0 0 aixas12.us.oracl.28083 jed-asqe-42.us.o.sf-lm ESTABLISHED

A couple minutes later:

bash-3.2# netstat -a | grep 28083 tcp 0 0 *.28083 .** LISTEN

Thus it looks like Admin Console is doing something special. I'll send you an email with connection information, so you can debug this issue.

glassfishrobot commented 13 years ago

@glassfishrobot Commented lidiam said: I also tried this on a Solaris system. Here is the socket use during Admin Console load time:

biffy(j2eetest):/export/home/j2eetest/v3.1/glassfish3/glassfish/domains# netstat -a | grep 5858 *.5858 .** 0 0 49152 0 LISTEN biffy.5858 jed-asqe-42.us.oracle.com.2051 65078 0 49640 0 ESTABLISHED localhost.52226 localhost.5858 49152 0 49152 0 ESTABLISHED localhost.5858 localhost.52226 49152 0 49322 0 ESTABLISHED biffy.5858 jed-asqe-42.us.oracle.com.2052 65535 113 49640 0 ESTABLISHED

Most of them are released later:

biffy(j2eetest):/export/home/j2eetest/v3.1/glassfish3/glassfish/domains# netstat -a | grep 5858 *.5858 .** 0 0 49152 0 LISTEN localhost.52229 localhost.5858 49152 0 49152 0 CLOSE_WAIT

When clicking around Admin Console of domain2 on solaris, socket use also increases:

biffy(j2eetest):/export/home/j2eetest/v3.1/glassfish3/glassfish/domains# netstat -a | grep 5858 biffy.5858 jed-asqe-42.us.oracle.com.2682 65535 0 49640 0 ESTABLISHED localhost.52394 localhost.5858 49152 0 49152 0 TIME_WAIT biffy.5858 jed-asqe-42.us.oracle.com.2684 65535 0 49640 0 ESTABLISHED biffy.5858 jed-asqe-42.us.oracle.com.2685 65535 0 49640 0 ESTABLISHED biffy.5858 jed-asqe-42.us.oracle.com.2686 65535 0 49640 0 ESTABLISHED biffy.5858 jed-asqe-42.us.oracle.com.2687 64508 0 49640 0 ESTABLISHED biffy.5858 jed-asqe-42.us.oracle.com.2688 65535 0 49640 0 ESTABLISHED localhost.52395 localhost.5858 49152 0 49152 0 TIME_WAIT localhost.52398 localhost.5858 49152 0 49152 0 TIME_WAIT localhost.52399 localhost.5858 49152 0 49152 0 TIME_WAIT localhost.52400 localhost.5858 49152 0 49152 0 TIME_WAIT localhost.52401 localhost.5858 49152 0 49152 0 TIME_WAIT localhost.52403 localhost.5858 49152 0 49152 0 ESTABLISHED localhost.5858 localhost.52403 49152 0 49152 0 ESTABLISHED *.5858 .** 0 0 49152 0 LISTEN localhost.52229 localhost.5858 49152 0 49152 0 CLOSE_WAIT

Clicking around domain1 Admin Console also uses many sockets:

biffy(j2eetest):/export/home/j2eetest/v3.1/glassfish3/glassfish/domains# netstat -a | grep 4848 .4848 . 0 0 49152 0 LISTEN biffy.4848 jed-asqe-42.us.oracle.com.2674 64898 0 49640 0 TIME_WAIT localhost.52389 localhost.4848 49152 0 49152 0 TIME_WAIT biffy.4848 jed-asqe-42.us.oracle.com.2675 64194 0 49640 0 TIME_WAIT biffy.4848 jed-asqe-42.us.oracle.com.2676 65535 0 49640 0 TIME_WAIT biffy.4848 jed-asqe-42.us.oracle.com.2677 65535 0 49640 0 TIME_WAIT biffy.4848 jed-asqe-42.us.oracle.com.2678 64194 0 49640 0 TIME_WAIT localhost.52390 localhost.4848 49152 0 49152 0 TIME_WAIT localhost.52391 localhost.24848 49152 0 49152 0 TIME_WAIT biffy.4848 jed-asqe-42.us.oracle.com.2699 64194 0 49640 0 ESTABLISHED localhost.52406 localhost.4848 49152 0 49152 0 ESTABLISHED localhost.4848 localhost.52406 49152 0 49152 0 ESTABLISHED biffy.4848 jed-asqe-42.us.oracle.com.2700 64194 0 49640 0 ESTABLISHED biffy.4848 jed-asqe-42.us.oracle.com.2701 65535 0 49640 0 ESTABLISHED biffy.4848 jed-asqe-42.us.oracle.com.2702 65535 0 49640 0 ESTABLISHED localhost.52407 localhost.4848 49152 0 49152 0 TIME_WAIT localhost.52408 localhost.24848 49152 0 49152 0 ESTABLISHED localhost.24848 localhost.52408 49152 0 49152 0 ESTABLISHED .24848 . 0 0 49152 0 LISTEN

However, after a minute or two of inactivity, domain2 sockets are mostly released:

biffy(j2eetest):/export/home/j2eetest/v3.1/glassfish3/glassfish/domains# netstat -a | grep 5858 *.5858 .** 0 0 49152 0 LISTEN localhost.52229 localhost.5858 49152 0 49152 0 CLOSE_WAIT biffy(j2eetest):/export/home/j2eetest/v3.1/glassfish3/glassfish/domains#

and domain1:

biffy(j2eetest):/export/home/j2eetest/v3.1/glassfish3/glassfish/domains# netstat -a | grep 4848 .4848 . 0 0 49152 0 LISTEN .24848 . 0 0 49152 0 LISTEN biffy(j2eetest):/export/home/j2eetest/v3.1/glassfish3/glassfish/domains#

Here is the socket use before and after clicking Launch link for hello.war application on the solaris system:

biffy(j2eetest):/export/home/j2eetest/v3.1/glassfish3/glassfish/domains# netstat -a | grep 5858 |wc 47 329 3747 biffy(j2eetest):/export/home/j2eetest/v3.1/glassfish3/glassfish/domains# netstat -a | grep 5858 |wc 110 770 8777

A couple minutes later the use goes down to 33:

biffy(j2eetest):/export/home/j2eetest/v3.1/glassfish3/glassfish/domains# netstat -a | grep 5858 |wc 33 231 2573

and most of them are in TIME_WAIT state:

biffy(j2eetest):/export/home/j2eetest/v3.1/glassfish3/glassfish/domains# netstat -a | grep 5858 localhost.52491 localhost.5858 49152 0 49152 0 CLOSE_WAIT localhost.5858 localhost.52491 49152 0 49152 0 FIN_WAIT_2 localhost.5858 localhost.52560 49152 0 49152 0 TIME_WAIT localhost.5858 localhost.52561 49152 0 49152 0 TIME_WAIT localhost.5858 localhost.52562 49152 0 49152 0 TIME_WAIT localhost.5858 localhost.52563 49152 0 49152 0 TIME_WAIT localhost.5858 localhost.52564 49152 0 49152 0 TIME_WAIT localhost.5858 localhost.52565 49152 0 49152 0 TIME_WAIT localhost.5858 localhost.52566 49152 0 49152 0 TIME_WAIT localhost.5858 localhost.52567 49152 0 49152 0 TIME_WAIT localhost.5858 localhost.52568 49152 0 49152 0 TIME_WAIT localhost.5858 localhost.52569 49152 0 49152 0 TIME_WAIT localhost.5858 localhost.52570 49152 0 49152 0 TIME_WAIT localhost.5858 localhost.52571 49152 0 49152 0 TIME_WAIT localhost.5858 localhost.52572 49152 0 49152 0 TIME_WAIT ... *.5858 .** 0 0 49152 0 LISTEN

A couple more minutes of inactivity on solaris frees most of the sockets:

biffy(j2eetest):/export/home/j2eetest/v3.1/glassfish3/glassfish/domains# netstat -a | grep 5858 localhost.52491 localhost.5858 49152 0 49152 0 CLOSE_WAIT localhost.5858 localhost.52491 49152 0 49152 0 FIN_WAIT_2 *.5858 .** 0 0 49152 0 LISTEN

while on AIX, which has been idle for an hour now, there are still 103 sockets in use, most of them in CLOSE_WAIT state:

bash-3.2# netstat -a | grep 5858 |wc 103 618 8030

bash-3.2# netstat -a | grep 5858 tcp 0 0 localhost.40196 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.40223 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.40224 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.40225 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.40227 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.40263 localhost.5858 CLOSE_WAIT ... tcp 0 0 localhost.40967 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.41162 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.41165 localhost.5858 CLOSE_WAIT tcp 0 0 localhost.41166 localhost.5858 CLOSE_WAIT tcp 0 0 *.5858 .** LISTEN tcp 0 0 localhost.41224 localhost.5858 CLOSE_WAIT

glassfishrobot commented 13 years ago

@glassfishrobot Commented lidiam said: I have just gotten the "Too many open files" exception in the Admin Console, accessing domain1 only (domain2 was still running, but I was not accessing it through the Admin Console). I was basically going through many different screens of the Admin Console and at the end of my testing got this exception:

HTTP Status 500 -

type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

javax.servlet.ServletException: java.lang.RuntimeException while attempting to process a 'beforeEncode' event for 'event138'.

root cause

java.lang.RuntimeException: java.lang.RuntimeException while attempting to process a 'beforeEncode' event for 'event138'.

root cause

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException while attempting to process a 'beforeEncode' event for 'event138'.

root cause

java.lang.reflect.InvocationTargetException

root cause

com.sun.jersey.api.client.ClientHandlerException: java.net.SocketException: Too many open files

root cause

java.net.SocketException: Too many open files

This time the majority of the CLOSE_WAIT sockets are:

tcp 0 0 localhost.40551 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40552 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40553 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40554 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40555 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40556 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40557 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40558 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40559 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40560 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40561 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40562 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40563 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40564 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40565 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40566 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.40567 localhost.appserv- CLOSE_WAIT

I have never seen this problem when GlassFish was running on a different OS. I'll attach the netstat.all.domain1-actions file with the full netstat output.

glassfishrobot commented 13 years ago

@glassfishrobot Commented sirajg said: The connection for the admin port is probably not getting closed properly. Seeking input from the web container team.

glassfishrobot commented 13 years ago

@glassfishrobot Commented lidiam said: I have tried this with a brand new domain1 (removed the previous GlassFish install and unzipped the distribution again). After starting domain1 and accessing the Admin Console without creating server instances of any sort, I still saw the issue of sockets not being released and remaining in CLOSE_WAIT state. There are no exceptions in server.log (I'll attach it as domain1.server.log).

Below are some details of my testing. I would like to note that netstat was often very slow. I had to wait a minute or so for the output, so it was slowing down the testing.

1. After loading of Admin Console:

bash-3.2# netstat -a | wc 112 588 7739

bash-3.2# netstat -a | grep appserv tcp4 0 0 localhost.appserv- localhost.63000 FIN_WAIT_2 tcp 0 0 localhost.63000 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63001 FIN_WAIT_2 tcp 0 0 localhost.63001 localhost.appserv- CLOSE_WAIT tcp 0 0 *.appserv- .** LISTEN

2. click on server (Admin Server)

bash-3.2# netstat -a | wc 131 702 9216

bash-3.2# netstat -a | grep appserv tcp4 0 0 localhost.appserv- localhost.63000 FIN_WAIT_2 tcp 0 0 localhost.63000 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63001 FIN_WAIT_2 tcp 0 0 localhost.63001 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63248 ESTABLISHED tcp 0 0 localhost.63248 localhost.appserv- ESTABLISHED tcp4 0 0 localhost.appserv- localhost.63249 ESTABLISHED tcp 0 0 localhost.63249 localhost.appserv- ESTABLISHED tcp4 0 0 localhost.appserv- localhost.63250 ESTABLISHED tcp 0 0 localhost.63250 localhost.appserv- ESTABLISHED tcp 0 0 *.appserv- .** LISTEN

3. Click on Standalone Instances (wait), click on Clusters (wait), click on New button on Clusters screen:

bash-3.2# netstat -a | wc 146 779 10256

bash-3.2# netstat -a | grep appserv tcp4 0 0 localhost.appserv- localhost.63000 FIN_WAIT_2 tcp 0 0 localhost.63000 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63001 FIN_WAIT_2 tcp 0 0 localhost.63001 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63248 FIN_WAIT_2 tcp 0 0 localhost.63248 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63249 FIN_WAIT_2 tcp 0 0 localhost.63249 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63410 FIN_WAIT_2 tcp 0 0 localhost.63410 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63411 FIN_WAIT_2 tcp 0 0 localhost.63411 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63412 FIN_WAIT_2 tcp 0 0 localhost.63412 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63467 FIN_WAIT_2 tcp 0 0 localhost.63467 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63468 FIN_WAIT_2 tcp 0 0 localhost.63468 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63469 FIN_WAIT_2 tcp 0 0 localhost.63469 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63544 ESTABLISHED tcp 0 0 localhost.63544 localhost.appserv- ESTABLISHED tcp4 0 0 localhost.appserv- localhost.63545 ESTABLISHED tcp 0 0 localhost.63545 localhost.appserv- ESTABLISHED tcp 0 0 *.appserv- .** LISTEN

4. Click on Applications (wait), click on Lifecycle Modules:

bash-3.2# netstat -a | wc 151 809 10643

bash-3.2# netstat -a | grep appserv tcp 0 0 localhost.63000 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63001 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63248 FIN_WAIT_2 tcp 0 0 localhost.63248 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63249 FIN_WAIT_2 tcp 0 0 localhost.63249 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63410 FIN_WAIT_2 tcp 0 0 localhost.63410 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63411 FIN_WAIT_2 tcp 0 0 localhost.63411 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63412 FIN_WAIT_2 tcp 0 0 localhost.63412 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63467 FIN_WAIT_2 tcp 0 0 localhost.63467 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63468 FIN_WAIT_2 tcp 0 0 localhost.63468 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63469 FIN_WAIT_2 tcp 0 0 localhost.63469 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63544 FIN_WAIT_2 tcp 0 0 localhost.63544 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63545 FIN_WAIT_2 tcp 0 0 localhost.63545 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63587 FIN_WAIT_2 tcp 0 0 localhost.63587 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63588 FIN_WAIT_2 tcp 0 0 localhost.63588 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.63671 ESTABLISHED tcp 0 0 localhost.63671 localhost.appserv- ESTABLISHED tcp4 0 0 localhost.appserv- localhost.63672 ESTABLISHED tcp 0 0 localhost.63672 localhost.appserv- ESTABLISHED tcp 0 0 localhost.63673 localhost.appserv- TIME_WAIT tcp 0 0 *.appserv- .** LISTEN

You can see in the above output that we have two CLOSE_WAIT sockets (at the top of the listing) that are not released. These only grew during further testing. I'll skip further details that seem irrelevant (I deployed and accessed an application, looked at the Logger Settings screen, created an Instance Property for the Admin Server, etc., but still did not create any instances or resources). At the end I had the following:

bash-3.2# netstat -a | wc 251 1409 18444

bash-3.2# netstat -a | grep appserv |wc 127 762 9901

bash-3.2# netstat -a | grep appserv tcp 0 0 localhost.63000 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63001 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63248 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63249 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63410 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63411 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63412 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63467 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63468 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63469 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63544 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63545 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63587 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63588 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63671 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63672 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64033 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64034 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64276 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64277 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64278 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64279 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64280 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64334 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64339 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64341 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64342 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64343 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64344 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64345 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64346 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64347 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64440 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64441 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64442 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64443 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64444 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64445 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64446 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64447 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64448 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64449 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64450 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64451 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64494 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64495 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64496 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64497 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64498 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64499 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64500 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64501 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64502 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64503 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64504 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64505 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64506 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64507 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64508 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64651 FIN_WAIT_2 tcp 0 0 localhost.64651 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64652 FIN_WAIT_2 tcp 0 0 localhost.64652 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64653 FIN_WAIT_2 tcp 0 0 localhost.64653 localhost.appserv- CLOSE_WAIT tcp4 
0 0 localhost.appserv- localhost.64654 FIN_WAIT_2 tcp 0 0 localhost.64654 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64655 FIN_WAIT_2 tcp 0 0 localhost.64655 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64656 FIN_WAIT_2 tcp 0 0 localhost.64656 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64657 FIN_WAIT_2 tcp 0 0 localhost.64657 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64658 FIN_WAIT_2 tcp 0 0 localhost.64658 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64659 FIN_WAIT_2 tcp 0 0 localhost.64659 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64660 FIN_WAIT_2 tcp 0 0 localhost.64660 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64661 FIN_WAIT_2 tcp 0 0 localhost.64661 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64662 FIN_WAIT_2 tcp 0 0 localhost.64662 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64663 FIN_WAIT_2 tcp 0 0 localhost.64663 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64664 FIN_WAIT_2 tcp 0 0 localhost.64664 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64665 FIN_WAIT_2 tcp 0 0 localhost.64665 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64666 FIN_WAIT_2 tcp 0 0 localhost.64666 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64667 FIN_WAIT_2 tcp 0 0 localhost.64667 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64668 FIN_WAIT_2 tcp 0 0 localhost.64668 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64669 FIN_WAIT_2 tcp 0 0 localhost.64669 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64670 FIN_WAIT_2 tcp 0 0 localhost.64670 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64671 FIN_WAIT_2 tcp 0 0 localhost.64671 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64672 FIN_WAIT_2 tcp 0 0 localhost.64672 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64673 FIN_WAIT_2 tcp 0 0 localhost.64673 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64732 FIN_WAIT_2 tcp 0 0 localhost.64732 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64733 FIN_WAIT_2 tcp 0 0 localhost.64733 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64734 FIN_WAIT_2 tcp 0 0 localhost.64734 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64735 FIN_WAIT_2 tcp 0 0 localhost.64735 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64736 FIN_WAIT_2 tcp 0 0 localhost.64736 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64785 FIN_WAIT_2 tcp 0 0 localhost.64785 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64786 FIN_WAIT_2 tcp 0 0 localhost.64786 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64787 FIN_WAIT_2 tcp 0 0 localhost.64787 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64837 FIN_WAIT_2 tcp 0 0 localhost.64837 localhost.appserv- CLOSE_WAIT tcp4 0 0 localhost.appserv- localhost.64838 FIN_WAIT_2 tcp 0 0 localhost.64838 localhost.appserv- CLOSE_WAIT tcp 0 0 *.appserv- .** LISTEN

There are 59 sockets in CLOSE_WAIT state. I checked again after typing this comment and see the following:

bash-3.2# netstat -a | grep appserv tcp 0 0 localhost.63000 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63001 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63248 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63249 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63410 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63411 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63412 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63467 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63468 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63469 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63544 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63545 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63587 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63588 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63671 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.63672 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64033 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64034 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64276 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64277 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64278 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64279 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64280 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64334 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64339 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64341 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64342 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64343 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64344 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64345 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64346 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64347 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64440 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64441 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64442 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64443 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64444 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64445 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64446 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64447 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64448 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64449 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64450 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64451 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64494 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64495 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64496 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64497 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64498 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64499 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64500 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64501 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64502 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64503 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64504 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64505 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64506 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64507 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64508 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64651 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64652 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64653 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64654 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64655 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64656 localhost.appserv- CLOSE_WAIT tcp 0 0 
localhost.64657 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64658 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64659 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64660 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64661 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64662 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64663 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64664 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64665 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64666 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64667 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64668 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64669 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64670 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64671 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64672 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64673 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64732 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64733 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64734 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64735 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64736 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64785 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64786 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64787 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64837 localhost.appserv- CLOSE_WAIT tcp 0 0 localhost.64838 localhost.appserv- CLOSE_WAIT tcp 0 0 *.appserv- .** LISTEN

That's:

bash-3.2# netstat -a | grep appserv |wc 93 558 7250

93 sockets that are not released.

glassfishrobot commented 13 years ago

@glassfishrobot Commented lidiam said: One more observation: just clicking on the same two pages in round-robin fashion increases the number of used sockets quite a bit. I was clicking on Standalone Instances and Clusters, and within a minute went from 93 to 182 open sockets for appserver:

bash-3.2# netstat -a | grep appserv |wc 189 1134 14738

After about 10 minutes, the output changed to:

bash-3.2# netstat -a | grep appserv |wc 141 846 10994

which is bad news, because we started with 93. This means that just going through the same pages over and over will also cause the "Too many open files" exception, given enough time, since sockets are allocated every time a page is visited (even on consecutive visits) and not properly released. I have just checked that all 140 sockets are now in CLOSE_WAIT state:

bash-3.2# netstat -a | grep appserv | grep CLOSE_WAIT | wc 140 840 10920

glassfishrobot commented 13 years ago

@glassfishrobot Commented lidiam said: I also tested the clusterjsp.ear application. I accessed it many times, restarted the session, etc. While socket use went up, after a while it went back down to the previous level.

Also, restarting the browser did not cause the connections to disappear. In the end I have the following number of appserv connections:

bash-3.2# netstat -a | grep appserv | wc 189 1134 14738

5 of which are in FIN_WAIT_2 state.

glassfishrobot commented 13 years ago

@glassfishrobot Commented kumara said: Best guess - this is related to the use of the REST API to render admin console pages. How does the admin console connect to the REST API? Do we have a pool of HttpURLConnection objects, or do we create a new HttpURLConnection per operation?

If the Jersey client API is being used, HttpURLConnection will map to com.sun.jersey.api.client.Client. It would be advisable to use a pool of Client objects to keep the number of client connections in check (it will also perform better because the same HTTP connection will be reused).

If it is possible to avoid network traffic altogether by using some internal APIs from Jersey, that would be even better: no pool to maintain and no internal network connection for admin console operations. A rough sketch of the reuse idea appears below.
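As a rough sketch of that suggestion, assuming the Jersey 1.x client API that appears in the stack traces above (the class name and URL handling here are illustrative, not the console's actual code), a single Client can be created once and reused for every request instead of being created per operation:

```java
import com.sun.jersey.api.client.Client;

// Illustrative sketch; the console's real helper is org.glassfish.admingui.common.util.RestUtil.
public class SharedRestClient {
    // One Jersey Client shared by all requests instead of a new one per operation.
    private static final Client CLIENT = Client.create();

    public static String get(String url) {
        // Asking for the entity directly lets Jersey consume the response and
        // release the underlying connection for non-stream entity types.
        return CLIENT.resource(url).get(String.class);
    }
}
```

Jersey 1.x Client instances are relatively expensive to create, so reuse also avoids repeated setup cost; the per-response close discipline discussed later in this issue still applies wherever a ClientResponse is requested explicitly.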

glassfishrobot commented 13 years ago

@glassfishrobot Commented @shingwaichan said: I have deployed a simple war file and accessed the web page. I do not see the CLOSE_WAIT issue there.

Per discussion with Anissa, clicking the "Cluster" page invokes the following REST API: http://<host>:50370/management/domain/clusters/cluster

and clicking the "Standalone instance" page invokes the following REST APIs: http://<host>:50370/management/domain/list-instances and http://<host>:50370/management/domain/servers

I have invoked the above REST APIs individually. I do not see the CLOSE_WAIT issue there.

But if I access the "Cluster" page and then "Standalone instance" page, after a while, I see the following result from "netstat -a | grep 50370" where 50370 is the admin port:

tcp 0 0 *.50370 .** LISTEN tcp4 0 0 localhost.50370 localhost.62629 FIN_WAIT_2 tcp 0 0 localhost.62629 localhost.50370 CLOSE_WAIT tcp4 0 0 localhost.50370 localhost.62902 FIN_WAIT_2 tcp 0 0 localhost.62902 localhost.50370 CLOSE_WAIT tcp4 0 0 localhost.50370 localhost.62903 FIN_WAIT_2 tcp 0 0 localhost.62903 localhost.50370 CLOSE_WAIT tcp4 0 0 localhost.50370 localhost.62904 FIN_WAIT_2 tcp 0 0 localhost.62904 localhost.50370 CLOSE_WAIT tcp4 0 0 localhost.50370 localhost.62905 FIN_WAIT_2 tcp 0 0 localhost.62905 localhost.50370 CLOSE_WAIT tcp4 0 0 localhost.50370 localhost.62906 FIN_WAIT_2 tcp 0 0 localhost.62906 localhost.50370 CLOSE_WAIT tcp4 0 0 localhost.50370 localhost.62907 FIN_WAIT_2 tcp 0 0 localhost.62907 localhost.50370 CLOSE_WAIT

If I let the server run overnight, then I see the following:

tcp 0 0 *.50370 .** LISTEN tcp 0 0 localhost.62629 localhost.50370 CLOSE_WAIT tcp 0 0 localhost.62902 localhost.50370 CLOSE_WAIT tcp 0 0 localhost.62903 localhost.50370 CLOSE_WAIT tcp 0 0 localhost.62904 localhost.50370 CLOSE_WAIT tcp 0 0 localhost.62905 localhost.50370 CLOSE_WAIT tcp 0 0 localhost.62906 localhost.50370 CLOSE_WAIT tcp 0 0 localhost.62907 localhost.50370 CLOSE_WAIT

So, it seems that there is an issue that is specific to admin console access.

glassfishrobot commented 13 years ago

@glassfishrobot Commented @shingwaichan said: Note that the above direct accesses of the REST API only indicate that the server side is OK. One may still need to check whether the client API has an issue.

glassfishrobot commented 13 years ago

@glassfishrobot Commented sirajg said: A modification in the way the console interacts with Jersey seems to have fixed the issue, but it needs to be tested.

glassfishrobot commented 13 years ago

@glassfishrobot Commented sirajg said: Filed issue JERSEY-725; the fix in the console did not work.

glassfishrobot commented 13 years ago

@glassfishrobot Commented scatari said: Pre-approving for 3.1.1 as this is a test blocker.

glassfishrobot commented 13 years ago

@glassfishrobot Commented sirajg said: Tried a basic test that opens an HttpURLConnection many times on AIX - couldn't reproduce the problem. Tried a simple JSF application and opened an HttpURLConnection while accessing that application many times, but could not reproduce the problem with these test cases.

glassfishrobot commented 13 years ago

@glassfishrobot Commented dhirup said: What is the value of TCP_TIMEWAIT on the AIX system? See the link below:

http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/tprf_tuneaix.html

Another thing to check is how differently the Admin Adaptor is configured compared to the REST Adaptor. As I understand it, both of them use the Grizzly Adaptor, and since one shows different behavior than the other, it's possible that their configuration/initialization has something to do with this.

glassfishrobot commented 13 years ago

@glassfishrobot Commented sirajg said: Transferring to Alexey since he is looking into the issue.

glassfishrobot commented 13 years ago

@glassfishrobot Commented sirajg said: After specifying linger="0" in domain.xml, the problem cannot be reproduced.

The setting could be made specific to the admin-listener by modifying the transport setting for that particular listener.
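For context, linger="0" on the transport corresponds to SO_LINGER with a zero timeout at the socket level: close() then aborts the connection with an RST instead of the normal FIN handshake, so neither endpoint is left parked in FIN_WAIT_2 or CLOSE_WAIT. A hedged illustration in plain java.net terms (this is not the Grizzly configuration itself, and port 4848 is just an example endpoint):

```java
import java.net.Socket;

public class LingerZeroExample {
    public static void main(String[] args) throws Exception {
        // Illustration of what linger="0" means at the socket level.
        Socket socket = new Socket("localhost", 4848);   // example endpoint (default DAS admin port)
        socket.setSoLinger(true, 0);                     // SO_LINGER enabled with a 0-second timeout
        socket.close();                                  // close() now aborts with an RST rather than a FIN,
                                                         // skipping the FIN_WAIT/TIME_WAIT states entirely
    }
}
```

This arguably masks the symptom rather than fixing the unclosed connections themselves, which is consistent with the later finding that the real leak was in the console's REST client usage.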

glassfishrobot commented 13 years ago

@glassfishrobot Commented scatari said: I have added a new template file for AIX (domain-aix.xml) under packager/nucleus-base/lib/templates. Please update this file to add any AIX-specific VM arguments or domain config elements. This file will be automatically picked up during the AIX packaging/bundle generation process and will be ignored during non-AIX builds.

glassfishrobot commented 13 years ago

@glassfishrobot Commented lidiam said: The following workaround worked on build b09:

changed:

to: </network-listener>

and added:

After restarting the domain, I accessed the Admin Console and verified via "netstat -a | grep appserv" that all sockets were released.

glassfishrobot commented 13 years ago

@glassfishrobot Commented oleksiys said: Looks like the issue is in admingui, which doesn't close RestResponse properly. Attaching the diff, which should fix admingui.
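In other words, each REST response's underlying connection has to be released after use. A hedged sketch of that pattern with the Jersey 1.x client types seen in the stack traces (the method name is illustrative; it is not the actual patch, which operates on the console's RestResponse wrapper):

```java
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;

public class RestCallSketch {
    // Illustrative method; the real fix touches the admingui REST utilities.
    static String restGet(Client client, String url) {
        ClientResponse response = client.resource(url).get(ClientResponse.class);
        try {
            return response.getEntity(String.class);   // read the payload
        } finally {
            response.close();                          // always release the connection; otherwise the
                                                       // client-side socket is left in CLOSE_WAIT once
                                                       // the server closes its end of the keep-alive
        }
    }
}
```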

glassfishrobot commented 13 years ago

@glassfishrobot Commented oleksiys said: fix for the admingui module

glassfishrobot commented 13 years ago

@glassfishrobot Commented oleksiys said: BTW, this issue is not just AIX related.

glassfishrobot commented 13 years ago

@glassfishrobot Commented sirajg said: Checked in the patch provided by Oleksiy.