Closed glassfishrobot closed 12 years ago
@glassfishrobot Commented gfbugbridge said: *READ-ONLY Data from Bugtraq***** Inside SWAN :http://swsblweb1.central.sun.com:8080/CrPrint?id=6597311 Outside SWAN :http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6597311 ****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Description After a clean shutdownstartup of a remote instance failed Build : as91EE rc3 build Attached instance logs and broker log
****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Evaluation The appserver log tries to connect to broker at 05:27:21 [#|2007-08-24T05:27:21.656-0700|WARNING|sun-appserver9.1|javax.jms|_ThreadID=10;_ThreadName=main;_RequestID=09f1c981-1253-451d-b5d1-624f93bb6168;|[C4003]: Error occurred on connection creation [localhost:37677]. - cause: java.net.ConnectException: Connection refused|#]
The broker starts up at 05:27:29 [24/Aug/2007:05:27:29 PDT] WARNING [B1262]: Restricted JMS service until communication with master broker is established.
After this the ra does not throw java.net.ConnectException: Connection refused but throws an exception while reading from the imqChannel
[#|2007-08-24T05:27:29.249-0700|WARNING|sun-appserver9.1|javax.jms|_ThreadID=15;_ThreadName=iMQReadChannel-7;_RequestID=b6d86803-9165-43d1-b27f-97be72085a8c;|[I500]: Caught JVM Exception: java.io.EOFException|#]
But the MQ logs does not indicate that an Admin connection has been accepted. Usually when an admin connection is established with the broker there is a log message like this, "[B1065]: Accepting: admin@127.0.0.1:58292->admin:58271. Count: service=1 broker=1" But this is never printed in the MQ log file, I think, though the broker log file shows that the broker portmapper has started, it could still be in a position where it is unable to accept connections.
The best way to ascertain this would be to connect to the broker during this phase through a jms client and check the response, but QE does not have this setup and the issue is not reproducible consistently, and there is no obvious info in the log which indicates a problem., it is quite difficult to debug the problem right now with no setup and with the existing log file alone.
Creating an MQ CR 6597988 for this, Requesting MQ team to evaluate this, and give their inputs on the state of the broker at the time when the ra tries to connect to it.
****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Evaluation From the log files - it appears that the broker may not be completely up at the point the error occured. The "Restricted JMS" service happens at the same second as the client side exception occured. My guess is that the admin connection failed because the jms service was not yet active (e.g. the connection occured a couple of milliseconds before the service went active) From looking @ the other entries in the broker log -
This means that the admin connection is created immediately after the broker is started and anything that causes the broker to take slightly longer to start will cause a failure.
If possible, the admin connection should either delay a second or so or retry once in the case of a failure.
We can not verify that this is the cause without some sort of reproducable test case.
1. This is about AS instance starting up a 'LOCAL' broker (not remote) 2. 'Restricted JMS service' does not affect admin connection
Since the problem occurred during AS starting up a local broker, unlikely the 'EOFException's were resulted from AS/RA operated any restricted operations on a NORMAL connection (#3).
In an isolated test using MQ/RA LocalBrokerRunner.start()/stop() which repeated start/stop a 9-broker cluster didn't reproduce the EOFException problem in more than 50 iterations.
****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Evaluation To help debugging the issue, please configure appserver and broker with the following configurations and re-run the test. Thanks. 1. Configure app server to include the following JVM properties in app server's domain.xml file:
The above options should be added to the app server that failed to re-start MQ RA.
2. Please add the following imqbrokerd args options (not JVN options) when starting up MQ broker:
-debug pkt
The args options should be added in the
For example, in imqbrokerd script, append -debug pkt at the following statement:
3. Please increase the jms-service start timeout to 5 minutes (default 1 minute):
See http://docs.sun.com/source/819-0079/dgjms.html#wp19009
use asadmin set/get to set the start-timeout
e.g.
./asadmin get server.jms-service.init-timeout-in-seconds
will show its current value ( = 60 )
./asadmin set server.jms-service.init-timeout-in-seconds=300
to set it to 5 minutes
****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Evaluation Vivek reported the problem occured today. The broker instance that couldn't connectto is running on iasqalx4. Once the EOFException occurred when RA making the ADMIN connection to the broker, the problem will persist until broker shutdown or restarts.
The EOFException occurred while client runtime read fixed header of hello-reply packet. The EOFException only occurs when making ADMIN connection to the broker on iasqalx4 where the broker is running and only occurs when using 'localhost' to connect to the broker, imqBrokerHostName=localhost
imqcmd can connect to the broker instance if specify the real hostname (iasqalx4) or imqcmd run on a different machine (using same version of Java 1.5.0_12 on a similar SunOS 5.10 x86 machine).
see detailed information in Comments.
This looks like a problem below 'application' layer - e.g. TCP stack or OS. I'v suggested to check the patches on the machine first. Please try the workaround by using the real hostname instead of 'localhost' when connecting to broker.
****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Evaluation The simple test program (SimpleAdmin.java attached) that start/stop 1 broker instancerepeatly (25sec sleep in between) ran on iasqalx4 2 times both reproduced the problem (in ~2 days and in ~1 day respectively); The same program ran on a similar system mqlx4 for ~5 days without problem (use same Java version jdk1.5.0_12).
Broker debug shows that for the EOFException, ServerSocket.accept didn't even has a return. The problem lines under broker layer on iasqalx4 specifically. Please check iasqalx4 patches and also comparing with mqlx4 (in same domain as iasqalx4).
****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Justification Priority changed from [] to [3-Medium]instance startup XXXXXX 2007-08-26 00:34:43 GMT
Priority changed from [3-Medium] to [4-Low] Discussed in the 4:00 pm HA meeting and it was agreed that this can be lowered to a P4 since it is very hard to reproduce. Will be investigated for the update XXXXXX 2007-08-30 19:04:43 GMT
****READ-ONLY Data from Bugtraq Ends**
@glassfishrobot Commented @sivakumart said: Assigning to Ramesh as the source issue is assigned to him
@glassfishrobot Commented rampsarathy said: This is an MQ bug and no work required from glassfish, Assigning to Satish to follow-up and close
@glassfishrobot Commented tmueller said: Bulk update to change fix version to "not determined" for all issues still open but with a fix version for a released version.
@glassfishrobot Commented liang.x.zhao said: I can not reproduce it against GlassFish 3.1.2.
@glassfishrobot Commented Was assigned to liang.x.zhao
@glassfishrobot Commented This issue was imported from java.net JIRA GLASSFISH-3662
@glassfishrobot Commented Reported by gfbugbridge
@glassfishrobot Commented Marked as fixed on Wednesday, March 7th 2012, 11:07:46 pm
*READ-ONLY Data from Bugtraq***** Inside SWAN :http://swsblweb1.central.sun.com:8080/CrPrint?id=6597311 Outside SWAN :http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6597311 ****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Description After a clean shutdownstartup of a remote instance failed Build : as91EE rc3 build Attached instance logs and broker log
****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Evaluation The appserver log tries to connect to broker at 05:27:21 [#|2007-08-24T05:27:21.656-0700|WARNING|sun-appserver9.1|javax.jms|_ThreadID=10;_ThreadName=main;_RequestID=09f1c981-1253-451d-b5d1-624f93bb6168;|[C4003]: Error occurred on connection creation [localhost:37677]. - cause: java.net.ConnectException: Connection refused|#]
The broker starts up at 05:27:29 [24/Aug/2007:05:27:29 PDT] WARNING [B1262]: Restricted JMS service until communication with master broker is established.
After this the ra does not throw java.net.ConnectException: Connection refused but throws an exception while reading from the imqChannel
[#|2007-08-24T05:27:29.249-0700|WARNING|sun-appserver9.1|javax.jms|_ThreadID=15;_ThreadName=iMQReadChannel-7;_RequestID=b6d86803-9165-43d1-b27f-97be72085a8c;|[I500]: Caught JVM Exception: java.io.EOFException|#]
But the MQ logs does not indicate that an Admin connection has been accepted. Usually when an admin connection is established with the broker there is a log message like this, "[B1065]: Accepting: admin@127.0.0.1:58292->admin:58271. Count: service=1 broker=1" But this is never printed in the MQ log file, I think, though the broker log file shows that the broker portmapper has started, it could still be in a position where it is unable to accept connections.
The best way to ascertain this would be to connect to the broker during this phase through a jms client and check the response, but QE does not have this setup and the issue is not reproducible consistently, and there is no obvious info in the log which indicates a problem., it is quite difficult to debug the problem right now with no setup and with the existing log file alone.
Creating an MQ CR 6597988 for this, Requesting MQ team to evaluate this, and give their inputs on the state of the broker at the time when the ra tries to connect to it.
****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Evaluation From the log files - it appears that the broker may not be completely up at the point the error occured. The "Restricted JMS" service happens at the same second as the client side exception occured. My guess is that the admin connection failed because the jms service was not yet active (e.g. the connection occured a couple of milliseconds before the service went active) From looking @ the other entries in the broker log -
This means that the admin connection is created immediately after the broker is started and anything that causes the broker to take slightly longer to start will cause a failure.
If possible, the admin connection should either delay a second or so or retry once in the case of a failure.
We can not verify that this is the cause without some sort of reproducable test case.
1. This is about AS instance starting up a 'LOCAL' broker (not remote) 2. 'Restricted JMS service' does not affect admin connection
Since the problem occurred during AS starting up a local broker, unlikely the 'EOFException's were resulted from AS/RA operated any restricted operations on a NORMAL connection (#3).
In an isolated test using MQ/RA LocalBrokerRunner.start()/stop() which repeated start/stop a 9-broker cluster didn't reproduce the EOFException problem in more than 50 iterations.
Run a simple test program (on iasqalx4) that start/stop 1 broker instance repeatly (20 sec sleep in between) for ~2 days and reproduced the problem. The test program (SimpleAdmin.java, attached) uses same code to start/stop broker as MQ/RA LocalBrokerRunner.start()/stop().
****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Evaluation To help debugging the issue, please configure appserver and broker with the following configurations and re-run the test. Thanks. 1. Configure app server to include the following JVM properties in app server's domain.xml file:
The above options should be added to the app server that failed to re-start MQ RA.
2. Please add the following imqbrokerd args options (not JVN options) when starting up MQ broker:
-debug pkt
The args options should be added in the/imq/bin/imqbrokerd script.
For example, in imqbrokerd script, append -debug pkt at the following statement:
3. Please increase the jms-service start timeout to 5 minutes (default 1 minute):
See http://docs.sun.com/source/819-0079/dgjms.html#wp19009
use asadmin set/get to set the start-timeout
e.g.
./asadmin get server.jms-service.init-timeout-in-seconds
will show its current value ( = 60 )
./asadmin set server.jms-service.init-timeout-in-seconds=300
to set it to 5 minutes
****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Evaluation Vivek reported the problem occured today. The broker instance that couldn't connectto is running on iasqalx4. Once the EOFException occurred when RA making the ADMIN connection to the broker, the problem will persist until broker shutdown or restarts.
The EOFException occurred while client runtime read fixed header of hello-reply packet. The EOFException only occurs when making ADMIN connection to the broker on iasqalx4 where the broker is running and only occurs when using 'localhost' to connect to the broker, imqBrokerHostName=localhost
imqcmd can connect to the broker instance if specify the real hostname (iasqalx4) or imqcmd run on a different machine (using same version of Java 1.5.0_12 on a similar SunOS 5.10 x86 machine).
see detailed information in Comments.
This looks like a problem below 'application' layer - e.g. TCP stack or OS. I'v suggested to check the patches on the machine first. Please try the workaround by using the real hostname instead of 'localhost' when connecting to broker.
****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Evaluation The simple test program (SimpleAdmin.java attached) that start/stop 1 broker instancerepeatly (25sec sleep in between) ran on iasqalx4 2 times both reproduced the problem (in ~2 days and in ~1 day respectively); The same program ran on a similar system mqlx4 for ~5 days without problem (use same Java version jdk1.5.0_12).
Broker debug shows that for the EOFException, ServerSocket.accept didn't even has a return. The problem lines under broker layer on iasqalx4 specifically. Please check iasqalx4 patches and also comparing with mqlx4 (in same domain as iasqalx4).
****READ-ONLY Data from Bugtraq Ends** *READ-ONLY Data from Bugtraq***** Justification Priority changed from [] to [3-Medium]instance startup XXXXXX 2007-08-26 00:34:43 GMT
Priority changed from [3-Medium] to [4-Low] Discussed in the 4:00 pm HA meeting and it was agreed that this can be lowered to a P4 since it is very hard to reproduce. Will be investigated for the update XXXXXX 2007-08-30 19:04:43 GMT
****READ-ONLY Data from Bugtraq Ends**
Environment
Operating System: All Platform: All
Affected Versions
[9.1peur1]