eclipse-ee4j / glassfish

Eclipse GlassFish
https://eclipse-ee4j.github.io/glassfish/

TX-003 tx/recoverylockfile (Permission denied) from a remote instance #13615

Closed glassfishrobot closed 13 years ago

glassfishrobot commented 13 years ago


glassfish-3.1-b21.zip

With a cluster of three instances on two machines, kill one instance (e.g. clustered_instance_2 on one machine, asqe-sb-8), then recover the failure from the other instances on a remote machine, asqe-sb-7. The recovery failed on clustered_instance_3 due to a file permission issue.

I gave all permissions to the log dir that I created for recovery. I do see the tx log files created by GlassFish, which have a permission issue.

hudson@asqe-sb-8% ls -ld /net/asqe-sb-8/sbpool/hudson/test1/txlog
drwxrwxrwx 2 hudson other 2 Sep 24 11:00 /net/asqe-sb-8/sbpool/hudson/test1/txlog/
hudson@asqe-sb-8% pwd
/sbpool/hudson/test1/txlog
hudson@asqe-sb-8% ls
clustered_instance_1/ clustered_instance_2/ clustered_instance_3/
hudson@asqe-sb-8% ls -l clustered_instance_2/tx/recoverylockfile
-rw-r--r-- 1 hudson other 0 Sep 24 12:21 clustered_instance_2/tx/recoverylockfile
hudson@asqe-sb-8%

The server.log of clustered_instance_3 on a remote machine:

[#|2010-09-24T12:17:48.070-0700|WARNING|glassfish3.1|javax.enterprise.system.core.transaction.com.sun.enterprise.transaction.jts.recovery|_ThreadID=15;_ThreadName=Thread-1;|jts.exception_in_recovery_file_handling
java.io.FileNotFoundException: /net/asqe-sb-8/sbpool/hudson/test1/txlog/clustered_instance_2/tx/recoverylockfile (Permission denied)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at com.sun.enterprise.transaction.jts.recovery.RecoveryLockFile.isRecovering(RecoveryLockFile.java:226)
    at com.sun.enterprise.transaction.jts.recovery.RecoveryLockFile.getInstanceRecoveredFor(RecoveryLockFile.java:189)
    at com.sun.enterprise.transaction.jts.recovery.GMSCallBack.finishDelegatedRecovery(GMSCallBack.java:177)
    at com.sun.enterprise.transaction.jts.recovery.GMSCallBack.processNotification(GMSCallBack.java:154)
    at com.sun.enterprise.ee.cms.impl.client.FailureRecoveryActionImpl.notifyListeners(FailureRecoveryActionImpl.java:95)
    at com.sun.enterprise.ee.cms.impl.client.FailureRecoveryActionImpl.consumeSignal(FailureRecoveryActionImpl.java:77)
    at com.sun.enterprise.ee.cms.impl.common.Router$CallableAction.call(Router.java:615)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
|#]
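To check the same symptom outside GlassFish, a minimal Java sketch follows. It relies only on what the stack trace shows, namely that the failure surfaces from the RandomAccessFile constructor. The class name LockFileProbe and the method probe are hypothetical, and the report does not say which open mode RecoveryLockFile uses, so the sketch tries both "r" and "rw" as an assumption.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of GlassFish.
public class LockFileProbe {

    /** Tries to open the file in the given mode ("r" or "rw") and reports the outcome. */
    public static String probe(Path file, String mode) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), mode)) {
            return "ok";
        } catch (FileNotFoundException e) {
            // The JDK reports both a missing file and a denied open through this exception;
            // the message carries the OS reason, e.g. "(Permission denied)".
            return "open failed: " + e.getMessage();
        } catch (IOException e) {
            return "close failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java LockFileProbe <path-to-recoverylockfile>");
            return;
        }
        Path lockFile = Paths.get(args[0]);
        // Assumption: the mode used by GlassFish is unknown, so both are tried.
        System.out.println("r  -> " + probe(lockFile, "r"));
        System.out.println("rw -> " + probe(lockFile, "rw"));
    }
}
```

Running it as the OS user that owns the recovering instance, against the shared recoverylockfile path, should show whether that user can open the file at all and whether only the write mode is refused.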

The test details are given at http://agni-1.sfbay.sun.com/asqe-logs/export1/v3.1/docs/sqe/txn/cliweb5.html

Environment

Operating System: All
Platform: All

Affected Versions

[3.1]

glassfishrobot commented 13 years ago

@glassfishrobot Commented sherryshen said: The server.log files from the 3 instances are saved at http://agni-1.sfbay.sun.com/asqe-logs/export1/v3.1/docs/sqe/txn/13615/

glassfishrobot commented 13 years ago

@glassfishrobot Commented sherryshen said: I also see a permission issue when I clean up the logs to test a new build.

hudson@asqe-sb-8% pwd
/sbpool/hudson/test1/txlog
hudson@asqe-sb-8% ls
clustered_instance_1/ clustered_instance_2/ clustered_instance_3/
hudson@asqe-sb-8% rm -rf *
rm: Unable to remove directory clustered_instance_1/tx: Permission denied
rm: Unable to remove directory clustered_instance_1: File exists
rm: Unable to remove directory clustered_instance_3/tx: Permission denied
rm: Unable to remove directory clustered_instance_3: File exists
hudson@asqe-sb-8%

glassfishrobot commented 13 years ago

@glassfishrobot Commented mvatkina said: Rev 41114 should fix the original problem.

glassfishrobot commented 13 years ago

@glassfishrobot Commented sherryshen said: Tested on glassfish-3.1-b22-09_27_2010.zip. Slightly adjusted the txlog dir. Verified the fix for recoverylockfile. Recovery failed with another error.

1) Configure the txlog dir to a shared location that can be accessed by all instances, e.g.

hudson@asqe-sb-8% asadmin create-system-properties --target sqe-cluster TX-LOG-DIR=/net/asqe-logs/export1/txlog
Command create-system-properties executed successfully.
hudson@asqe-sb-8%
hudson@asqe-sb-8% ls -ld /net/asqe-logs/export1/txlog
drwxrwxrwx 5 hudson other 5 Sep 27 15:32 /net/asqe-logs/export1/txlog/
hudson@asqe-sb-8%

hudson@asqe-sb-7% ls -ld /net/asqe-logs/export1/txlog
drwxrwxrwx 5 60004 other 5 Sep 27 15:32 /net/asqe-logs/export1/txlog/
hudson@asqe-sb-7%

Do a restart, and then see that the 3 instance directories created under the txlog dir have fewer permissions than the txlog dir itself.

hudson@asqe-sb-8% ls -l /net/asqe-logs/export1/txlog
total 9
drwxr-xr-x 3 236910 other 3 Sep 27 15:32 clustered_instance_1/
drwxr-xr-x 3 hudson other 3 Sep 27 15:32 clustered_instance_2/
drwxr-xr-x 3 236910 other 3 Sep 27 15:32 clustered_instance_3/
hudson@asqe-sb-8%

hudson@asqe-sb-7% ls -l /net/asqe-logs/export1/txlog
total 9
drwxr-xr-x 3 hudson other 3 Sep 27 15:32 clustered_instance_1/
drwxr-xr-x 3 60004 other 3 Sep 27 15:32 clustered_instance_2/
drwxr-xr-x 3 hudson other 3 Sep 27 15:32 clustered_instance_3/
hudson@asqe-sb-7%
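The listings above show instance directories with mode drwxr-xr-x under a drwxrwxrwx parent, owned by different uids depending on the machine. As a rough sketch of how the same check could be scripted, the Java below prints the permission string and owner of each subdirectory and flags those that are not writable by group and others. The class name TxLogDirReport and its methods are hypothetical, and the sketch assumes a POSIX file system (the calls throw UnsupportedOperationException elsewhere). It is not how GlassFish inspects its log directories.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper, not part of GlassFish.
public class TxLogDirReport {

    /** True if both the group and others write bits are set on the path. */
    public static boolean writableByGroupAndOthers(Path dir) throws IOException {
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(dir);
        return perms.contains(PosixFilePermission.GROUP_WRITE)
                && perms.contains(PosixFilePermission.OTHERS_WRITE);
    }

    /** Returns "<rwx string> <owner> <path>", e.g. "rwxr-xr-x hudson /some/dir". */
    public static String describe(Path dir) throws IOException {
        String perms = PosixFilePermissions.toString(Files.getPosixFilePermissions(dir));
        return perms + " " + Files.getOwner(dir).getName() + " " + dir;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TxLogDirReport <txlog-dir>");
            return;
        }
        // Walk only the immediate children, i.e. the per-instance directories.
        try (DirectoryStream<Path> children = Files.newDirectoryStream(Paths.get(args[0]))) {
            for (Path child : children) {
                if (Files.isDirectory(child)) {
                    String flag = writableByGroupAndOthers(child)
                            ? "" : "  <- not writable by group/others";
                    System.out.println(describe(child) + flag);
                }
            }
        }
    }
}
```

Run on each machine against the shared TX-LOG-DIR, this would surface the ownership differences visible in the two ls outputs, since the owner name is resolved locally on each host.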

2) Run the cliweb5 tests.

The previous permission error on recoverylockfile is gone. The recovery failed with another error. Please see details in server.log: http://agni-1.sfbay.sun.com/asqe-logs/export1/v3.1/docs/sqe/txn/13615/server.log.in3.b22_0927

[#|2010-09-27T17:10:01.114-0700|INFO|glassfish3.1| javax.enterprise.system.core.transaction.com.sun.enterprise.transaction.jts.recovery| _ThreadID=15;_ThreadName=Thread-1;| Checking Lock File /net/asqe-logs/export1/txlog/clustered_instance_2/tx/recoverylockfile|#]

[#|2010-09-27T17:10:01.162-0700|INFO|glassfish3.1| javax.enterprise.system.core.transaction.com.sun.enterprise.transaction.jts.recovery| _ThreadID=15;_ThreadName=Thread-1;| Updating File /net/asqe-logs/export1/txlog/clustered_instance_3/tx/recoverylockfile|#]

[#|2010-09-27T17:10:01.182-0700|INFO|glassfish3.1| javax.enterprise.system.core.transaction.com.sun.enterprise.transaction.jts.recovery| _ThreadID=15;_ThreadName=Thread-1;| Writing into file /net/asqe-logs/export1/txlog/clustered_instance_2/tx/recoverylockfile|#]

......

[#|2010-09-27T17:10:07.952-0700|SEVERE|glassfish3.1| javax.enterprise.system.core.transaction.com.sun.jts.CosTransactions|_ThreadID=15;_ThreadName=Thread-1;| JTS5022: Unexpected exception [com.sun.jts.CosTransactions.LogException: Log exception at point 3: LOG-002: Open failure] from log.|#]

[#|2010-09-27T17:10:07.959-0700|WARNING|glassfish3.1| javax.enterprise.resource.jta.com.sun.enterprise.transaction|_ThreadID=15;_ThreadName=Thread-1;| DTX5016:Error in XA recovery. See logs for more details
org.omg.CORBA.INTERNAL: JTS5022: Unexpected exception [com.sun.jts.CosTransactions.LogException: Log exception at point 3: LOG-002: Open failure] from log. vmcid: 0x0 minor code: 0 completed: No
    at com.sun.jts.CosTransactions.Log.open(Log.java:236)
    at com.sun.jts.CosTransactions.CoordinatorLog.openLog(CoordinatorLog.java:1224)
    at com.sun.jts.CosTransactions.CoordinatorLog.getLogged(CoordinatorLog.java:1327)
    at com.sun.jts.CosTransactions.DelegatedRecoveryManager.delegated_recover(DelegatedRecoveryManager.java:224)
    at com.sun.jts.CosTransactions.DelegatedRecoveryManager.delegated_recover(DelegatedRecoveryManager.java:208)
    at com.sun.enterprise.transaction.jts.ResourceRecoveryManagerImpl.recoverIncompleteTx(ResourceRecoveryManagerImpl.java:158)
    at com.sun.enterprise.transaction.jts.recovery.GMSCallBack.doRecovery(GMSCallBack.java:227)
    at com.sun.enterprise.transaction.jts.recovery.GMSCallBack.processNotification(GMSCallBack.java:154)
    at com.sun.enterprise.ee.cms.impl.client.FailureRecoveryActionImpl.notifyListeners(FailureRecoveryActionImpl.java:95)
    at com.sun.enterprise.ee.cms.impl.client.FailureRecoveryActionImpl.consumeSignal(FailureRecoveryActionImpl.java:77)
    at com.sun.enterprise.ee.cms.impl.common.Router$CallableAction.call(Router.java:615)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
|#]

glassfishrobot commented 13 years ago

@glassfishrobot Commented mvatkina said: The new stack trace is the same as in 13527. Marking this bug as fixed.

glassfishrobot commented 13 years ago

@glassfishrobot Commented mvatkina said: This change needs to be reverted. All permissions are to be set by the admin.

glassfishrobot commented 13 years ago

@glassfishrobot Commented mvatkina said: Reverted the change. Now it's the same issue as 13527.

glassfishrobot commented 13 years ago

@glassfishrobot Commented Was assigned to mvatkina

glassfishrobot commented 7 years ago

@glassfishrobot Commented This issue was imported from java.net JIRA GLASSFISH-13615

glassfishrobot commented 13 years ago

@glassfishrobot Commented Reported by sherryshen

glassfishrobot commented 13 years ago

@glassfishrobot Commented Marked as duplicate on Monday, October 4th 2010, 3:17:58 am