DICE-UNC / jargon

Jargon core libraries
Other
28 stars 31 forks source link

-345000 errors and iRODS Agent state #199

Closed dkocher closed 7 years ago

dkocher commented 7 years ago

I have a reproducible test that fails with a -23000 error code from iRODS when we are concurrently writing to iRODS using org.irods.jargon.core.pub.io.IRODSFileOutputStream. The different threads do not share the client instance.

org.irods.jargon.core.exception.JargonException: error code received from iRODS:-23000
    at org.irods.jargon.core.connection.IRODSErrorScanner.checkSpecificCodesAndThrowIfExceptionLocated(IRODSErrorScanner.java:325)
    at org.irods.jargon.core.connection.IRODSErrorScanner.inspectAndThrowIfNeeded(IRODSErrorScanner.java:123)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.processMessageInfoLessThanZero(AbstractIRODSMidLevelProtocol.java:1172)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.readMessage(AbstractIRODSMidLevelProtocol.java:663)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.readMessage(AbstractIRODSMidLevelProtocol.java:629)
    at org.irods.jargon.core.connection.IRODSMidLevelProtocol.irodsFunction(IRODSMidLevelProtocol.java:235)
    at org.irods.jargon.core.pub.io.FileIOOperationsAOImpl.write(FileIOOperationsAOImpl.java:93)
    at org.irods.jargon.core.pub.io.IRODSFileOutputStream.write(IRODSFileOutputStream.java:201)
    at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:90)
    at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:90)
    at ch.iterate.mountainduck.io.BufferOutputStream.write(BufferOutputStream.java:56)
    at ch.iterate.mountainduck.io.BufferOutputStream.write(BufferOutputStream.java:45)
    at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:72)
    at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:72)
    at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:72)
dkocher commented 7 years ago

I could further narrow this error down. It only happens when a second file is closed with IRODSFileSystemAOImpl#fileClose in between a transfer.

dkocher commented 7 years ago

You can find a failing test case in https://g.iterate.ch/projects/ITERATE/repos/cyberduck/compare/commits?sourceBranch=refs%2Fheads%2Fbugfix%2FMD-981-irods-concurrency&targetBranch=refs%2Fheads%2Fmaster

dkocher commented 7 years ago

If getIRODSSession().closeSession is called when closing the output stream, the failure writing to the next file is -345000 instead.

dkocher commented 7 years ago

The issue here seems that opening/writing/closing files cannot be interweaved even when on different isolated sessions.

michael-conway commented 7 years ago

Will download the test code and look for the thread handling. The iRODS protocol is stateful, so one cannot interleave simultaneous operations on one thread, that's the first thing I thought looking at it. Will try to reproduce here if on separate threads, but two threads will talk to two distinct agents, and I want to make sure first that's been captured. Cheers MC

dkocher commented 7 years ago

Thanks for looking into this. This will allow us to push iRODS support in Mountain Duck.

michael-conway commented 7 years ago

Yes thanks for patience, I'm actually now able to do some actual work on libraries (and that makes me happy)

michael-conway commented 7 years ago

see https://g.iterate.ch/projects/ITERATE/repos/cyberduck/browse for source code

michael-conway commented 7 years ago

this adapted test works (checked out cyberduck 5.0.3 and made some minor changes to the write feature test


@Test
    public void testWriteConcurent() throws Exception {
        final Profile profile = ProfileReaderFactory.get().read(
                new Local("/home/mconway/testprofile"));
        final Host host = new Host(profile, profile.getDefaultHostname(),
                new Credentials("test1", "test"));

        final IRODSSession session1 = new IRODSSession(host);
        session1.open(new DisabledHostKeyCallback());

        session1.login(new DisabledPasswordStore(),
                new DisabledLoginCallback(), new DisabledCancelCallback(),
                PathCache.empty());

        final IRODSSession session2 = new IRODSSession(host);
        session2.open(new DisabledHostKeyCallback());

        session2.login(new DisabledPasswordStore(),
                new DisabledLoginCallback(), new DisabledCancelCallback(),
                PathCache.empty());

        final Path test1 = new Path(
                new IRODSHomeFinderService(session1).find(), UUID.randomUUID()
                        .toString(), EnumSet.of(Path.Type.file));
        final Path test2 = new Path(
                new IRODSHomeFinderService(session2).find(), UUID.randomUUID()
                        .toString(), EnumSet.of(Path.Type.file));

        final byte[] content = RandomUtils.nextBytes(68400);

        final OutputStream out1 = new IRODSWriteFeature(session1).write(test1,
                new TransferStatus().append(false).length(content.length));
        final OutputStream out2 = new IRODSWriteFeature(session1).write(test2,
                new TransferStatus().append(false).length(content.length));
        new StreamCopier(new TransferStatus(), new TransferStatus()).transfer(
                new ByteArrayInputStream(content), out2);
        out2.close();
        new StreamCopier(new TransferStatus(), new TransferStatus()).transfer(
                new ByteArrayInputStream(content), out1);
        out1.close();

        session1.close();
        session2.close();
    }
michael-conway commented 7 years ago

NEW_ERROR(SYS_BAD_FILE_DESCRIPTOR, -23000)...looking into other causes

dkocher commented 7 years ago

The test case is to show that we see failures when we have concurrent access typically from a file system access multiple files at the same time (in Mountain Duck)

The IRODSFileSystemAO is not shared between threads but reused in a single session instance (the sessions are pooled).

I have merged the branch with the test cases with our master branch and added a test that shows a -345000 error. The previous tests still shows -23000 here on the second write attempt.

See https://g.iterate.ch/projects/ITERATE/repos/cyberduck/compare/commits?sourceBranch=refs%2Fheads%2Fbugfix%2FMD-981-irods-concurrency&targetBranch=refs%2Fheads%2Fmaster

michael-conway commented 7 years ago

Thanks! I'll update the testing on this side!

On Wed, Dec 14, 2016 at 2:58 PM, David Kocher notifications@github.com wrote:

The test case is to show that we see failures when we have concurrent access typically from a file system access multiple files at the same time (in Mountain Duck)

  • Two isolated connections to the same iRODS
  • Opening a file write stream from both connections.
  • Writing to the second stream with cause failures.

The IRODSFileSystemAO is not shared between threads but reused in a single session instance (the sessions are pooled).

I have merged the branch with the test cases with our master branch and added a test that shows a -345000 error. The previous tests still shows -23000 here on the second write attempt.

See https://g.iterate.ch/projects/ITERATE/repos/cyberduck/ compare/commits?sourceBranch=refs%2Fheads%2Fbugfix%2FMD- 981-irods-concurrency&targetBranch=refs%2Fheads%2Fmaster

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/DICE-UNC/jargon/issues/199#issuecomment-267139247, or mute the thread https://github.com/notifications/unsubscribe-auth/ABC-LQ8Rk7gB400HYfjwzlDEDwWnt0Jgks5rIEpMgaJpZM4KgGCh .

michael-conway commented 7 years ago

OK I was able to see the -345000 (still cannot get a -23000) and here are a couple observations. The -345000 is actually appropriate here for the test case, though it's pretty subtle and has to do with connection handling.


final IRODSSession session1 = new IRODSSession(host);
        session1.open(new DisabledHostKeyCallback());
        session1.login(new DisabledPasswordStore(),
                new DisabledLoginCallback(), new DisabledCancelCallback(),
                PathCache.empty());

        final IRODSSession session2 = new IRODSSession(host);
        session2.open(new DisabledHostKeyCallback());
        session2.login(new DisabledPasswordStore(),
                new DisabledLoginCallback(), new DisabledCancelCallback(),
                PathCache.empty());

What happens here is that the session1.login calls authenticate, it establishes a connection to agent foo on the iRODS Side

Then the call to session2.login calls authenticate, it closes agent foo on the iRODS side, and connects to agent bar

What you did is perfectly sensible, but since it's the same thread it's the same connection, so a mismatch on the semantics of authenticate. I've added in the snapshot for Jargon a new method to help, in IRODSAccessObjectFactory, if it helps reduce the friction

/**
     * Cause an <code>IRODSAccount</code> to be authenticated, and return and
     * <code>AuthResponse</code> augmented with information about the principal.
     * <p/>
     * Note that the account information is actually cached in a thread local by
     * the <code>IRODSSession</code>, so this method will return the cached
     * response if already authenticated. If not cached, this method causes an
     * authentication process.
     *
     * @param irodsAccount
     *            {@IRODSAccount} with the authenticating
     *            principal
     * @return {@link AuthResponse} containing information about the
     *         authenticated principal. Note that the authentication process may
     *         cause the authenticating <code>IRODSAccount</code> to be altered
     *         or augmented. The resulting account that can be cached and
     *         re-used by applications will be in the authenticated account.
     * @throws AuthenticationException
     *             If the principal cannot be authenticated. This will be thrown
     *             on initial authentication
     * @throws JargonException
     */
    AuthResponse authenticateIRODSAccountUtilizingCachedConnectionIfPresent(
            IRODSAccount irodsAccount) throws AuthenticationException,
            JargonException;

If you do things in this manner, than the login for session2 will leave session1 alone. There's actually no need to log in for the second session, since it's the same thread, and the connection is held as a ThreadLocal, it just picks up the connection based on the iRODS account if it has already authenticated. The new method above will encompass that behavior.


final OutputStream out1 = new IRODSWriteFeature(session1).write(test1,
                new TransferStatus().append(false).length(content.length));
        final OutputStream out2 = new IRODSWriteFeature(session1).write(test2,
                new TransferStatus().append(false).length(content.length));
        new StreamCopier(new TransferStatus(), new TransferStatus()).transfer(
                new ByteArrayInputStream(content), out2);
        out2.close();
        session2.getClient().getIrodsSession()
        .closeSession(session2.filesystem().getIRODSAccount());
        // Error code received from iRODS:-345000.
        new StreamCopier(new TransferStatus(), new TransferStatus()).transfer(
                new ByteArrayInputStream(content), out1);

So here's where the error occurs. What happens is that the session2.getClient().getIrodsSession().closeSession command closes the underlying agent bar that out1 is still trying to talk to. The file handle on the iRODS side goes away when that agent goes away, so you are trying to now pass an invalid file handle to iRODS when you are doing output stream work.

If you comment out the session2.close() code it works fine. In reality, there's only a need for one session here.

Hope that makes sense! I will push that new method to authenticate without clearing the previous agent if you want to update IRODSSession to do that.

There's zero benefit in a thread to creating multiple IRODSSession objects.

I don't see -23000 errors against iRODS in any of the test cases.

michael-conway commented 7 years ago

PS the bad file descriptor message may be part of this same effect. I did not see it here, but based on when a connection is closed it might be a possible result.

dkocher commented 7 years ago

Hi Michael,

Thanks for your great input on this! Trying to draw conclusions from your observations.

Is it correct to say that we see these intermittent failures, because we have both a thread pool and a pool of IRODSSession instances (and their reference to the AuthResponse). The same session might be assigned to a different thread from the thread pool during its lifecycle. Is it correct that this causes us trouble, because the Jargon API makes the assumption that a single AuthResponse is always accessed from the same thread?

Do you say using authenticateIRODSAccountUtilizingCachedConnectionIfPresent will work even when we login from Thread-1 and then make method calls on IRODSFileSystemAO from Thread-2? Or should we always access a new reference for IRODSFileSystemAO from the current thread?

Note that with this use case, a session might also be closed on a different thread than it was instantiated (for example when the session pool decides to lower the number of open connections and evicts idle sessions).

dkocher commented 7 years ago

I have pushed changes from your input to https://g.iterate.ch/projects/ITERATE/repos/cyberduck/commits/08de69858301bac22c7620ffa6daa02b6f9ae809

dkocher commented 7 years ago

The -23000 is reproducible with the integration test case. See logging output and stacktrace in https://ci.iterate.ch/view/cyberduck/job/cyberduck-features-core/1227/ch.cyberduck$irods/testReport/ch.cyberduck.core.irods/IRODSWriteFeatureTest/testWriteConcurent/

dkocher commented 7 years ago

Test output from running tests with the latest changes in https://g.iterate.ch/projects/ITERATE/repos/cyberduck/commits/08de69858301bac22c7620ffa6daa02b6f9ae809 and 317c95b1a1b4a02546d5986c229c603426678eeb.

2016-12-15 21:34:46,739 [main] DEBUG ch.cyberduck.core.Session - Connection will open to Host{credentials=Credentials{user='iterate'}, hostname='data.iplantcollaborative.org', defaultpath='null', port=1247, protocol=Profile{parent=irods, image=null}}
2016-12-15 21:34:46,741 [main] DEBUG ch.cyberduck.core.irods.IRODSSession - Configure client org.irods.jargon.core.pub.IRODSFileSystem@76508ed1 with properties SettableJargonProperties [useParallelTransfer=true, useNIOForParallelTransfers=false, maxParallelThreads=8, maxFilesAndDirsQueryMax=5000, useTransferThreadsPool=false, transferThreadPoolMaxSimultaneousTransfers=2, transferThreadPoolTimeoutMillis=60000, allowPutGetResourceRedirects=false, computeChecksumAfterTransfer=false, computeAndVerifyChecksumAfterTransfer=true, intraFileStatusCallbacks=false, irodsSocketTimeout=30000, irodsParallelSocketTimeout=30000, internalInputStreamBufferSize=0, internalOutputStreamBufferSize=0, internalCacheBufferSize=-1, sendInputStreamBufferSize=0, localFileOutputStreamBufferSize=0, localFileInputStreamBufferSize=0, putBufferSize=32768, getBufferSize=32768, inputToOutputCopyBufferByteSize=1048576, encoding=UTF-8, instrument=false, reconnect=false, defaultToPublicIfNothingUnderRootWhenListing=true, reconnectTimeInMillis=600000, usingDiscoveredServerPropertiesCache=true, usingSpecificQueryForCollectionListingsWithPermissions=true, usingSpecQueryForDataObjPermissionsForUserInGroup=true, pamTimeToLive=0, forcePamFlush=false, connectionFactory=tcp, checksumEncoding=DEFAULT, parallelTcpKeepAlive=true, parallelTcpSendWindowSize=16384, parallelTcpReceiveWindowSize=16384, parallelTcpPerformancePrefsConnectionTime=0, parallelTcpPerformancePrefsLatency=0, parallelTcpPerformancePrefsBandwidth=1, primaryTcpKeepAlive=true, primaryTcpSendWindowSize=0, primaryTcpReceiveWindowSize=0, primaryTcpPerformancePrefsConnectionTime=1, primaryTcpPerformancePrefsLatency=0, primaryTcpPerformancePrefsBandwidth=1, socketRenewalIntervalInSeconds=420, longTransferRestart=false, parallelCopyBufferSize=4194304, intraFileStatusCallbacksNumberCallsInterval=5, intraFileStatusCallbacksTotalBytesInterval=4194304, negotiationPolicy=NO_NEGOTIATION, encryptionAlgorithmEnum=AES_256_CBC, encryptionKeySize=32, encryptionSaltSize=8, encryptionNumberHashRounds=16]
2016-12-15 21:34:46,741 [main] DEBUG ch.cyberduck.core.Session - Connection did open to Host{credentials=Credentials{user='iterate'}, hostname='data.iplantcollaborative.org', defaultpath='null', port=1247, protocol=Profile{parent=irods, image=null}}
2016-12-15 21:34:49,043 [main] DEBUG ch.cyberduck.core.irods.IRODSSession - Connected to StartupResponseData:
   status:0
   relVersion:rods4.1.9
   apiVersion:d
   reconnPort:0
   reconnAddr:
   cookie:400
2016-12-15 21:34:49,043 [main] DEBUG ch.cyberduck.core.Session - Connection will open to Host{credentials=Credentials{user='iterate'}, hostname='data.iplantcollaborative.org', defaultpath='null', port=1247, protocol=Profile{parent=irods, image=null}}
2016-12-15 21:34:49,044 [main] DEBUG ch.cyberduck.core.irods.IRODSSession - Configure client org.irods.jargon.core.pub.IRODSFileSystem@4f209819 with properties SettableJargonProperties [useParallelTransfer=true, useNIOForParallelTransfers=false, maxParallelThreads=8, maxFilesAndDirsQueryMax=5000, useTransferThreadsPool=false, transferThreadPoolMaxSimultaneousTransfers=2, transferThreadPoolTimeoutMillis=60000, allowPutGetResourceRedirects=false, computeChecksumAfterTransfer=false, computeAndVerifyChecksumAfterTransfer=true, intraFileStatusCallbacks=false, irodsSocketTimeout=30000, irodsParallelSocketTimeout=30000, internalInputStreamBufferSize=0, internalOutputStreamBufferSize=0, internalCacheBufferSize=-1, sendInputStreamBufferSize=0, localFileOutputStreamBufferSize=0, localFileInputStreamBufferSize=0, putBufferSize=32768, getBufferSize=32768, inputToOutputCopyBufferByteSize=1048576, encoding=UTF-8, instrument=false, reconnect=false, defaultToPublicIfNothingUnderRootWhenListing=true, reconnectTimeInMillis=600000, usingDiscoveredServerPropertiesCache=true, usingSpecificQueryForCollectionListingsWithPermissions=true, usingSpecQueryForDataObjPermissionsForUserInGroup=true, pamTimeToLive=0, forcePamFlush=false, connectionFactory=tcp, checksumEncoding=DEFAULT, parallelTcpKeepAlive=true, parallelTcpSendWindowSize=16384, parallelTcpReceiveWindowSize=16384, parallelTcpPerformancePrefsConnectionTime=0, parallelTcpPerformancePrefsLatency=0, parallelTcpPerformancePrefsBandwidth=1, primaryTcpKeepAlive=true, primaryTcpSendWindowSize=0, primaryTcpReceiveWindowSize=0, primaryTcpPerformancePrefsConnectionTime=1, primaryTcpPerformancePrefsLatency=0, primaryTcpPerformancePrefsBandwidth=1, socketRenewalIntervalInSeconds=420, longTransferRestart=false, parallelCopyBufferSize=4194304, intraFileStatusCallbacksNumberCallsInterval=5, intraFileStatusCallbacksTotalBytesInterval=4194304, negotiationPolicy=NO_NEGOTIATION, encryptionAlgorithmEnum=AES_256_CBC, encryptionKeySize=32, encryptionSaltSize=8, encryptionNumberHashRounds=16]
2016-12-15 21:34:49,044 [main] DEBUG ch.cyberduck.core.Session - Connection did open to Host{credentials=Credentials{user='iterate'}, hostname='data.iplantcollaborative.org', defaultpath='null', port=1247, protocol=Profile{parent=irods, image=null}}
2016-12-15 21:34:49,045 [main] DEBUG ch.cyberduck.core.irods.IRODSSession - Connected to StartupResponseData:
   status:0
   relVersion:rods4.1.9
   apiVersion:d
   reconnPort:0
   reconnAddr:
   cookie:400
2016-12-15 21:34:53,351 [main] DEBUG ch.cyberduck.core.io.StreamCopier - End of file reached with 68400 bytes read from stream
2016-12-15 21:34:54,351 [main] ERROR org.irods.jargon.core.pub.io.IRODSFileOutputStream - rethrowing JargonException as IO exception for write operation
org.irods.jargon.core.exception.JargonException: error code received from iRODS:-23000
    at org.irods.jargon.core.connection.IRODSErrorScanner.checkSpecificCodesAndThrowIfExceptionLocated(IRODSErrorScanner.java:325)
    at org.irods.jargon.core.connection.IRODSErrorScanner.inspectAndThrowIfNeeded(IRODSErrorScanner.java:123)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.processMessageInfoLessThanZero(AbstractIRODSMidLevelProtocol.java:1229)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.readMessage(AbstractIRODSMidLevelProtocol.java:724)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.readMessage(AbstractIRODSMidLevelProtocol.java:690)
    at org.irods.jargon.core.connection.IRODSMidLevelProtocol.irodsFunction(IRODSMidLevelProtocol.java:224)
    at org.irods.jargon.core.pub.io.FileIOOperationsAOImpl.write(FileIOOperationsAOImpl.java:93)
    at org.irods.jargon.core.pub.io.IRODSFileOutputStream.write(IRODSFileOutputStream.java:201)
    at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:90)
    at ch.cyberduck.core.io.StreamCopier.transfer(StreamCopier.java:111)
    at ch.cyberduck.core.irods.IRODSWriteFeatureTest.testWriteConcurent(IRODSWriteFeatureTest.java:89)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
2016-12-15 21:34:55,030 [main] ERROR org.irods.jargon.core.pub.io.IRODSFileOutputStream - JargonException caught in constructor, rethrow as IOException
org.irods.jargon.core.exception.JargonException: error code received from iRODS:-23000
    at org.irods.jargon.core.connection.IRODSErrorScanner.checkSpecificCodesAndThrowIfExceptionLocated(IRODSErrorScanner.java:325)
    at org.irods.jargon.core.connection.IRODSErrorScanner.inspectAndThrowIfNeeded(IRODSErrorScanner.java:123)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.processMessageInfoLessThanZero(AbstractIRODSMidLevelProtocol.java:1229)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.readMessage(AbstractIRODSMidLevelProtocol.java:724)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.readMessage(AbstractIRODSMidLevelProtocol.java:690)
    at org.irods.jargon.core.connection.IRODSMidLevelProtocol.irodsFunction(IRODSMidLevelProtocol.java:224)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.irodsFunction(AbstractIRODSMidLevelProtocol.java:176)
    at org.irods.jargon.core.pub.IRODSFileSystemAOImpl.fileClose(IRODSFileSystemAOImpl.java:1059)
    at org.irods.jargon.core.pub.io.IRODSFileImpl.close(IRODSFileImpl.java:1353)
    at org.irods.jargon.core.pub.io.IRODSFileOutputStream.close(IRODSFileOutputStream.java:169)
    at org.apache.commons.io.output.ProxyOutputStream.close(ProxyOutputStream.java:117)
    at ch.cyberduck.core.io.DefaultStreamCloser.close(DefaultStreamCloser.java:53)
    at ch.cyberduck.core.io.StreamCopier.transfer(StreamCopier.java:134)
    at ch.cyberduck.core.irods.IRODSWriteFeatureTest.testWriteConcurent(IRODSWriteFeatureTest.java:89)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)

ch.cyberduck.core.exception.BackgroundException{detail='JargonException caught in constructor, rethrow as IOException. Error code received from iRODS:-23000.', cause='java.io.IOException: JargonException caught in constructor, rethrow as IOException', message='Connection failed
'}
    at ch.cyberduck.core.AbstractExceptionMappingService.wrap(AbstractExceptionMappingService.java:109)
    at ch.cyberduck.core.DefaultIOExceptionMappingService.map(DefaultIOExceptionMappingService.java:51)
    at ch.cyberduck.core.io.DefaultStreamCloser.close(DefaultStreamCloser.java:59)
    at ch.cyberduck.core.io.StreamCopier.transfer(StreamCopier.java:134)
    at ch.cyberduck.core.irods.IRODSWriteFeatureTest.testWriteConcurent(IRODSWriteFeatureTest.java:89)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.io.IOException: JargonException caught in constructor, rethrow as IOException
    at org.irods.jargon.core.pub.io.IRODSFileOutputStream.close(IRODSFileOutputStream.java:173)
    at org.apache.commons.io.output.ProxyOutputStream.close(ProxyOutputStream.java:117)
    at ch.cyberduck.core.io.DefaultStreamCloser.close(DefaultStreamCloser.java:53)
    ... 30 more
Caused by: org.irods.jargon.core.exception.JargonException: error code received from iRODS:-23000
    at org.irods.jargon.core.connection.IRODSErrorScanner.checkSpecificCodesAndThrowIfExceptionLocated(IRODSErrorScanner.java:325)
    at org.irods.jargon.core.connection.IRODSErrorScanner.inspectAndThrowIfNeeded(IRODSErrorScanner.java:123)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.processMessageInfoLessThanZero(AbstractIRODSMidLevelProtocol.java:1229)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.readMessage(AbstractIRODSMidLevelProtocol.java:724)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.readMessage(AbstractIRODSMidLevelProtocol.java:690)
    at org.irods.jargon.core.connection.IRODSMidLevelProtocol.irodsFunction(IRODSMidLevelProtocol.java:224)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.irodsFunction(AbstractIRODSMidLevelProtocol.java:176)
    at org.irods.jargon.core.pub.IRODSFileSystemAOImpl.fileClose(IRODSFileSystemAOImpl.java:1059)
    at org.irods.jargon.core.pub.io.IRODSFileImpl.close(IRODSFileImpl.java:1353)
    at org.irods.jargon.core.pub.io.IRODSFileOutputStream.close(IRODSFileOutputStream.java:169)
    ... 32 more
michael-conway commented 7 years ago

I'll try to unpack in more detail, but in a nutshell it's set up like hibernate sessions. when a DAO under the covers needs to talk to irods, it asks the jargon irods session for currentConnection () and it gets or connects on your behalf per the associated irods account. the actual connection is in a thread local cache, so in a different thread it's a different agent. The irodsprotocolmanager interface can be backed by a pool but this may be orthogonal to what cyber duck might have. irods is quite happy to have very shoes sessions, so I'd suggest just opening and closing in the span of one op. I'll look at this in more detail and can prob suggest some things.

On Dec 15, 2016 3:55 PM, "David Kocher" notifications@github.com wrote:

Hi Michael,

Thanks for your great input on this! Trying to draw conclusions from your observations.

Is it correct to say that we see these intermittent failures, because we have both a thread pool and a pool of IRODSSession instances (and their reference to the AuthResponse). The same session might be assigned to a different thread from the thread pool during its lifecycle. Is it correct that this causes us trouble, because the Jargon API makes the assumption that a single AuthResponse is always accessed from the same thread?

Do you say using authenticateIRODSAccountUtilizingCachedConnectionIfPresent will work even when we login from Thread-1 and then make method calls on IRODSFileSystemAO from Thread-2? Or should we always access a new reference for IRODSFileSystemAO from the current thread?

Note that with this use case, a session might also be closed on a different thread than it was instantiated (for example when the session pool decides to lower the number of open connections and evicts idle sessions).

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/DICE-UNC/jargon/issues/199#issuecomment-267441506, or mute the thread https://github.com/notifications/unsubscribe-auth/ABC-LaEe5JhxZOCXqJMnmuHLhbJapwURks5rIalKgaJpZM4KgGCh .

michael-conway commented 7 years ago

Is it correct to say that we see these intermittent failures, because we have both a thread pool and a pool of IRODSSession instances (and their reference to the AuthResponse). The same session might be assigned to a different thread from the thread pool during its lifecycle. Is it correct that this causes us trouble, because the Jargon API makes the assumption that a single AuthResponse is always accessed from the same thread?

Is it correct to say that we see these intermittent failures, because we have both a thread pool and a pool of IRODSSession instances (and their reference to the AuthResponse).

Perhaps. The -345000 is more due to, in that single thread of the test case, using two of your IRODSSession objects talking to the same underlying connection, simply because close was called in the middle of operations. That particular case would be solved by just doing a try/finally and bracketing the connection handling there.

The pooling can be a tricky question. If you try and pool your IRODSession and share between threads, you're not really pooling a connection, rather, any operation that takes place on a new thread will not see the ThreadLocal session, and will just open another one. That will probably work fine, as long as you are not trying to do a long running operation from multiple threads.

The same session might be assigned to a different thread from the thread pool during its lifecycle. Is it correct that this causes us trouble, because the Jargon API makes the assumption that a single AuthResponse is always accessed from the same thread?

Not in the -345000 case as outlined above. Assigning your IRODSSession (to distinguish from the Jargon class of the same name) to another thread just opens a new connection to a new agent in that new thread). What could cause you trouble would be opening a file, obtaining a file handle on the iRODS side, and then switching threads to a new agent that doesn't have that file handle. Think of the Cyberduck->iRODS session as a stateful animal, more like a JDBC connection than a stateless pipe like an http download. Operations like file open -> write -> close, or multi-page queries all should occur in the span of a thread. Otherwise, iRODS will be confused. The key thing is inside all this AO mess..

/*
     * (non-Javadoc)
     *
     * @see org.irods.jargon.core.pub.IRODSAccessObject#getIRODSProtocol()
     */
    @Override
    public final AbstractIRODSMidLevelProtocol getIRODSProtocol()
            throws JargonException {
        return getIRODSSession().currentConnection(getIRODSAccount());
    }

That's in IRODSGenericAO. What that means is that any time you do stuff against iRODS, it calls currentConnection(IRODSAccount) under the covers. I can share an AO between threads just fine, but currentConnection always refers to a thread local.

Do you say using authenticateIRODSAccountUtilizingCachedConnectionIfPresent will work even when we login from Thread-1 and then make method calls on IRODSFileSystemAO from Thread-2?

Yes, as long as the account is authorized, it will happily take those credentials and open a connection to a different agent on Thread2. When currentConnection(IRODSAccount) is called, it either finds the cached connection or it creates a new one under the covers. What can get you into trouble is

Really, the best, simplest solution here is to just put your operations in a 'try/catch/finally' and call close() in the finally block, and there should be no problem. If the pattern for cyberduck is to, for example, open a file, then switch threads for the write loop and/or close, then it'll be a problem

Or should we always access a new reference for IRODSFileSystemAO from the current thread? Note that with this use case, a session might also be closed on a different thread than it was instantiated (for example when the session pool decides to lower the number of open connections and evicts idle sessions).

Honestly I'd just do ops in a try/catch/finally and not try to pool. If these Session objects are pooled by default by Cyberduck (sounds like it) you should be find to let Cyberduck pool away. Jargon does do pooling and caching, but it's at the Jargon level and very likely not compatible with how Cyberduck wants to work.

Do you say using authenticateIRODSAccountUtilizingCachedConnectionIfPresent will work even when we login from Thread-1 and then make method calls on IRODSFileSystemAO from Thread-2? Or should we always access a new reference for IRODSFileSystemAO from the current thread?

Note that with this use case, a session might also be closed on a different thread than it was instantiated (for example when the session pool decides to lower the number of open connections and evicts idle sessions).

We should be fine as long as threads or connections are not switched in the middle of a stateful sequence. I'd suggest we go simple, and just have a try/catch/finally, and not close anything Jargon-wise in mid stream.

dkocher commented 7 years ago

It looks like all boils down to the bad assumption in the library that state is kept in org.irods.jargon.core.connection.IRODSSession based on the accessing thread. I propose that the thread local cache is removed and connections are not shared between instances. From a client point of view I expect two references to IRODSFileSystemAO to be independent and and not share state when accessed from the same thread.

Pull request #213.

michael-conway commented 7 years ago

That isn't going to work. This already works very well in many scenarios, including stateless multithreaded environments like Web and app servers, and would necessitate a rewrite of the entire library, while removing necessary protection of thread safety for irods agents.

On Dec 18, 2016 11:37 AM, "David Kocher" notifications@github.com wrote:

It looks like all boils down to the bad assumption in the library that state is kept in org.irods.jargon.core.connection.IRODSSession based on the accessing thread. I propose that the thread local cache is removed and connections are not shared between instances. From a client point of view I expect two references to IRODSFileSystemAO to be independent and and not share state when accessed from the same thread.

Pull request #213 https://github.com/DICE-UNC/jargon/pull/213.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/DICE-UNC/jargon/issues/199#issuecomment-267831218, or mute the thread https://github.com/notifications/unsubscribe-auth/ABC-LX3xt0KyqejmdhNGAhhEmXH8kf4qks5rJWE5gaJpZM4KgGCh .

dkocher commented 7 years ago

I can't follow. The app server worker thread should manage the state and scope of the IRODSSession. The library should not make any assumptions on the lifecycle being bound to the calling thread.

The current implementation is not thread safe. For example the app server might open and close a file handle reference on different threads.

dkocher commented 7 years ago

Added test in https://g.iterate.ch/projects/ITERATE/repos/cyberduck/commits/00b9622ed25631377aa96958ca7f245e7f9197a2

Our integration tests against a build of my proposed change are all green.

michael-conway commented 7 years ago

I'm going to change this issue to cover -345000, we can open a separate issue for -23000 errors and check in with CyVerse.

I appreciate the input, but the API is designed purposely around the stateful nature of the iRODS agent and side effects of multi-threaded access, which was an issue in previous API designs.

iRODS and Jargon are not S3, there are many levels of concern, so mixing the demands of CyberDuck for streaming access can lead to faulty assumptions, the requirements of iRODS are the primary concerns.

I note in Cyberduck a need to rely on streaming i/o. That's understandable as the parallel file transfer algorithm has some drawbacks, these are being addressed in upcoming versions of iRODS and Jargon will follow those new API. If a streaming API is desired, it would be preferable to have CyberDuck use an http/s stream and/or connect to WebDav, there is no need to co-mingle Cyberduck and iRODS architecture.