DICE-UNC / jargon

Jargon core libraries
Other
28 stars 31 forks source link

failing parallel file operation should throw exception #133

Closed carsten-jahn closed 6 years ago

carsten-jahn commented 9 years ago

In a related ticket (#132), we noticed log messages like those:

2181 [pool-1-thread-7] ERROR org.irods.jargon.core.transfer.ParallelPutTransferThread  - error writing to iRODS parallel transfer socket
java.net.SocketException: Software caused connection abort: socket write error
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at org.irods.jargon.core.transfer.ParallelPutTransferThread.readWriteLoopForCurrentHeaderDirective(ParallelPutTransferThread.java:341)
    at org.irods.jargon.core.transfer.ParallelPutTransferThread.put(ParallelPutTransferThread.java:290)
    at org.irods.jargon.core.transfer.ParallelPutTransferThread.call(ParallelPutTransferThread.java:169)
    at org.irods.jargon.core.transfer.ParallelPutTransferThread.call(ParallelPutTransferThread.java:1)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2185 [pool-1-thread-7] ERROR org.irods.jargon.core.transfer.ParallelPutTransferThread  - An IO exception occurred during a parallel file put operation
org.irods.jargon.core.exception.JargonException: java.net.SocketException: Software caused connection abort: socket write error
    at org.irods.jargon.core.transfer.ParallelPutTransferThread.readWriteLoopForCurrentHeaderDirective(ParallelPutTransferThread.java:417)
    at org.irods.jargon.core.transfer.ParallelPutTransferThread.put(ParallelPutTransferThread.java:290)
    at org.irods.jargon.core.transfer.ParallelPutTransferThread.call(ParallelPutTransferThread.java:169)
    at org.irods.jargon.core.transfer.ParallelPutTransferThread.call(ParallelPutTransferThread.java:1)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Software caused connection abort: socket write error
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at org.irods.jargon.core.transfer.ParallelPutTransferThread.readWriteLoopForCurrentHeaderDirective(ParallelPutTransferThread.java:341)
    ... 7 more

However, Jargon does not propagate errors up from the multithreaded parallel file transfer code to the main DataTransferOperations.putOperation method. Although we see these errors in the log file, our call of the putOperation returns as if it was successful.

I think it is important to fail with exceptions whenever a change was not persisted to iRODS (same is true for the icommands of course). Please extend exception handling to cover multithreaded cases in Jargon.

michael-conway commented 7 years ago

Did you specify a callback listener? The way it works is that, if a callback listener is specified, the exception goes to the callback listener, rather than gets rethrown. This was originally done for the old desktop transfer client, and a few other use cases, where the model was to tolerate partial failures, letting the callback listener decide what to do (e.g. a model to set a tolerance level so that the user could later retransmit only failed operations).

So in the code below, I could reconsider that and have a 'fail fast' setting in Jargon properties that will go ahead and just rethrow up in the put operation.

The way this works now is that jargon is deferring to the callback listener to decide how to handle the exception. If you did NOT specify a callback listener, let me know, because that would be a 'bug' not a 'feature'!

I'm happy again to add a 'fail fast' configuration option and set that as the default. I suspect I'll be re-approaching desktop transfer tools and I can just turn off fail fast in that scenario.

private void processExceptionDuringPutOperation(
            final File sourceFile,
            final IRODSFile targetIrodsFile,
            final TransferStatusCallbackListener transferStatusCallbackListener,
            final TransferControlBlock transferControlBlock,
            final JargonException je) throws JargonException {

        log.error("exception in transfer", je);

        transferControlBlock.reportErrorInTransfer();

        if (transferStatusCallbackListener != null) {
            log.warn("exception will be passed back to existing callback listener");

            TransferStatus status = TransferStatus.instanceForException(
                    TransferType.PUT, sourceFile.getAbsolutePath(),
                    targetIrodsFile.getAbsolutePath(), "", sourceFile.length(),

                    sourceFile.length(),
                    transferControlBlock.getTotalFilesTransferredSoFar(),
                    transferControlBlock.getTotalFilesSkippedSoFar(),
                    transferControlBlock.getTotalFilesToTransfer(), je,
                    getIRODSAccount().getHost(), getIRODSAccount().getZone());

            transferStatusCallbackListener.overallStatusCallback(status);

        } else {
            log.warn("exception will be re-thrown, as there is no status callback listener");
            throw je;

        }
michael-conway commented 7 years ago

I'll bump this over into the 4.2 testing while folks may chime in on ths 'fail fast' idea. At the risk of cliche, this behavior is a feature, not a bug, but happy to add a switch in the configuration to immediately fail.