DICE-UNC / jargon

Jargon core libraries

What does error code -345000 indicate went wrong? #240

Open markehammons opened 7 years ago

markehammons commented 7 years ago

I'm writing to an IRODSFileOutputStream and I keep getting error -345000. Is this the result of a data race caused by my use of parallelism? The error is pretty opaque and I cannot figure out what's going on.

This is the stacktrace:

o.i.j.c.p.i.SessionClosingIRODSFileOutputStream - rethrowing JargonException as IO exception for write operation
org.irods.jargon.core.exception.JargonException: error code received from iRODS:-345000
    at org.irods.jargon.core.connection.IRODSErrorScanner.checkSpecificCodesAndThrowIfExceptionLocated(IRODSErrorScanner.java:325)
    at org.irods.jargon.core.connection.IRODSErrorScanner.inspectAndThrowIfNeeded(IRODSErrorScanner.java:123)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.processMessageInfoLessThanZero(AbstractIRODSMidLevelProtocol.java:1172)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.readMessage(AbstractIRODSMidLevelProtocol.java:663)
    at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.readMessage(AbstractIRODSMidLevelProtocol.java:629)
    at org.irods.jargon.core.connection.IRODSMidLevelProtocol.irodsFunction(IRODSMidLevelProtocol.java:235)
    at org.irods.jargon.core.pub.io.FileIOOperationsAOImpl.write(FileIOOperationsAOImpl.java:93)
    at org.irods.jargon.core.pub.io.IRODSFileOutputStream.write(IRODSFileOutputStream.java:201)
    at org.irods.jargon.core.pub.io.IRODSFileOutputStream.write(IRODSFileOutputStream.java:218)
    at akka.stream.impl.io.OutputStreamSubscriber$$anonfun$receive$1.applyOrElse(OutputStreamSubscriber.scala:39)

michael-conway commented 7 years ago

Can you share some code? I suspect it might be something along the lines of this:

https://github.com/DICE-UNC/jargon/issues/199

iRODS agents are stateful, so multi-threaded access to something like a stream is going to get you into trouble; we take great pains to isolate operations to one thread/one agent. If you access an agent with more than one thread, bad things will happen. The rule of thumb is to treat iRODS like a JDBC connection, not an HTTP endpoint. If you have a multithreaded app, each thread should talk to a separate agent.
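One way to follow the one thread/one agent rule from a multithreaded app is to confine a stream to a single dedicated thread. The sketch below is only an illustration under that assumption: `ConfinedWriter` is a hypothetical name, it is not part of Jargon, and it works with any `java.io.OutputStream`.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical helper (not a Jargon class): every operation on the stream,
// including opening and closing it, runs on one dedicated thread.
final class ConfinedWriter(open: () => OutputStream) {
  private val executor = Executors.newSingleThreadExecutor()
  private implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)

  // Only touched inside Future bodies, so it is created on the confined thread.
  private lazy val stream: OutputStream = open()

  // Writes are queued in call order and executed one at a time.
  def write(bytes: Array[Byte]): Future[Unit] = Future(stream.write(bytes))

  // Closes the stream on the same thread, then stops the executor.
  def close(): Future[Unit] = Future {
    try stream.close() finally executor.shutdown()
  }
}
```

Callers on any thread can call `write`, but the stream itself only ever sees the one executor thread.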

That being said, I would look for a close happening where a file handle is being lost on the back end.

Is this a mid-tier type app you are writing? A foreground app? I can certainly help resolve that, but I'd look for where you are doing a close, and see if you are closing an iRODS connection and then accessing a stream again.

markehammons commented 7 years ago

This is the concurrent code. It uses Akka Streams to push an incoming series of ByteStrings to the file output stream. IRODS is a singleton class that encapsulates the file factory and account/session setup:

    Flow[ByteString].toMat(
      StreamConverters.fromOutputStream(() => IRODS.getFileOutputStream(f))
    )(Keep.right)

So as you see, I spawn an IRODSFileOutputStream specifically for the threads doing the work, so there shouldn't be any multithreaded accesses to the stream itself. That being said, the auth info, and the filefactory and irods session are all in the IRODS singleton and potentially belong to another thread, causing this issue, so I rewrote this code:

    Flow[ByteString].toMat(
      StreamConverters.fromOutputStream { () =>
        // Build the whole connection stack inside the stream's own thunk
        val irodsProtocolManager = IRODSSimpleProtocolManager.instance()
        val irodsAccount = new IRODSAccount("irods.webaddress.com", 5531, "XXXXXXXX", "YYYYYYYY", "/bioemerg/groups/", "bioemerg", "inaf-disk-1")

        irodsAccount.setDefaultStorageResource("inaf-disk-1")
        val irodsAOFactory = IRODSAccessObjectFactoryImpl.instance(IRODSSession.instance(irodsProtocolManager))
        val fileFactory = irodsAOFactory.getIRODSFileFactory(irodsAccount)

        fileFactory.instanceIRODSFileOutputStream(f.value)
      }
    )(Keep.right)

Here, you can see I instantiate the IRODS session along with the fileoutputstream, theoretically inside the thread that will be using the fileoutputstream. Unfortunately, I receive the same error -345000. I have found some workarounds to this problem, but they all involve me collating the data and then pushing it to irods at a later step, and I'd prefer to avoid that if I can push the data out as it comes in instead of wasting cpu and memory resources.

michael-conway commented 7 years ago

Yes, I can see your issue. I think the solution here is to create an extension stream package for this use. We want to make sure to not reopen the door to all the issues that come from multi-threaded access to agents, but for a narrow case like this it should be OK. A lot of this hinges on how streams are being used between the client and iRODS, to ensure that race conditions and other bad behaviors do not occur on the agent side.

markehammons commented 7 years ago

Is there any update on this yet? I am hitting this limitation hard; using jargon with anything multithreaded is extremely painful.

dkocher commented 7 years ago

I previously proposed to change this. See #213.

michael-conway commented 7 years ago

No updates yet, I don't have the time or resources right now to do any iRODS work.

michael-conway commented 7 years ago

No, that's not the issue here. I think this is an issue with configuring how the REST API obtains connections; they don't need to be on a single thread.

markehammons commented 6 years ago

I solved this limitation a good while ago. My problem was multiple threads accessing certain IRODS objects. I found my best solution was to create a connection per thread when multiple file accesses were needed.
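A connection-per-thread pattern like this can be sketched with a plain `ThreadLocal`. This is a minimal illustration, not the code used here: `Connection` and `Connections` are hypothetical stand-ins, and a real version would build and tear down an actual iRODS connection where the placeholder is created.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicReference}

// Hypothetical stand-in for a per-thread iRODS connection.
final class Connection(val id: Int)

object Connections {
  private val counter = new AtomicInteger(0)

  // Each thread lazily gets its own Connection on first access.
  private val local: ThreadLocal[Connection] = new ThreadLocal[Connection] {
    override def initialValue(): Connection =
      new Connection(counter.incrementAndGet())
  }

  // Returns the calling thread's connection; never shared across threads.
  def current: Connection = local.get()
}
```

Repeated calls to `Connections.current` on one thread return the same object, while another thread gets a different one.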

My big issue now is expanding this pattern into a naturally multithreaded library. So far what I have works, but it doesn’t protect from multiple accesses to the same file at the same time. I dunno if jargon gives me the ability to check if something is writing or reading from a file either.

So, a quick question about how iRODS works. I’ve noticed iRODS lets me open multiple reading streams to a file at once, but I’m not sure about writing streams. I’d guess that iRODS blocks me from having multiple writers to the same file. I’m also assuming that write operations on a file are blocked while something is reading from that file. Is that the case?

Further, does jargon 4.0 have anything that can tell me whether something is accessing a file at the moment and what the nature of those accesses is? I’m currently designing a REST API that tracks file accesses by users and doles out write and read permissions for file operations in FIFO order, but if iRODS is already tracking that info in a way I can access, I can save myself a bit of work.
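For reference, client-side tracking of this kind can be approximated with one fair read/write lock per path. This is only a sketch under stated assumptions: `PathLocks` is a hypothetical name, it coordinates threads inside a single JVM only, and it says nothing about what iRODS itself tracks or enforces. A fair `ReentrantReadWriteLock` grants access in approximately arrival order rather than strict FIFO.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.locks.ReentrantReadWriteLock

// Hypothetical in-process registry: one fair read/write lock per file path.
object PathLocks {
  private val locks = new ConcurrentHashMap[String, ReentrantReadWriteLock]()

  private def lockFor(path: String): ReentrantReadWriteLock = {
    val fresh = new ReentrantReadWriteLock(true) // true = fair ordering
    val existing = locks.putIfAbsent(path, fresh)
    if (existing == null) fresh else existing
  }

  // Many readers may hold the same path at once.
  def reading[A](path: String)(body: => A): A = {
    val lock = lockFor(path).readLock()
    lock.lock()
    try body finally lock.unlock()
  }

  // A writer holds the path exclusively.
  def writing[A](path: String)(body: => A): A = {
    val lock = lockFor(path).writeLock()
    lock.lock()
    try body finally lock.unlock()
  }
}
```

The lock must be released on the thread that acquired it, which fits the connection-per-thread approach described above.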

In any case, I'm planning on having a REST API for file operations on iRODS developed by the end of the year. I have one currently, but it's more limited in scope.