TritonDataCenter / java-manta

Java Manta Client SDK
Mozilla Public License 2.0
16 stars 26 forks source link

Refactor how streams are handled #40

Closed dekobon closed 8 years ago

phillipross commented 8 years ago

:+1:

dekobon commented 8 years ago

Fixing this will require a refactor of how MantaClient interacts with MantaObject. We will need to provide a means in which you can close the inputstream. My thought is that we remove the methods related to data retrieval and data persistence entirely from the MantaObject. We instead provide those methods as part of MantaClient.

phillipross commented 8 years ago

I think this issue might be titled incorrectly. As of release 1.6.0 the MantaClient builds a MantaObject, assigns the http client's content input stream to the MantaObject dataInputStream property... but does not read from it. It appears that the manta object's data is NOT read eagerly which is what the title of this issue suggests.

It is true that the http client's input stream is not closed by the manta client... and this problematic. It's up to the user of MantaClient to obtain a reference to the dataInputStream property of the MantaObject and invoke the close method on that. If the user doesn't do this... it eventually exhausts the pool of http client connections and results in exceptions such as the following:

com.joyent.manta.org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection

Moving the retrieval and persistence logic out of MantaObject and into MantaClient is a better way to go. It may be the case that MantaObject should be more of a lightweight object for just transporting data around rather than something coupled to an http client and handling I/O on behalf of itself.

dekobon commented 8 years ago

Work on this is happening here: https://github.com/joyent/java-manta/tree/stream_rejigger

dekobon commented 8 years ago

@phillipross

I've finished by initial proof of concept in this checkin (a7e79cbf2ddc49b22be2d775bd15b3772d6c010f) in the stream_rejigger branch. Please take a look at it. I bumped the version to 2.0.0-SNAPSHOT because it totally breaks the API.

Would this type of API be useable for you? It helps to resolve the orphaned streams issues that are rampant. It also deals with the hard to understand tiered logic of stream, file or string.

Upon further revisions, I may make MantaObjectMetadata completely immutable.

phillipross commented 8 years ago

@dekobon I looked at a few changes in the branch and it all looked like it's heading in a good direction. I'll checkout the entire branch and take a closer look in my IDE tonight.

Something I have on my wishlist for I/O handling would be externally configurable buffer sizes so that it's possible to run on JPC micro instances. I've run into problems in the past with the manta CLI not working on the micro instances due to aggressive allocation of system memory buffers, and it required code changes to fix it. If the manta java client would ever use system buffers, I'd hope that the buffer sizes could be externally configurable so that the sizes can be changed at runtime rather than needing code changes. :grinning:

phillipross commented 8 years ago

@dekobon

The other comment I had regarding the MantaObjectMetadata and immutability... keep in mind that manta supports up to 4KB of custom metadata that can be used in a manta object. We'll want to make sure that it's easy to work with this metadata. In 1.x the MantaObject keeps a reference to http headers object which would hold the custom metadata in the form of headers beginning with "m-" and working with metadata is fairly unwieldy given that a header is a named list in the google http client api.

dekobon commented 8 years ago

@phillipross With regards to metadata, would you like to sketch up a proposed implementation after we finalize the initial API changes?

As for system memory buffers, I don't think we ever plan to use any buffers outside of the JVM for anything in the Manta client. As for memory inside of the JVM, take a look at all of the magic numbers we have set for buffers and see which ones you would like to be configurable. I expect with the changes we are making we should be able to minimize the memory footprint per object.

Also, if the default Manta CLI has any behavior that aren't happy with, please write it up and I will make sure it is filed as a bug.

phillipross commented 8 years ago

I looked at the latest commits on stream_rejigger and looks like it's coming along nicely. It actually handles the major I/O problems, creating a much more organized API, and looks like a much better foundation.

Currently it looks like all I/O that is handled within the client is done so via Scanner and java.nio.file.Files, and I think Files uses an 8k buffer size which is tiny, but I was just giving a heads up if we'd ever decide to handle buffering in the client code. Buffer tuning as an optimization is probably a little far off at this point.

For custom metadata functionality, I was hoping to just have something as simple as a simple map interface (or multimap if manta supports multiple custom metadata fields with the same key) that custom metadata could be managed with, but the manta service has a cap on custom metadata so it should probably be validated somehow.

dekobon commented 8 years ago

@phillipross FYI: I just added read-only support for NIO SeekableByteChannel streams.

phillipross commented 8 years ago

:+1: