EMCECS / nfs-client-java

Native Java NFS client
Apache License 2.0
70 stars 33 forks source link

Large files have negative available bytes due to long to int type cas… #21

Closed davydotcom closed 5 years ago

davydotcom commented 6 years ago

…ting..

davydotcom commented 6 years ago

It was observed that NFSFileInputStream was coming back with 0 bytes if file was larger than 2.1GB (basically Integer.MAX_VALUE)

DavidASeibert commented 6 years ago

This is actually an interface problem - for practical Java coding reasons, I implemented the InputStream interface, but that interface doesn't support large streams properly. The proposed fix will make the returned value look more sensible, but it will still be wrong, and it makes it harder than necessary for users to see if the value is correct or not.

The correct solution may be to return -1 if the value is larger than Integer.MaxValue, and then to implement an interface extension that handles these larger streams. I have not yet figured out how to design that extension for ease of use, though, and I do not know of any standard fix that has been developed for that interface. Do you have any suggestions for the complete fix?

davydotcom commented 6 years ago

-1 implies end of stream. The use of this field is being used improperly as it is right now. Available is not meant to show the amount of bytes remaining in the file but rather the amount available to fetch from the stream without blocking (is the proper use case). Since it is being used to imitate this for pulling the chunked bytes it seemed better suited to imply that there are bytes available for now as total file size here is not at all a valid dataset . The interface is not the problem

On Jun 29, 2018, at 12:27 PM, DavidASeibert notifications@github.com wrote:

This is actually an interface problem - for practical Java coding reasons, I implemented the InputStream interface, but that interface doesn't support large streams properly. The proposed fix will make the returned value look more sensible, but it will still be wrong, and it makes it harder than necessary for users to see if the value is correct or not.

The correct solution may be to return -1 if the value is larger than Integer.MaxValue, and then to implement an interface extension that handles these larger streams. I have not yet figured out how to design that extension for ease of use, though, and I do not know of any standard fix that has been developed for that interface. Do you have any suggestions for the complete fix?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/EMCECS/nfs-client-java/pull/21#issuecomment-401405845, or mute the thread https://github.com/notifications/unsubscribe-auth/AABaEtCMUdP9ZM7iG-yW4m51Bsfsb-Wvks5uBlWEgaJpZM4U7-yM.

davydotcom commented 6 years ago

The proper answer or definition of the available() method from the javadoc is this...

" Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes. "

So if implementing properly it would estimate this value and never be based on the size of the file itself.

davydotcom commented 6 years ago

I should highlight this is a minimal fix so that it behaves acceptably with other streams for fetching files larger than 2.1GB (in our case its QCOW2 , RAW , or VMDK image files for backup/recovery)

DavidASeibert commented 6 years ago

Yes, my initial thoughts were a bit off. Coming back here from other code...

-1 does not imply EOF for available(), but 0 does according to the javadoc. Probably the best estimate the code can give is the following:

bytesLeftInBuffer() if that is > 0, or 0 if _isEof, or _bytesInBuffer otherwise.

That will give either the remaining bytes that can be read without going to the NFS file, or the best estimate of what the next NFS read will give if the buffer is currently empty and there is still data left to read. That will also give good processing results, as the buffer size is typically set to the optimum NFS read size for that system. That will also ensure that it never returns 0 is there is possibly data to be read. Does that make sense to you?

davydotcom commented 6 years ago

Yes it does. -1 is actually used a lot on HTTP InputStreams in their docs to imply EOF because 0 doesnt mean the stream is at end of file. It means the transfer might be hung and theres nothing to be read at this moment but there may be shortly.

On Jun 29, 2018, at 1:00 PM, DavidASeibert notifications@github.com wrote:

Yes, my initial thoughts were a bit off. Coming back here from other code...

-1 does not imply EOF for available(), but 0 does according to the javadoc. Probably the best estimate the code can give is the following:

bytesLeftInBuffer() if that is > 0, or 0 if _isEof, or _bytesInBuffer otherwise.

That will give either the remaining bytes that can be read without going to the NFS file, or the best estimate of what the next NFS read will give if the buffer is currently empty and there is still data left to read. That will also give good processing results, as the buffer size is typically set to the optimum NFS read size for that system. Does that make sense to you?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/EMCECS/nfs-client-java/pull/21#issuecomment-401414576, or mute the thread https://github.com/notifications/unsubscribe-auth/AABaEi5pwtpJu52VDJKIILHbKp_uv1ltks5uBl1IgaJpZM4U7-yM.

DavidASeibert commented 6 years ago

The javadoc says that 0 means you are at the end of the stream. Yes, there may be extensions for subclasses that have special values for -1, but that doesn't really make sense for anything other than special devices, and we're trying to stay as plain vanilla as possible.

https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#available()

Ignoring -1 as a special value, does that logic make sense?

davydotcom commented 6 years ago

Im reading the java doc and see nothing of the sort. As a matter of fact on the base implementation "The available method for class InputStream always returns 0. “

It says it will return an ESTIMATE of the bytes available for reading or it will return 0 when it reaches the end of the stream. 0 could also be an estimate of bytes available for being read without blocking

On Jun 29, 2018, at 1:08 PM, DavidASeibert notifications@github.com wrote:

The javadoc says that 0 means you are at the end of the stream. Yes, there may be extensions for subclasses that have special values for -1, but that doesn't really make sense for anything other than special devices, and we're trying to stay as plain vanilla as possible.

https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#available() https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#available() — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/EMCECS/nfs-client-java/pull/21#issuecomment-401416514, or mute the thread https://github.com/notifications/unsubscribe-auth/AABaEgrLjPesBz7LQhxIWB8HG6EGA-8Lks5uBl8fgaJpZM4U7-yM.

twincitiesguy commented 2 years ago

I'm leaving this comment here, even though this PR has already been merged, because we are trying to get a release out that includes a debug fix (#42). Seems to me, based on the method contract, available() should just return bytesLeftInBuffer(). The intent is to reveal how many bytes can be read without any (blocking) network IO, and I think bytesLeftInBuffer() should provide this accurately, unless I'm missing something. FYI, we may introduce this change prior to the next release.