Closed davydotcom closed 6 years ago
It was observed that NFSFileInputStream was coming back with 0 bytes if file was larger than 2.1GB (basically Integer.MAX_VALUE)
This is actually an interface problem - for practical Java coding reasons, I implemented the InputStream
interface, but that interface doesn't support large streams properly. The proposed fix will make the returned value look more sensible, but it will still be wrong, and it makes it harder than necessary for users to see if the value is correct or not.
The correct solution may be to return -1
if the value is larger than Integer.MaxValue
, and then to implement an interface extension that handles these larger streams. I have not yet figured out how to design that extension for ease of use, though, and I do not know of any standard fix that has been developed for that interface. Do you have any suggestions for the complete fix?
-1 implies end of stream. The use of this field is being used improperly as it is right now. Available is not meant to show the amount of bytes remaining in the file but rather the amount available to fetch from the stream without blocking (is the proper use case). Since it is being used to imitate this for pulling the chunked bytes it seemed better suited to imply that there are bytes available for now as total file size here is not at all a valid dataset . The interface is not the problem
On Jun 29, 2018, at 12:27 PM, DavidASeibert notifications@github.com wrote:
This is actually an interface problem - for practical Java coding reasons, I implemented the InputStream interface, but that interface doesn't support large streams properly. The proposed fix will make the returned value look more sensible, but it will still be wrong, and it makes it harder than necessary for users to see if the value is correct or not.
The correct solution may be to return -1 if the value is larger than Integer.MaxValue, and then to implement an interface extension that handles these larger streams. I have not yet figured out how to design that extension for ease of use, though, and I do not know of any standard fix that has been developed for that interface. Do you have any suggestions for the complete fix?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/EMCECS/nfs-client-java/pull/21#issuecomment-401405845, or mute the thread https://github.com/notifications/unsubscribe-auth/AABaEtCMUdP9ZM7iG-yW4m51Bsfsb-Wvks5uBlWEgaJpZM4U7-yM.
The proper answer or definition of the available() method from the javadoc is this...
" Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes. "
So if implementing properly it would estimate this value and never be based on the size of the file itself.
I should highlight this is a minimal fix so that it behaves acceptably with other streams for fetching files larger than 2.1GB (in our case its QCOW2 , RAW , or VMDK image files for backup/recovery)
Yes, my initial thoughts were a bit off. Coming back here from other code...
-1
does not imply EOF
for available()
, but 0
does according to the javadoc. Probably the best estimate the code can give is the following:
bytesLeftInBuffer()
if that is > 0, or
0
if _isEof
, or
_bytesInBuffer
otherwise.
That will give either the remaining bytes that can be read without going to the NFS file, or the best estimate of what the next NFS read will give if the buffer is currently empty and there is still data left to read. That will also give good processing results, as the buffer size is typically set to the optimum NFS read size for that system. That will also ensure that it never returns 0
is there is possibly data to be read. Does that make sense to you?
Yes it does. -1 is actually used a lot on HTTP InputStreams in their docs to imply EOF because 0 doesnt mean the stream is at end of file. It means the transfer might be hung and theres nothing to be read at this moment but there may be shortly.
On Jun 29, 2018, at 1:00 PM, DavidASeibert notifications@github.com wrote:
Yes, my initial thoughts were a bit off. Coming back here from other code...
-1 does not imply EOF for available(), but 0 does according to the javadoc. Probably the best estimate the code can give is the following:
bytesLeftInBuffer() if that is > 0, or 0 if _isEof, or _bytesInBuffer otherwise.
That will give either the remaining bytes that can be read without going to the NFS file, or the best estimate of what the next NFS read will give if the buffer is currently empty and there is still data left to read. That will also give good processing results, as the buffer size is typically set to the optimum NFS read size for that system. Does that make sense to you?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/EMCECS/nfs-client-java/pull/21#issuecomment-401414576, or mute the thread https://github.com/notifications/unsubscribe-auth/AABaEi5pwtpJu52VDJKIILHbKp_uv1ltks5uBl1IgaJpZM4U7-yM.
The javadoc says that 0
means you are at the end of the stream. Yes, there may be extensions for subclasses that have special values for -1
, but that doesn't really make sense for anything other than special devices, and we're trying to stay as plain vanilla as possible.
https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#available()
Ignoring -1
as a special value, does that logic make sense?
Im reading the java doc and see nothing of the sort. As a matter of fact on the base implementation "The available method for class InputStream always returns 0. “
It says it will return an ESTIMATE of the bytes available for reading or it will return 0 when it reaches the end of the stream. 0 could also be an estimate of bytes available for being read without blocking
On Jun 29, 2018, at 1:08 PM, DavidASeibert notifications@github.com wrote:
The javadoc says that 0 means you are at the end of the stream. Yes, there may be extensions for subclasses that have special values for -1, but that doesn't really make sense for anything other than special devices, and we're trying to stay as plain vanilla as possible.
https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#available() https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#available() — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/EMCECS/nfs-client-java/pull/21#issuecomment-401416514, or mute the thread https://github.com/notifications/unsubscribe-auth/AABaEgrLjPesBz7LQhxIWB8HG6EGA-8Lks5uBl8fgaJpZM4U7-yM.
I'm leaving this comment here, even though this PR has already been merged, because we are trying to get a release out that includes a debug fix (#42). Seems to me, based on the method contract, available()
should just return bytesLeftInBuffer()
. The intent is to reveal how many bytes can be read without any (blocking) network IO, and I think bytesLeftInBuffer()
should provide this accurately, unless I'm missing something. FYI, we may introduce this change prior to the next release.
…ting..