Ok, quite the rabbit hole here, but I got to explore an odd dark corner of the internet in the process. Here goes: so our strategy for downloading/getting was this:
Make an initial Range request for bytes 0-8MB
If the object is smaller than 8MB, no problem: it returns 206 anyway and we don't make any more requests
If the object is bigger, it tells us the total object size in the Content-Range header and we continue w/ concurrency
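For illustration, here's roughly how the total size falls out of the Content-Range header and how the remaining ranges could be planned. This is a Python sketch, not the code in this PR: `total_size`, `remaining_ranges`, and `PART_SIZE` are made-up names, and I'm assuming "8MB" means 8 MiB.

```python
import re

PART_SIZE = 8 * 1024 * 1024  # assumed part size (8 MiB) for the initial request


def total_size(content_range):
    """Extract the total object size from e.g. 'bytes 0-8388607/20000000'."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", content_range.strip())
    if match is None:
        # Also rejects an unknown total ('bytes 0-9/*'), which this sketch ignores.
        raise ValueError(f"unexpected Content-Range: {content_range!r}")
    return int(match.group(3))


def remaining_ranges(total, part_size=PART_SIZE):
    """Inclusive (first, last) byte ranges still to fetch after the first part."""
    return [
        (start, min(start + part_size, total) - 1)
        for start in range(part_size, total, part_size)
    ]
```

If the object fits in the first part, `remaining_ranges` returns an empty list and no further requests are needed.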
BUT, it turns out that if the requested object happens to be empty, i.e. 0 bytes, the Range request fails with a 416 status response, "Range Not Satisfiable". What??
Indeed, it turns out the RFC states pretty clearly that a Range request starting at byte 0 against an empty object should get a 416, because the first byte position isn't less than the object's length. It does, however, say that a suffix Range request like Range: bytes=-1, with any non-zero number, is ok even then (again, what??).
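To make the rule concrete, here's a small Python sketch of the satisfiability check as I read it in RFC 7233, section 2.1. `is_satisfiable` is a made-up helper for illustration; it handles a single range spec (the part after `bytes=`) and doesn't validate malformed input.

```python
def is_satisfiable(range_spec, length):
    """Check one byte-range spec against an object of `length` bytes."""
    first, _, last = range_spec.partition("-")
    if first == "":
        # Suffix form "-N": satisfiable whenever N is non-zero,
        # even if the object itself is empty.
        return int(last) > 0
    # "first-last" or "first-": the first byte must exist in the object.
    return int(first) < length
```

So `is_satisfiable("0-8388607", 0)` is false (hence the 416), while `is_satisfiable("-1", 0)` is true.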
Alright, so we can't just dive into doing Range-ed GET requests.
So this PR implements the following strategy:
We make an initial HEAD request on the object
This returns the Content-Length response header that tells us if the requested object is 0 bytes
If so: we short circuit the whole get operation and return the 0-byte body in whatever form provided
Otherwise: we continue forward w/ our Range-ed GET requests
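The steps above, sketched in Python under some loud assumptions: `send` is a hypothetical transport callback (not a real library API), the ranged GETs run sequentially here rather than concurrently, and `fake_server` exists only so the sketch can be run without a network.

```python
def get_object(send, part_size=8 * 1024 * 1024):
    """HEAD first; short-circuit on empty objects, otherwise do ranged GETs.

    `send(method, range_header)` is a hypothetical callback returning
    (status, headers, body).
    """
    status, headers, _ = send("HEAD", None)
    if status != 200:
        raise RuntimeError(f"HEAD failed with status {status}")
    total = int(headers["Content-Length"])
    if total == 0:
        return b""  # empty object: never issue a Range request
    parts = []
    for start in range(0, total, part_size):
        last = min(start + part_size, total) - 1
        status, _, body = send("GET", f"bytes={start}-{last}")
        if status != 206:
            raise RuntimeError(f"ranged GET failed with status {status}")
        parts.append(body)
    return b"".join(parts)


def fake_server(data):
    """In-memory stand-in for an object store; records each request method."""
    log = []

    def send(method, range_header):
        log.append(method)
        if method == "HEAD":
            return 200, {"Content-Length": str(len(data))}, b""
        first, last = (int(x) for x in range_header[len("bytes="):].split("-"))
        if first >= len(data):
            return 416, {}, b""  # mirrors the empty-object failure
        return 206, {}, data[first:last + 1]

    send.log = log
    return send
```

With `fake_server(b"")`, `get_object` returns `b""` after a single HEAD; with a non-empty payload it issues the HEAD followed by one GET per part.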
So net-net, for 0-byte objects, this is no overhead since we're still just doing 1 total request. For > 0 byte objects, we're doing 1 extra HEAD request, which for large, multipart downloads shouldn't even be noticeable. For smaller objects, this isn't ideal because we're essentially doubling the # of requests, but I'm not sure we can do any better in this generic setting. For now, if you know your objects are non-zero length, and you don't need multipart downloading, you can pass allowMultipart=false and we won't do the extra HEAD request. That seems good enough for now.