range requests (resume) only work if request.Filename is explicitly set

wryfi commented 6 years ago

I'm using grab for the first time. I've written a function that takes a directory and a list of URLs and uses grab to download those URLs to the directory in parallel. Unless I'm doing something wrong (quite possible), Range request headers don't seem to be added to my requests.

If I interrupt a download, and then start it again, the existing files do not resume downloading where they left off, regardless of how I set request.NoResume.

With request.NoResume = false, the download is transferred again from the beginning, but the destination file is not truncated: its size does not change until the downloaded data exceeds the size of the existing file. All of the file data is still downloaded from the server starting at byte 0, regardless of the size of the existing destination file. I know it is downloading from byte 0 because I see the incoming traffic in iftop, and because of what I see in the HTTP headers (see below).

With request.NoResume = true, the behavior is the same, except that the destination file is truncated as soon as the file download begins.

I fired up Wireshark to investigate the requests going over the wire. In response to grab's initial HEAD request for the file, nginx correctly responds with Accept-Ranges: bytes, but the following GET request from grab does not contain a Range header of any kind.

HEAD /images/xxxxx.box HTTP/1.1
Host: image-build-201.xxxxx.org
User-Agent: pogos

HTTP/1.1 200 OK
Server: nginx/1.12.2
Date: Wed, 16 May 2018 21:10:46 GMT
Content-Type: application/octet-stream
Content-Length: 1629246392
Last-Modified: Wed, 16 May 2018 18:07:18 GMT
Connection: keep-alive
ETag: "5afc7356-611c53b8"
Accept-Ranges: bytes

GET /images/xxxxx.box HTTP/1.1
Host: image-build-201.xxxxx.org
User-Agent: pogos
Accept-Encoding: gzip

I have only tested this using the batch request mode, so I don't know if this also affects requests made synchronously.

I am using grab v2.0.0 with this code. If I'm "doing it wrong," please let me know.

wryfi commented 6 years ago

OK, it appears that if I set request.Filename explicitly, everything works as expected.

cavaliercoder commented 5 years ago

This is a bug. If resume is not supported, the destination file should be truncated. In any case, this should work without you having to set filename. I'll look into this, thank you!

cavaliercoder commented 5 years ago

If your URLs are shareable, please provide them.

cavaliergopher/grab, issue #35