cavaliergopher / grab

A download manager package for Go
BSD 3-Clause "New" or "Revised" License
1.38k stars 151 forks source link

Corrupted contents when downloading on top of wrong file #21

Closed glyn closed 6 years ago

glyn commented 6 years ago

Grab looked like a good fit for a project I'm working on so I gave it a spin. I found that it downloaded a file perfectly and when asked to download the same file again managed to avoid downloading all the bytes again, which was just what I was looking for.

I then overwrote the downloaded file with completely different contents and then downloaded again using grab. The message:

  206 Partial Content

was emitted and the download was apparently successful. The downloaded file even had the same number of bytes as the original, but unfortunately the contents were corrupted.

Fortunately, this problem is easily reproduced using the example program in the README:

$ go run main.go
Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
  200 OK
Download saved to ./gobook.pdf
$ mv gobook.pdf gobook.pdf.good
$ cp main.go gobook.pdf
$ go run main.go
Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
  206 Partial Content
Download saved to ./gobook.pdf
$ diff gobook.pdf gobook.pdf.good
Binary files gobook.pdf and gobook.pdf.good differ
$ ls -l
total 11320
-rw-r--r--  1 gnormington  staff  2893557  1 Jan  1970 gobook.pdf
-rw-r--r--  1 gnormington  staff  2893557  1 Jan  1970 gobook.pdf.good
-rw-r--r--  1 gnormington  staff     1139  3 Nov 11:10 main.go

The environment is go version go1.9.2 darwin/amd64 on macOS 10.13.1.

In case the README changes, the contents of main.go above is:

package main

import (
    "fmt"
    "os"
    "time"

    "github.com/cavaliercoder/grab"
)

func main() {
    // create client
    client := grab.NewClient()
    req, _ := grab.NewRequest(".", "http://www.golang-book.com/public/pdf/gobook.pdf")

    // start download
    fmt.Printf("Downloading %v...\n", req.URL())
    resp := client.Do(req)
    fmt.Printf("  %v\n", resp.HTTPResponse.Status)

    // start UI loop
    t := time.NewTicker(500 * time.Millisecond)
    defer t.Stop()

Loop:
    for {
        select {
        case <-t.C:
            fmt.Printf("  transferred %v / %v bytes (%.2f%%)\n",
                resp.BytesComplete(),
                resp.Size,
                100*resp.Progress())

        case <-resp.Done:
            // download is complete
            break Loop
        }
    }

    // check for errors
    if err := resp.Err(); err != nil {
        fmt.Fprintf(os.Stderr, "Download failed: %v\n", err)
        os.Exit(1)
    }

    fmt.Printf("Download saved to ./%v \n", resp.Filename)

    // Output:
    // Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
    //   200 OK
    //   transferred 42970 / 2893557 bytes (1.49%)
    //   transferred 1207474 / 2893557 bytes (41.73%)
    //   transferred 2758210 / 2893557 bytes (95.32%)
    // Download saved to ./gobook.pdf
}
cavaliercoder commented 6 years ago

This is really interesting problem! Thanks for raising.

Having thought about this at length, the issue might be outside the scope of this package. Grab doesn't offer any correctness/consistency guarantees above those offered by the standard net/http library.

You might consider using the checksum feature. It means you waste time downloading a corrupted file, but it will throw an error.

Alternatively, you could implement some "progress snapshots" in your application. Maybe performing a checksum when you pause a download, storing it, and then checking the partial file again before you resume.

Finally, I'd suggest that if two applications are writing to the same file, causing this kind of corruption, there may be a design flaw.

glyn commented 6 years ago

Thanks for the feedback Ryan. If you have a few minutes, it may be worth documenting the restriction of this issue so that others don't fall into the same trap. They won't necessarily think to look in closed issues.

(We didn't use Grab in the end because of this restriction.)

cavaliercoder commented 6 years ago

Thanks for the feedback. I've added a 'Design trade-offs' section to the README. It's a little verbose, but hopefully makes this situation more clear.

I'm curious to learn how you solved the issue of resuming files when the local copied had been modified outside grab?

cavaliercoder commented 6 years ago

Readme update: 9a33a2e

glyn commented 6 years ago

Thanks for the README update @cavaliercoder!

I'm curious to learn how you solved the issue of resuming files when the local copied had been modified outside grab?

IIRC (and 6 months is a long time!) we check the checksum at the end so we can't end up using an invalid download.