cavaliergopher / grab

A download manager package for Go
BSD 3-Clause "New" or "Revised" License
1.39k stars 150 forks source link

Cancel running request #73

Closed oliverpool closed 3 years ago

oliverpool commented 4 years ago

A running request is currently not cancelled when the Response.Cancel is called.

This PR adds this feature.

oliverpool commented 4 years ago

(no idea why go1.7 is failing, but do we really care about a 3 year old release?)

maddie commented 4 years ago

Looking forward to this PR.. Currently it seems I can cancel the context but the download won't stop until it finishes/errors out. 😅

oliverpool commented 4 years ago

In the meantime, you can handle the context yourself:

ctx, cancel := context.WithCancel(context.Background())
req = req.WithContext(ctx)
resp := client.Do(req)
// when you call cancel() , the download will be cancelled
// if the context is cancelled before the download starts, it will still be started (with a HEAD request) and cancelled afterwards

This is how I implemented it in my code :-)

maddie commented 4 years ago

Hmm. I've tried this way, but resp.Err() doesn't seem to return context canceled as error immediately, instead, it downloads the whole file and return nil as error (while resp.Err() blocks until it returns nil).

Am I missing something?

EDIT: partial code:

// assume I've created request with cancel context

// call cancel()
cancel()

fmt.Println("Start checking error") // output stays here until file download complete
if err := resp.Err(); err != nil {
    fmt.Printf("Returned error: %w", err) // it should return context.ErrCanceled
    return "", "", 0, err
}
fmt.Println("This should not appear") // and then this appears because err is nil and didn't return
oliverpool commented 4 years ago

Hmm, this should not happen.

Could you open another issue and ping me (@oliverpool) with some more details:

oliverpool commented 3 years ago

@cavaliercoder do you have thoughts on this PR?

cavaliercoder commented 3 years ago

Thank you!

frontierpsycho commented 3 years ago

I am using this library and even after upgrading to latest master, which includes this fix, I have to run Cancel() explicitly on the response struct to free the response's resources, even though I am providing my own context which I cancel, when creating the requests. Please bear with me while I explain the issue.

This is how I create the requests (cleaned up to be concise):

        req, err := grab.NewRequest(absFullPath, fileUrls[nr])
                // error handling 

        request := req.WithContext(ctx)

ctx is created as part of a command that needs to download some files along some other things. The entire thing can be cancelled at any time. I use the same context for all my grab requests, and expect that this cancels all of the downloads if the command is cancelled.

I put all requests in a slice and run them like this:

    respch := client.DoBatch(nrWorkers, requests...)

I detect whether the requests get cancelled by listening to the Done channel as suggested by the context docs, and I clean up the partially downloaded files:

        select {
        case <-ctx.Done():
            err := cleanupFiles()

                // ...

Now the issue is that when I do this on Windows, I get an error that says that a file I'm trying to delete is being used by another process.

No other process than mine is using this file before I'm done with it, for sure. Also, if I run response.Cancel() explicitly (this one) for each file before I delete it, then I no longer get the error, so I'm fairly certain that the process keeping the file open is grab. However, I shouldn't have to call response.Cancel(): that seems to call a cancel function that grab itself creates and attaches to the request's context. However, since I provide the request context, and I cancel that outside grab, I expect the request to be cancelled as well and all its resources to be released.

I was using grab 2.0.0 as a module, so I thought the problem was that I didn't have the latest code, including this MR, but I updated to latest master and it didn't fix the issue. So I suspect there's some issue with the cancellation code in grab, but after looking at both my code and grab's code for a couple of days, I couldn't find anything. Any insight is appreciated!

oliverpool commented 3 years ago

@frontierpsycho the v2 release was in 2018 (https://github.com/cavaliercoder/grab/releases) This PR was merged in 2020.

So you should use the master branch:

go get github.com/cavaliercoder/grab@master

And check in you go.mod/go.sum that the correct version is used (it should mention a date and a commit hash).

frontierpsycho commented 3 years ago

Hey @oliverpool thanks for the response! However, as I wrote above, I did use the latest master. This is the entry in my go.mod file:

github.com/cavaliercoder/grab v1.0.1-0.20201108051000-98a5bfe305ec

I don't know where v1.0.1 comes from, but note that 98a5bfe305ec matches the latest commit on master, and that the date part (20201108051000) matches November 2020, when this PR was merged.

Unless there's some serious shenanigans at play, I am indeed using the latest code, that's why I'm baffled as to why it doesn't work!

oliverpool commented 3 years ago

Alright, maybe the issue is somewhere else.

Can you try to wait for the respch to be closed before cleaning up?

The channel should be closed when all responses are done : https://github.com/cavaliercoder/grab/blob/98a5bfe305ecadf50891ae832b5706436283e651/client.go#L163

And with the latest commit on master, this should happen soon after the context is cancelled (since all request/responses will be cancelled).

So something like:

        select {
        case <-ctx.Done():
            for range respch {} // wait for all requests/responses to be cancelled
            err := cleanupFiles()

                // ...
frontierpsycho commented 3 years ago

Hello! Sorry for not looking at this earlier, I went on vacation.

I tried your suggestion and it did indeed work.

But this is a workaround, right? This should be ideally handled internally by grab: by the time something is received in the Done() channel, requests should have been fully closed/cancelled, including all their resources, as is also supported by this comment:

// The returned Response channel is closed only after all of the given Requests
// have completed, successfully or otherwise.

Or have I misunderstood something?

oliverpool commented 3 years ago

Great to hear that this solution works!


I think this is the expected behavior. ctx.Done() is just a signal to tell everyone to stop working; it does not care if anyone hears it.

by the time something is received in the Done() channel

The caller (you) is actually in control of the context and can cancel it anytime. The context does not even provide a way for the listeners of this channel to acknowledge, that they handled the signal.

// The returned Response channel is closed only after all of the given Requests
// have completed, successfully or otherwise.

This comment is referring to the channel respch (and not to the context).

When you add for range respch {}, you actually wait the the "returned Response channel" to be closed and everything works as expected.

Does it make sense?

frontierpsycho commented 3 years ago

Right, that makes sense then, we'll do that. Thanks for the explanation!

In that case it feels like perhaps handling cancellation can be better documented, though :)